SoFunction
Updated on 2024-12-20

python33 urllib2 usage explained in detail

Proxy Settings

By default, urllib2 takes the HTTP proxy from the http_proxy environment variable. To control the proxy explicitly in your program, independent of the environment variable, you can do the following:

import urllib2

enable_proxy = True
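# Put the proxy host in front of ':8080'
proxy_handler = urllib2.ProxyHandler({"http" : ':8080'})
# An empty mapping means no proxy is used
null_proxy_handler = urllib2.ProxyHandler({})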

if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)

urllib2.install_opener(opener)

One detail to note: urllib2.install_opener() sets the global opener for urllib2. This is convenient for later use, but it rules out finer-grained control, such as using two different proxy settings in the same program. A better approach is to leave the global setting alone and call the opener's open() method directly instead of the global urlopen() function.
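
The per-opener approach can be sketched as follows. This is a minimal illustration written for Python 3, where the urllib2 API lives in urllib.request; the two proxy addresses and the proxies_of helper are hypothetical names introduced only for this example.

```python
import urllib.request  # Python 3 home of the urllib2 API

# Hypothetical proxy addresses, for illustration only.
opener_a = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": "http://proxy-a.example:8080"}))
opener_b = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": "http://proxy-b.example:3128"}))


def proxies_of(opener):
    """Return the proxy mapping held by the opener's ProxyHandler, if any."""
    for handler in opener.handlers:
        if isinstance(handler, urllib.request.ProxyHandler):
            return handler.proxies
    return {}


# Each request selects its proxy by choosing an opener; no global state changes:
# response = opener_a.open("http://example.com/")
```

Because install_opener() is never called, code elsewhere that uses the global urlopen() is unaffected by either opener.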

Timeout Settings

In older versions of Python, the urllib2 API did not expose a timeout setting, so the only way to set a timeout was to change the global default timeout of the socket module.

import urllib2
import socket

socket.setdefaulttimeout(10) # Timeout after 10 seconds
urllib2.socket.setdefaulttimeout(10) # Another way

As of Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen().

import urllib2
response = urllib2.urlopen('', timeout=10)
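
To make the difference between the two styles concrete, here is a small sketch in Python 3 (urllib.request instead of urllib2). The fetch helper and its seconds parameter are assumptions made for this example, not part of the library.

```python
import socket
import urllib.request

# Option 1: process-wide default, inherited by every socket created afterwards.
previous_default = socket.getdefaulttimeout()  # None means "block indefinitely"
socket.setdefaulttimeout(10)
active_default = socket.getdefaulttimeout()
socket.setdefaulttimeout(previous_default)     # restore the earlier setting


# Option 2: per-call timeout, which leaves the global default untouched.
def fetch(url, seconds=10):
    """Read a URL, failing if a blocking socket operation exceeds `seconds`."""
    with urllib.request.urlopen(url, timeout=seconds) as response:
        return response.read()
```

The per-call form is usually easier to reason about, because the global default also affects any other library in the process that opens sockets.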

Adding a specific Header to an HTTP Request

To add a header, use a Request object:

import urllib2

request = urllib2.Request(uri)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)

Pay particular attention to the following headers, because servers check them:

User-Agent : Some servers or proxies use this value to decide whether the request comes from a browser.

Content-Type : When a REST interface is used, the server checks this value to decide how to parse the content of the HTTP body. Common values are:

application/xml : Used in XML RPC calls, such as RESTful/SOAP calls.
application/json : Used in JSON RPC calls.
application/x-www-form-urlencoded : Used when a browser submits a web form.

When you use RESTful or SOAP services provided by a server, an incorrect Content-Type setting can cause the server to refuse the request.
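
As an illustration of setting both headers, the sketch below builds a JSON request in Python 3 (urllib.request instead of urllib2). The URL and the payload are hypothetical, and nothing is sent over the network.

```python
import json
import urllib.request

# Hypothetical endpoint and payload, for illustration only.
payload = json.dumps({"name": "example"}).encode("utf-8")
request = urllib.request.Request("http://api.example.com/items", data=payload)

# Tell the server who is calling and how to parse the body.
request.add_header("User-Agent", "fake-client")
request.add_header("Content-Type", "application/json")

# response = urllib.request.urlopen(request)  # would send the POST
```

Note that Request stores header names in capitalized form, so the header is looked up as request.get_header("Content-type").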

Redirect

By default, urllib2 follows HTTP 3XX redirects automatically, with no manual configuration. To check whether a redirect occurred, compare the URL of the Response with the URL of the Request.

import urllib2
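url = ''  # fill in the target URL
response = urllib2.urlopen(url)
# A redirect happened if the final URL differs from the requested one
redirected = response.geturl() != url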

If you don't want automatic redirects, you can either use the lower-level httplib library or customize the HTTPRedirectHandler class.

import urllib2

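class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Returning None means the redirect is not followed; urllib2 then
    # raises an HTTPError carrying the 301/302 code.
    def http_error_301(self, req, fp, code, msg, headers):
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        pass

opener = urllib2.build_opener(RedirectHandler)
opener.open('')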
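
The same idea can be sketched in Python 3, where the class is urllib.request.HTTPRedirectHandler. The name NoRedirectHandler is an assumption made for this example; as in the code above, only 301 and 302 are overridden, so 303 and 307 responses are still followed by the inherited methods.

```python
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Decline to follow 301 and 302 responses."""

    def http_error_301(self, req, fp, code, msg, headers):
        # Returning None leaves the response unhandled, so the default
        # error handler raises urllib.error.HTTPError with this code.
        return None

    http_error_302 = http_error_301


# Passing the subclass replaces the default redirect handler in the opener.
opener = urllib.request.build_opener(NoRedirectHandler)
redirect_handlers = [type(h) for h in opener.handlers
                     if isinstance(h, urllib.request.HTTPRedirectHandler)]

# opener.open(url) would now raise HTTPError for a 301 or 302 response.
```

Callers that want the redirect target can catch the HTTPError and read its Location header.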

Cookie

urllib2 can also handle cookies automatically, through cookielib and HTTPCookieProcessor. If you need to read the value of a particular cookie, you can do so like this:

import urllib2
import cookielib

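cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('')
# The jar now holds every cookie the server set
for item in cookie:
    if item.name == 'some_cookie_item_name':
        print item.value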
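
For reference, a Python 3 sketch of the same lookup, where cookielib is named http.cookiejar. The cookie_value helper is hypothetical and simply wraps the loop shown above.

```python
import http.cookiejar  # Python 3 name for cookielib
import urllib.request

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def cookie_value(cookie_jar, name):
    """Return the value of the first cookie called `name`, or None."""
    for item in cookie_jar:
        if item.name == name:
            return item.value
    return None


# After opener.open(url), cookies set by the server are stored in `jar`:
# value = cookie_value(jar, "some_cookie_item_name")
```

Reusing the same opener for later requests sends the stored cookies back to the server automatically.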

Using HTTP's PUT and DELETE methods

urllib2 directly supports only the HTTP GET and POST methods, so to use PUT or DELETE you would normally have to use the lower-level httplib library. However, urllib2 can still be made to send a PUT or DELETE request as follows:

import urllib2

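request = urllib2.Request(uri, data=data)
# Override the method that urllib2 would otherwise infer (GET or POST)
request.get_method = lambda: 'PUT' # or 'DELETE'
response = urllib2.urlopen(request)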

This is a hack, but it works without problems in practice.
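
In Python 3.3 and later, urllib.request.Request accepts a method argument, so the override is no longer necessary there. The sketch below shows both forms; the URL and the request body are hypothetical, and nothing is sent over the network.

```python
import urllib.request

url = "http://api.example.com/items/1"  # hypothetical endpoint
body = b'{"title": "updated"}'          # hypothetical payload

# Python 3.3+: name the HTTP method explicitly.
put_request = urllib.request.Request(url, data=body, method="PUT")
delete_request = urllib.request.Request(url, method="DELETE")

# The get_method override from the text also works on Python 3 requests.
legacy_request = urllib.request.Request(url, data=body)
legacy_request.get_method = lambda: "PUT"

# response = urllib.request.urlopen(put_request)  # would send the PUT
```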

Get the HTTP return code

For a successful response such as 200 OK, you can get the HTTP return code by calling the getcode() method of the response object returned by urlopen. For other return codes (for example 404 or 500), however, urlopen raises an exception. In that case, check the code attribute of the exception object:

import urllib2
try:
    response = urllib2.urlopen('http://')
except urllib2.HTTPError, e:
    # e.code holds the HTTP status, for example 403 or 404
    print e.code
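
Both cases can be folded into one helper. The sketch below is written for Python 3, where the exception is urllib.error.HTTPError; fetch_status and its opener parameter are hypothetical names, and the parameter exists only so that the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code whether or not the request succeeded."""
    try:
        response = opener(url)
    except urllib.error.HTTPError as error:
        return error.code  # error statuses arrive as exceptions
    with response:
        return response.getcode()
```

Network failures that carry no HTTP status, such as an unreachable host, raise urllib.error.URLError instead and are deliberately left uncaught here.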

Debug Log

When using urllib2, you can turn on debug logging as shown below. The contents of the packets sent and received are then printed to the screen, which is convenient for debugging and can sometimes save you the work of capturing packets.

import urllib2

httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)

urllib2.install_opener(opener)
response = urllib2.urlopen('')
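
A Python 3 sketch of the same setup, using urllib.request in place of urllib2. It calls the opener directly instead of installing it globally, so only requests made through debug_opener produce debug output; the variable names are assumptions made for this example.

```python
import urllib.request

# debuglevel=1 makes the underlying http.client connection print its traffic.
http_handler = urllib.request.HTTPHandler(debuglevel=1)
https_handler = urllib.request.HTTPSHandler(debuglevel=1)
debug_opener = urllib.request.build_opener(http_handler, https_handler)

# debug_opener.open(url) prints the request and the response headers to stdout;
# requests made through urllib.request.urlopen() stay quiet.
```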