Proxy Settings
By default, urllib2 picks up the HTTP proxy from the http_proxy environment variable. To control the proxy explicitly in your program, independent of the environment variable, you can do the following:
import urllib2
enable_proxy = True
# The proxy host goes before the port.
proxy_handler = urllib2.ProxyHandler({"http" : ':8080'})
# An empty mapping disables proxying.
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
urllib2.install_opener(opener)
Note that urllib2.install_opener() sets the global opener for urllib2. This is convenient for later use, but it rules out finer-grained control, such as using two different proxy settings in the same program. A better approach is to leave the global settings alone and call the opener's open method directly instead of the global urlopen function.
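As a minimal sketch of the per-opener approach, the following builds two openers with different proxy settings and installs neither globally. The proxy addresses and the proxies_of helper are hypothetical, introduced only for illustration. The import fallback is an assumption that lets the same code run on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, same names used here

# Hypothetical proxy addresses, used only for illustration.
PROXY_A = {"http": "http://proxy-a.example.com:8080"}
PROXY_B = {"http": "http://proxy-b.example.com:3128"}

# Each opener carries its own ProxyHandler; nothing is installed globally.
opener_a = urllib2.build_opener(urllib2.ProxyHandler(PROXY_A))
opener_b = urllib2.build_opener(urllib2.ProxyHandler(PROXY_B))


def proxies_of(opener):
    """Return the proxy mapping held by an opener's ProxyHandler, or {} if none."""
    for handler in opener.handlers:
        if isinstance(handler, urllib2.ProxyHandler):
            return handler.proxies
    return {}

# response_a = opener_a.open(url)  # would go through proxy A
# response_b = opener_b.open(url)  # would go through proxy B
```

Because each request goes through opener_a.open or opener_b.open, code elsewhere that calls the global urlopen is unaffected.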
Timeout Settings
In older versions of Python, the urllib2 API did not expose a timeout setting, so the only way to set a timeout was to change the global socket timeout:
import urllib2
import socket
socket.setdefaulttimeout(10) # Timeout after 10 seconds
urllib2.socket.setdefaulttimeout(10) # Another way
As of Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen():
import urllib2
response = urllib2.urlopen('', timeout=10)
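Since the global socket timeout affects every socket in the process, one possible way to limit its reach is to set it temporarily and restore the previous value afterwards. The sketch below is an illustration under that assumption; the default_timeout name is hypothetical and not part of urllib2 or socket.

```python
import socket
from contextlib import contextmanager


@contextmanager
def default_timeout(seconds):
    """Temporarily set the global socket timeout, restoring the old value on exit."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Runs even if the body raises, so the old setting always comes back.
        socket.setdefaulttimeout(previous)
```

Any urlopen call made inside a "with default_timeout(10):" block would then use the 10 second limit, and sockets created after the block revert to the earlier setting.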
Adding a specific Header to an HTTP Request
To add a header, use a Request object:
import urllib2
request = urllib2.Request(uri)
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)
Pay particular attention to a few headers, because servers inspect them:
User-Agent : Some servers or proxies use this value to decide whether the request comes from a browser.
Content-Type : With a REST interface, the server uses this value to decide how to parse the content of the HTTP body. Common values are:
application/xml : Used in XML RPC calls, e.g. RESTful/SOAP calls.
application/json : Used in JSON RPC calls.
application/x-www-form-urlencoded : Used when a browser submits a web form.
When calling RESTful or SOAP services provided by a server, an incorrect Content-Type setting can cause the server to refuse the request.
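To make the Content-Type point concrete, here is a small sketch that builds a JSON request without sending it. The endpoint URL and payload are hypothetical, and the import fallback is an assumption that lets the code run on Python 3 as well.

```python
import json

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, same names used here

# Hypothetical endpoint and payload, for illustration only.
url = "http://api.example.com/items"
payload = {"name": "widget", "count": 3}

# Encode the body as bytes and declare its format so the server can parse it.
body = json.dumps(payload).encode("utf-8")
request = urllib2.Request(url, data=body)
request.add_header("Content-Type", "application/json")
request.add_header("User-Agent", "fake-client")

# response = urllib2.urlopen(request)  # sending is left out of this sketch
```

Because the request carries data, urllib2 would send it as a POST. Note that add_header stores the header name in capitalized form, so it is looked up as 'Content-type'.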
Redirect
By default, urllib2 automatically follows redirects for HTTP 3XX return codes, with no manual configuration. To detect whether a redirect occurred, check whether the URL of the Response differs from the URL of the Request:
import urllib2
response = urllib2.urlopen('')
redirected = response.geturl() != ''
If you don't want automatic redirects, then besides dropping down to the lower-level httplib library, you can subclass HTTPRedirectHandler:
import urllib2
class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Doing nothing here means the redirect is not followed.
    def http_error_301(self, req, fp, code, msg, headers):
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        pass
opener = urllib2.build_opener(RedirectHandler)
opener.open('')
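This works because build_opener leaves out a default handler when it is given a subclass of it. The sketch below checks that behavior without making a request. The NoRedirectHandler name is hypothetical, and the import fallback is an assumption for running on Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, same names used here


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical handler that declines 301 and 302 so they are not followed."""

    def http_error_301(self, req, fp, code, msg, headers):
        # Returning None leaves the response unhandled, so the opener
        # falls back to its default error handling and raises HTTPError.
        return None

    http_error_302 = http_error_301


opener = urllib2.build_opener(NoRedirectHandler)

# Collect every redirect handler the opener ended up with.
redirect_handlers = [h for h in opener.handlers
                     if isinstance(h, urllib2.HTTPRedirectHandler)]
```

With this opener, a 301 or 302 response should surface as an HTTPError whose code attribute holds the status, rather than being followed silently.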
Cookie
urllib2 can also handle cookies automatically once an HTTPCookieProcessor is added to the opener. If you need to get the value of a particular cookie item, you can do the following:
import urllib2
import cookielib
cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
response = opener.open('')
for item in cookie:
    if item.name == 'some_cookie_item_name':
        print item.value
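The lookup loop can be wrapped in a small function if it is needed in several places. This is only a sketch: get_cookie_value is a hypothetical helper, not part of cookielib, and the import fallback is an assumption for Python 3, where the module is named http.cookiejar.

```python
try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3 name for the same module


def get_cookie_value(jar, name, default=None):
    """Return the value of the first cookie in `jar` called `name`, else `default`."""
    for item in jar:
        if item.name == name:
            return item.value
    return default
```

After a request made through an opener that uses the jar, calling get_cookie_value(cookie, 'some_cookie_item_name') would return the stored value, or None if the server never set that cookie.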
Using HTTP's PUT and DELETE methods
Out of the box, urllib2 only issues HTTP GET and POST requests, so PUT and DELETE would normally require the lower-level httplib library. However, urllib2 can still be made to send a PUT or DELETE request as follows:
import urllib2
request = urllib2.Request(uri, data=data)
request.get_method = lambda: 'PUT' # or 'DELETE'
response = urllib2.urlopen(request)
This is a hack, but it works fine in practice.
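The same trick can be packaged as a helper so the method name is passed in rather than hard-coded. The sketch below only builds the requests and does not send them. The make_request name and the URL are hypothetical, and the import fallback is an assumption for Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, same names used here


def make_request(url, method, data=None):
    """Build a Request whose HTTP verb is forced to `method` (e.g. 'PUT', 'DELETE')."""
    request = urllib2.Request(url, data=data)
    # Override get_method on this instance only; the default picks GET or POST
    # depending on whether data is present.
    request.get_method = lambda: method
    return request


# Hypothetical URL, for illustration only.
put_request = make_request("http://api.example.com/items/1", "PUT", data=b"{}")
delete_request = make_request("http://api.example.com/items/1", "DELETE")
```

Either request could then be passed to urllib2.urlopen or to an opener's open method in the usual way.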
Get the HTTP return code
For a successful response such as 200 OK, you can get the HTTP return code by calling the getcode() method of the response object returned by urlopen. For error return codes, however, urlopen raises an exception. In that case, check the code attribute of the exception object:
import urllib2
try:
    response = urllib2.urlopen('http://')
except urllib2.HTTPError, e:
    print e.code
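Both paths can be folded into one function that always returns the status code. This is a sketch under stated assumptions: status_code and its open_func parameter are hypothetical names, added so that a different opener's open method (or a stub) can be substituted, and the import fallback is there for Python 3.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 successor, same names used here


def status_code(url, open_func=urllib2.urlopen):
    """Return the HTTP status for `url`, whether or not the request succeeded.

    `open_func` is a hypothetical hook so that another opener's `open`
    method (or a stub) can be passed in.
    """
    try:
        response = open_func(url)
    except urllib2.HTTPError as e:
        return e.code  # error statuses arrive as exceptions
    return response.getcode()  # success statuses come from the response
```

Note that failures below the HTTP level, such as a refused connection, raise URLError without a status code, and this sketch lets those propagate.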
Debug Log
When using urllib2, you can turn on debug logging as follows, so that the contents of the packets sent and received are printed to the screen. This is convenient for debugging and can sometimes save you the work of capturing packets:
import urllib2
httpHandler = urllib2.HTTPHandler(debuglevel=1)
httpsHandler = urllib2.HTTPSHandler(debuglevel=1)
opener = urllib2.build_opener(httpHandler, httpsHandler)
urllib2.install_opener(opener)
response = urllib2.urlopen('')