Reliable ways to fetch web content in Python

Method 1: Use the httpx library

httpx is an alternative to requests that supports asynchronous requests and offers more flexible SSL verification options. httpx verifies SSL certificates by default, but this can be configured when needed.

Install httpx:

pip install httpx

Example of usage:

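import httpx

# The scheme and host are omitted here; use the full URL of the page
url = '/hnsnyt/xxgk/gfxwj/index_1.html'

# Create a client object
with httpx.Client() as client:
    response = client.get(url)
    print(response.text)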

If you encounter an SSL error, you can set the verify parameter to False to disable SSL verification, although this is not recommended for production environments:

response = httpx.get(url, verify=False)

Method 2: Use the urllib3 library

urllib3 is an HTTP client for Python that works at a lower level than requests (which is built on top of it), so it suits scenarios where the SSL configuration needs to be carefully controlled. You can use urllib3 to download the web page and manage the SSL settings yourself.

Install urllib3:

pip install urllib3

Example of usage:

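import urllib3

# Create a PoolManager for finer-grained control
http = urllib3.PoolManager()

url = '/hnsnyt/xxgk/gfxwj/index_1.html'

# Send the request with retries and a timeout. Certificate verification stays
# on by default; to turn it off (insecure), create the pool with
# urllib3.PoolManager(cert_reqs='CERT_NONE') instead.
response = http.request('GET', url, retries=3, timeout=5.0)
print(response.data.decode('utf-8'))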

For more granular SSL configuration, you can pass an ssl.SSLContext to the PoolManager and control certificate verification directly.
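As a rough sketch of the SSLContext approach, the helper below builds a verifying context that also refuses protocol versions older than TLS 1.2, then hands it to a PoolManager. The function names build_ssl_context and fetch, the ca_file parameter, and the TLS 1.2 floor are illustrative assumptions, not something the article prescribes.

```python
import ssl


def build_ssl_context(ca_file=None):
    """Return a verifying SSLContext that rejects anything older than TLS 1.2.

    ca_file is a hypothetical parameter: a path to a CA bundle, or None to
    use the system trust store.
    """
    # create_default_context() enables certificate and host name checks
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def fetch(url, context):
    """Download url with urllib3, using the given SSLContext."""
    import urllib3  # third-party; imported here so the helper above stays stdlib-only

    http = urllib3.PoolManager(ssl_context=context)
    response = http.request('GET', url, retries=3, timeout=5.0)
    return response.data.decode('utf-8')
```

Calling fetch(url, build_ssl_context()) with a full URL should behave like the example above, while leaving the context available for further tuning (for example, loading a private CA bundle through ca_file).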

Method 3: Use aiohttp (asynchronous request)

If you need to make many HTTP requests concurrently, aiohttp is a powerful asynchronous HTTP client library. It is built on coroutines, so requests can overlap instead of running one after another, and it also offers flexible SSL handling.

Install aiohttp:

pip install aiohttp

Example of usage:

import aiohttp
import asyncio

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

url = '/hnsnyt/xxgk/gfxwj/index_1.html'

# Run asynchronously
html = asyncio.run(fetch(url))
print(html)

If you encounter SSL problems, you can pass ssl=False to disable SSL verification (the older verify_ssl parameter is deprecated):

async with session.get(url, ssl=False) as response:
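Since the main reason to choose aiohttp is fetching several pages at once, here is a minimal sketch of that pattern. It shares one session, caps the number of requests in flight with a semaphore, and collects results in input order. The names fetch_text, fetch_all, main and the limit of 5 are assumptions made for illustration.

```python
import asyncio


async def fetch_text(session, url):
    """Fetch one page using an already-open aiohttp.ClientSession."""
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.text()


async def fetch_all(session, urls, limit=5):
    """Fetch several pages concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch_text(session, url)

    # return_exceptions=True puts failures into the result list
    # instead of raising the first one
    return await asyncio.gather(*(bounded(u) for u in urls),
                                return_exceptions=True)


async def main(urls):
    import aiohttp  # third-party; imported here so the helpers above stay stdlib-only

    async with aiohttp.ClientSession() as session:
        return await fetch_all(session, urls)
```

With a list of full URLs, asyncio.run(main(urls)) returns one entry per URL: either the page text or the exception raised for that page.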

Method 4: Use the pycurl library (lower-level, with more configuration options)

pycurl is a Python wrapper around libcurl. It exposes more configuration options and is especially suitable for fine control over SSL certificates and protocols.

Install pycurl:

pip install pycurl

Example of usage:

import pycurl
from io import BytesIO

url = '/hnsnyt/xxgk/gfxwj/index_1.html'

# Create a buffer to receive the response
buffer = BytesIO()

# Create a cURL object
c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.WRITEDATA, buffer)

# Disable SSL verification (insecure, for troubleshooting only)
c.setopt(c.SSL_VERIFYPEER, 0)  # Do not verify the server's certificate
c.setopt(c.SSL_VERIFYHOST, 0)  # Do not verify the host name

# Execute the request and release the handle
c.perform()
c.close()

# Get the result
response = buffer.getvalue().decode('utf-8')
print(response)
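The example above turns verification off. A sketch of the opposite configuration, keeping both checks on and pointing libcurl at an explicit CA bundle, might look like the following. The function names enable_verification and fetch_verified and the ca_bundle_path parameter are hypothetical; the options themselves (SSL_VERIFYPEER, SSL_VERIFYHOST, CAINFO) are standard libcurl settings.

```python
def enable_verification(curl, ca_bundle_path):
    """Turn certificate and host name checks on for a pycurl.Curl handle."""
    curl.setopt(curl.SSL_VERIFYPEER, 1)       # verify the server's certificate chain
    curl.setopt(curl.SSL_VERIFYHOST, 2)       # 2 = certificate must match the host name
    curl.setopt(curl.CAINFO, ca_bundle_path)  # CA bundle used for verification


def fetch_verified(url, ca_bundle_path):
    """Download url with pycurl while verifying against ca_bundle_path."""
    import pycurl  # third-party; imported here so the helper above has no hard dependency
    from io import BytesIO

    buffer = BytesIO()
    curl = pycurl.Curl()
    try:
        curl.setopt(curl.URL, url)
        curl.setopt(curl.WRITEDATA, buffer)
        enable_verification(curl, ca_bundle_path)
        curl.perform()
    finally:
        curl.close()
    return buffer.getvalue().decode('utf-8')
```

One option for ca_bundle_path is the path returned by certifi.where(), assuming certifi is installed.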

Method 5: Use certifi to customize the certificate path

If the SSL error is caused by a missing root certificate, you can use certifi to make sure an up-to-date certificate chain is used. certifi provides Mozilla's collection of root certificates, which helps you avoid this kind of SSL error.

Install certifi:

pip install certifi

You can then explicitly specify the certificate path in the request, ensuring that the latest root certificate is used.

import requests
import certifi

url = '/hnsnyt/xxgk/gfxwj/index_1.html'
response = requests.get(url, verify=certifi.where())  # Use certifi's CA bundle path
print(response.text)

Summary:

httpx: Recommended as a replacement for requests; it offers stronger SSL handling and more flexibility.
urllib3: Provides lower-level control, suitable for fine-grained SSL configuration.
aiohttp: Asynchronous request, suitable for concurrent download operations.
pycurl: If you need full control over HTTP requests and SSL configuration, pycurl is a very powerful choice.
certifi: Ensure that SSL certificate verification uses the latest certificate set.

These approaches can resolve SSL issues without sacrificing security, as long as certificate verification stays enabled. If your main problem is certificate errors, make sure you use an up-to-date certificate chain and avoid disabling SSL verification in production.
