Introduction
In modern web development and data crawling, batch access to URLs and parsing of the response content is a common requirement. This article explains in detail how to implement the following functions in Python:
- Batch URL access: automatically access multiple URLs from a script.
- XML response parsing: extract the required data from each response.
- Response saving: save the response content to a file for later analysis.
We will start with a basic tool method, extend it to the batch-processing scenario, and end up with a complete tool script.
1. Background and requirements
Suppose we have a text file containing multiple URLs (referred to as urls.txt below). The response returned by each URL is XML-formatted data, as shown below:
```xml
<HashMap>
    <code>000000</code>
    <data>Mr. Ye|18004565345</data>
    <message>success</message>
</HashMap>
```
Our goals are:
- Read each URL in the file.
- Open each URL in the default browser.
- Parse the XML response and extract the `code`, `data`, and `message` fields.
- Save the parsed content to a file.
2. Tool method implementation
2.1 Single URL access and parsing
First, we implement a tool method, `fetch_and_parse_xml`, which accesses a single URL and parses its XML response.
Code implementation
```python
import requests
import xml.etree.ElementTree as ET
import webbrowser

def fetch_and_parse_xml(url, headers=None, output_file="response.xml"):
    """
    Tool method: access a URL, open it in the default browser,
    parse the XML response, and save it to a file.
    :param url: URL to access
    :param headers: request headers (optional)
    :param output_file: file path the raw XML response is saved to
    :return: parsed XML content (as a dictionary)
    """
    # Default request headers
    default_headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
        'Host': ''  # the host value was elided; set it (or drop the header) as needed
    }
    # Merge in custom request headers if any were passed
    if headers:
        default_headers.update(headers)

    try:
        # Send an HTTP GET request
        resp = requests.get(url, headers=default_headers)
        resp.raise_for_status()  # Check whether the request succeeded

        # Open the URL in the default browser
        webbrowser.open(url)

        # Parse the XML response
        root = ET.fromstring(resp.text)
        parsed_data = {
            "code": root.find("code").text,
            "data": root.find("data").text,
            "message": root.find("message").text
        }

        # Save the raw XML response to a file
        with open(output_file, "w", encoding="utf-8") as file:
            file.write(resp.text)
        print(f"Response saved to file: {output_file}")

        # Return the parsed content
        return parsed_data
    except requests.exceptions.RequestException as e:
        print(f"An error occurred while requesting the URL: {e}")
        return None
    except ET.ParseError as e:
        print(f"An error occurred while parsing the XML response: {e}")
        return None
```
Code description
- Request the URL:
  - Use `requests.get` to send an HTTP GET request.
  - Custom request headers are supported.
- Call the default browser:
  - Use `webbrowser.open` to open the URL in the default browser.
- Parse the XML response:
  - Use `ET.fromstring` to parse the XML response.
  - Extract the `code`, `data`, and `message` fields (see the short sketch after this list).
- Save the response content:
  - Save the raw XML response to the file.
- Exception handling:
  - Catch exceptions during the request and XML parsing and print error messages.
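To make the parsing step concrete, here is a minimal, self-contained sketch that applies `ET.fromstring` to the sample response from section 1; the XML string is hard-coded here for illustration instead of fetched over HTTP:

```python
import xml.etree.ElementTree as ET

# The sample XML response from section 1, hard-coded for illustration
xml_text = """<HashMap>
    <code>000000</code>
    <data>Mr. Ye|18004565345</data>
    <message>success</message>
</HashMap>"""

root = ET.fromstring(xml_text)     # root is the <HashMap> element
print(root.find("code").text)      # 000000
print(root.find("data").text)      # Mr. Ye|18004565345
print(root.find("message").text)   # success
```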
2.2 Sample call
Here is an example of calling the `fetch_and_parse_xml` method:
```python
if __name__ == "__main__":
    # The host was elided in the original; <host> is a placeholder
    url = "http://<host>:31432/interface/orderPhone?txm=320323134183104&type=1"
    response_data = fetch_and_parse_xml(url, output_file="response.xml")
    if response_data:
        print("Parsed XML content:")
        print(f"Code: {response_data['code']}")
        print(f"Data: {response_data['data']}")
        print(f"Message: {response_data['message']}")
```
Sample output
Assume that the XML response returned by the URL is as follows:
```xml
<HashMap>
    <code>000000</code>
    <data>Mr. Ye|180****5345</data>
    <message>success</message>
</HashMap>
```
Console output:
```
Response saved to file: response.xml
Parsed XML content:
Code: 000000
Data: Mr. Ye|180****5345
Message: success
```
File content (response.xml):
```xml
<HashMap>
    <code>000000</code>
    <data>Mr. Ye|180****5345</data>
    <message>success</message>
</HashMap>
```
3. Batch processing URLs
Next, we extend the tool method to support batch processing of the URLs in a file (urls.txt).
3.1 Batch processing script
Here is the complete script for batch processing URLs:
```python
import requests
import xml.etree.ElementTree as ET
import webbrowser

def fetch_and_parse_xml(url, headers=None, output_file="response.xml"):
    """
    Tool method: access a URL, open it in the default browser,
    parse the XML response, and save it to a file.
    :param url: URL to access
    :param headers: request headers (optional)
    :param output_file: file path the raw XML response is saved to
    :return: parsed XML content (as a dictionary)
    """
    # Default request headers
    default_headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
        'Host': ''  # the host value was elided; set it (or drop the header) as needed
    }
    # Merge in custom request headers if any were passed
    if headers:
        default_headers.update(headers)

    try:
        # Send an HTTP GET request
        resp = requests.get(url, headers=default_headers)
        resp.raise_for_status()  # Check whether the request succeeded

        # Open the URL in the default browser
        webbrowser.open(url)

        # Parse the XML response
        root = ET.fromstring(resp.text)
        parsed_data = {
            "code": root.find("code").text,
            "data": root.find("data").text,
            "message": root.find("message").text
        }

        # Save the raw XML response to a file
        with open(output_file, "w", encoding="utf-8") as file:
            file.write(resp.text)
        print(f"Response saved to file: {output_file}")

        return parsed_data
    except requests.exceptions.RequestException as e:
        print(f"An error occurred while requesting the URL: {e}")
        return None
    except ET.ParseError as e:
        print(f"An error occurred while parsing the XML response: {e}")
        return None

def batch_process_urls(url_file, headers=None):
    """
    Process each URL in the URL file.
    :param url_file: path of the file containing the URLs
    :param headers: request headers (optional)
    """
    try:
        with open(url_file, "r", encoding="utf-8") as file:
            urls = file.readlines()
    except FileNotFoundError:
        print(f"File {url_file} does not exist!")
        return

    for i, url in enumerate(urls):
        url = url.strip()  # Remove newlines and surrounding spaces
        if not url:
            continue
        print(f"Processing URL {i + 1}: {url}")
        output_file = f"response_{i + 1}.xml"
        response_data = fetch_and_parse_xml(url, headers=headers, output_file=output_file)
        if response_data:
            print("Parsed XML content:")
            print(f"Code: {response_data['code']}")
            print(f"Data: {response_data['data']}")
            print(f"Message: {response_data['message']}")
        print("-" * 40)

# Sample call
if __name__ == "__main__":
    url_file = "urls.txt"
    batch_process_urls(url_file)
```
Sample output
Assume the urls.txt file contains the following (with <host> standing in for the elided host):
```
http://<host>:31432/interface/orderPhone?txm=320323134183104&type=1
http://<host>:31432/interface/orderPhone?txm=320323115958004&type=1
```
Console output:
```
Processing URL 1: http://<host>:31432/interface/orderPhone?txm=320323134183104&type=1
Response saved to file: response_1.xml
Parsed XML content:
Code: 000000
Data: Mr. Ye|180****5345
Message: success
----------------------------------------
Processing URL 2: http://<host>:31432/interface/orderPhone?txm=320323115958004&type=1
Response saved to file: response_2.xml
Parsed XML content:
Code: 000000
Data: Mr. Li|138****1234
Message: success
----------------------------------------
```
4. Summary
This article has detailed how to use Python to batch-access URLs and parse XML responses. With the tool method `fetch_and_parse_xml`, we can easily access a single URL and parse its response content. By extending the script, we also implemented batch processing of a URL file.
Key points
- Request the URL: use the `requests` library to send HTTP GET requests.
- Call the default browser: use `webbrowser.open` to open the URL in the default browser.
- Parse XML responses: use `xml.etree.ElementTree` to parse XML responses.
- Save the response content: save the response content to a file.
- Batch processing: process multiple URLs by reading them from a URL file.
Extended features
- Dynamically modify the request headers: support passing in custom request headers.
- Save the parsed content: save the parsed content as a JSON file (a sketch follows this list).
- Asynchronous requests: use the `aiohttp` library to send concurrent requests (a sketch follows this list).
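Neither extension is implemented in the article's script; the sketches below are minimal illustrations. First, saving the parsed dictionary as JSON with the standard `json` module, assuming the `fetch_and_parse_xml` method from section 2.1 is in scope and using parsed.json as an arbitrary output filename:

```python
import json

# Hypothetical usage: persist the dictionary returned by fetch_and_parse_xml.
# Both the <host> placeholder and the parsed.json filename are assumptions.
parsed = fetch_and_parse_xml("http://<host>:31432/interface/orderPhone?txm=320323134183104&type=1")
if parsed:
    with open("parsed.json", "w", encoding="utf-8") as f:
        json.dump(parsed, f, ensure_ascii=False, indent=2)  # keep non-ASCII text readable
```

Second, one possible shape of the asynchronous extension using `aiohttp` (a third-party library, installed with pip install aiohttp). It swaps `requests` for an async client and drops the browser call, so it is a sketch of the idea rather than a drop-in replacement:

```python
import asyncio
import xml.etree.ElementTree as ET

import aiohttp  # third-party: pip install aiohttp

async def fetch_xml(session, url):
    """Fetch one URL and return the extracted fields, or None on failure."""
    try:
        async with session.get(url) as resp:
            resp.raise_for_status()
            text = await resp.text()
        root = ET.fromstring(text)
        return {tag: root.find(tag).text for tag in ("code", "data", "message")}
    except (aiohttp.ClientError, ET.ParseError) as e:
        print(f"Error while fetching {url}: {e}")
        return None

async def fetch_all(urls):
    # One shared session; asyncio.gather runs all requests concurrently
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_xml(session, url) for url in urls))

if __name__ == "__main__":
    # Placeholder URLs; <host> stands in for the elided host
    urls = [
        "http://<host>:31432/interface/orderPhone?txm=320323134183104&type=1",
        "http://<host>:31432/interface/orderPhone?txm=320323115958004&type=1",
    ]
    for result in asyncio.run(fetch_all(urls)):
        print(result)
```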
The above is the detailed content of how to batch-access URLs and parse XML responses with Python. For more on accessing URLs and parsing XML responses in Python, please see my other related articles!