Introduction
In modern web development and data crawling, batch access to URLs and parsing of the response content is a common requirement. This article explains in detail how to implement the following functions in Python:
- Batch URL access: automatically access multiple URLs from a script.
- XML response parsing: extract the required data from each response.
- Response saving: save the response content to a file for later analysis.
We will start with a basic tool method, extend it to the batch-processing scenario, and end up with a complete tool script.
1. Background and requirements
Suppose we have a text file containing multiple URLs (referred to as urls.txt below). The response returned by each URL is XML-formatted data, as shown below:
```xml
<HashMap>
    <code>000000</code>
    <data>Mr. Ye|18004565345</data>
    <message>success</message>
</HashMap>
```
Our goals are:
- Read each URL in the file.
- Open each URL in the default browser.
- Parse the XML response and extract the `code`, `data`, and `message` fields.
- Save the parsed content to a file.
2. Tool method implementation
2.1 Single URL access and parsing
First, we implement a tool method, `fetch_and_parse_xml`, which accesses a single URL and parses its XML response.
Code implementation
```python
import requests
import xml.etree.ElementTree as ET
import webbrowser

def fetch_and_parse_xml(url, headers=None, output_file="response.xml"):
    """
    Tool method: access a URL, open it in the default browser,
    parse the XML response, and save it to a file.
    :param url: URL to access
    :param headers: request headers (optional)
    :param output_file: file path the raw XML response is saved to
    :return: parsed XML content (as a dictionary)
    """
    # Default request headers
    default_headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
        'Host': ''  # the host value was elided; set it (or drop the header) as needed
    }
    # Merge in custom request headers if any were passed
    if headers:
        default_headers.update(headers)

    try:
        # Send an HTTP GET request
        resp = requests.get(url, headers=default_headers)
        resp.raise_for_status()  # Check whether the request succeeded

        # Open the URL in the default browser
        webbrowser.open(url)

        # Parse the XML response
        root = ET.fromstring(resp.text)
        parsed_data = {
            "code": root.find("code").text,
            "data": root.find("data").text,
            "message": root.find("message").text
        }

        # Save the raw XML response to a file
        with open(output_file, "w", encoding="utf-8") as file:
            file.write(resp.text)
        print(f"Response saved to file: {output_file}")

        # Return the parsed content
        return parsed_data
    except requests.exceptions.RequestException as e:
        print(f"An error occurred while requesting the URL: {e}")
        return None
    except ET.ParseError as e:
        print(f"An error occurred while parsing the XML response: {e}")
        return None
```
Code description
- Request the URL:
  - Use `requests.get` to send an HTTP GET request.
  - Custom request headers are supported.
- Call the default browser:
  - Use `webbrowser.open` to open the URL in the default browser.
- Parse the XML response:
  - Use `ET.fromstring` to parse the XML response.
  - Extract the `code`, `data`, and `message` fields (see the short sketch after this list).
- Save the response content:
  - Save the raw XML response to the file.
- Exception handling:
  - Catch exceptions during the request and XML parsing and print error messages.
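To make the parsing step concrete, here is a minimal, self-contained sketch that applies `ET.fromstring` to the sample response from section 1; the XML string is hard-coded here for illustration instead of fetched over HTTP:

```python
import xml.etree.ElementTree as ET

# The sample XML response from section 1, hard-coded for illustration
xml_text = """<HashMap>
    <code>000000</code>
    <data>Mr. Ye|18004565345</data>
    <message>success</message>
</HashMap>"""

root = ET.fromstring(xml_text)     # root is the <HashMap> element
print(root.find("code").text)      # 000000
print(root.find("data").text)      # Mr. Ye|18004565345
print(root.find("message").text)   # success
```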
2.2 Sample call
Here is an example of calling the `fetch_and_parse_xml` method:
```python
if __name__ == "__main__":
    # The host was elided in the original; <host> is a placeholder
    url = "http://<host>:31432/interface/orderPhone?txm=320323134183104&type=1"
    response_data = fetch_and_parse_xml(url, output_file="response.xml")
    if response_data:
        print("Parsed XML content:")
        print(f"Code: {response_data['code']}")
        print(f"Data: {response_data['data']}")
        print(f"Message: {response_data['message']}")
```
Sample output
Assume that the XML response returned by the URL is as follows:
```xml
<HashMap>
    <code>000000</code>
    <data>Mr. Ye|180****5345</data>
    <message>success</message>
</HashMap>
```
Console output:
```
Response saved to file: response.xml
Parsed XML content:
Code: 000000
Data: Mr. Ye|180****5345
Message: success
```
File content (response.xml):
```xml
<HashMap>
    <code>000000</code>
    <data>Mr. Ye|180****5345</data>
    <message>success</message>
</HashMap>
```
3. Batch processing URLs
Next, we extend the tool method to support batch processing of the URLs in a file (urls.txt).
3.1 Batch processing script
Here is the complete script for batch processing URLs:
```python
import requests
import xml.etree.ElementTree as ET
import webbrowser

def fetch_and_parse_xml(url, headers=None, output_file="response.xml"):
    """
    Tool method: access a URL, open it in the default browser,
    parse the XML response, and save it to a file.
    :param url: URL to access
    :param headers: request headers (optional)
    :param output_file: file path the raw XML response is saved to
    :return: parsed XML content (as a dictionary)
    """
    # Default request headers
    default_headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
        'Host': ''  # the host value was elided; set it (or drop the header) as needed
    }
    # Merge in custom request headers if any were passed
    if headers:
        default_headers.update(headers)

    try:
        # Send an HTTP GET request
        resp = requests.get(url, headers=default_headers)
        resp.raise_for_status()  # Check whether the request succeeded

        # Open the URL in the default browser
        webbrowser.open(url)

        # Parse the XML response
        root = ET.fromstring(resp.text)
        parsed_data = {
            "code": root.find("code").text,
            "data": root.find("data").text,
            "message": root.find("message").text
        }

        # Save the raw XML response to a file
        with open(output_file, "w", encoding="utf-8") as file:
            file.write(resp.text)
        print(f"Response saved to file: {output_file}")

        return parsed_data
    except requests.exceptions.RequestException as e:
        print(f"An error occurred while requesting the URL: {e}")
        return None
    except ET.ParseError as e:
        print(f"An error occurred while parsing the XML response: {e}")
        return None

def batch_process_urls(url_file, headers=None):
    """
    Process each URL in the URL file.
    :param url_file: path of the file containing the URLs
    :param headers: request headers (optional)
    """
    try:
        with open(url_file, "r", encoding="utf-8") as file:
            urls = file.readlines()
    except FileNotFoundError:
        print(f"File {url_file} does not exist!")
        return

    for i, url in enumerate(urls):
        url = url.strip()  # Remove newlines and surrounding spaces
        if not url:
            continue
        print(f"Processing URL {i + 1}: {url}")
        output_file = f"response_{i + 1}.xml"
        response_data = fetch_and_parse_xml(url, headers=headers, output_file=output_file)
        if response_data:
            print("Parsed XML content:")
            print(f"Code: {response_data['code']}")
            print(f"Data: {response_data['data']}")
            print(f"Message: {response_data['message']}")
        print("-" * 40)

# Sample call
if __name__ == "__main__":
    url_file = "urls.txt"
    batch_process_urls(url_file)
```
Sample output
Assume the urls.txt file contains the following (with <host> standing in for the elided host):
```
http://<host>:31432/interface/orderPhone?txm=320323134183104&type=1
http://<host>:31432/interface/orderPhone?txm=320323115958004&type=1
```
Console output:
```
Processing URL 1: http://<host>:31432/interface/orderPhone?txm=320323134183104&type=1
Response saved to file: response_1.xml
Parsed XML content:
Code: 000000
Data: Mr. Ye|180****5345
Message: success
----------------------------------------
Processing URL 2: http://<host>:31432/interface/orderPhone?txm=320323115958004&type=1
Response saved to file: response_2.xml
Parsed XML content:
Code: 000000
Data: Mr. Li|138****1234
Message: success
----------------------------------------
```
4. Summary
This article has detailed how to use Python to batch-access URLs and parse XML responses. With the tool method `fetch_and_parse_xml`, we can easily access a single URL and parse its response content. By extending the script, we also implemented batch processing of a URL file.
Key points
- Request the URL: use the `requests` library to send HTTP GET requests.
- Call the default browser: use `webbrowser.open` to open the URL in the default browser.
- Parse XML responses: use `xml.etree.ElementTree` to parse XML responses.
- Save the response content: save the response content to a file.
- Batch processing: process multiple URLs by reading them from a URL file.
Extended features
- Dynamically modify the request headers: support passing in custom request headers.
- Save the parsed content: save the parsed content as a JSON file (a sketch follows this list).
- Asynchronous requests: use the `aiohttp` library to send concurrent requests (a sketch follows this list).
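Neither extension is implemented in the article's script; the sketches below are minimal illustrations. First, saving the parsed dictionary as JSON with the standard `json` module, assuming the `fetch_and_parse_xml` method from section 2.1 is in scope and using parsed.json as an arbitrary output filename:

```python
import json

# Hypothetical usage: persist the dictionary returned by fetch_and_parse_xml.
# Both the <host> placeholder and the parsed.json filename are assumptions.
parsed = fetch_and_parse_xml("http://<host>:31432/interface/orderPhone?txm=320323134183104&type=1")
if parsed:
    with open("parsed.json", "w", encoding="utf-8") as f:
        json.dump(parsed, f, ensure_ascii=False, indent=2)  # keep non-ASCII text readable
```

Second, one possible shape of the asynchronous extension using `aiohttp` (a third-party library, installed with pip install aiohttp). It swaps `requests` for an async client and drops the browser call, so it is a sketch of the idea rather than a drop-in replacement:

```python
import asyncio
import xml.etree.ElementTree as ET

import aiohttp  # third-party: pip install aiohttp

async def fetch_xml(session, url):
    """Fetch one URL and return the extracted fields, or None on failure."""
    try:
        async with session.get(url) as resp:
            resp.raise_for_status()
            text = await resp.text()
        root = ET.fromstring(text)
        return {tag: root.find(tag).text for tag in ("code", "data", "message")}
    except (aiohttp.ClientError, ET.ParseError) as e:
        print(f"Error while fetching {url}: {e}")
        return None

async def fetch_all(urls):
    # One shared session; asyncio.gather runs all requests concurrently
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch_xml(session, url) for url in urls))

if __name__ == "__main__":
    # Placeholder URLs; <host> stands in for the elided host
    urls = [
        "http://<host>:31432/interface/orderPhone?txm=320323134183104&type=1",
        "http://<host>:31432/interface/orderPhone?txm=320323115958004&type=1",
    ]
    for result in asyncio.run(fetch_all(urls)):
        print(result)
```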
The above is the detailed content of how to batch-access URLs and parse XML responses with Python. For more on accessing URLs and parsing XML responses in Python, please see my other related articles!