
A one-article guide to building your own IP proxy pool with Python

Development environment

Python 3.8

PyCharm

Module Usage

requests >>> pip install requests

parsel >>> pip install parsel

How to install Python third-party modules:

Press Win + R, type cmd and click OK, then type the install command pip install <module name> (e.g. pip install requests) and press Enter.

Alternatively, in PyCharm, click Terminal and enter the same install command.

How to configure the Python interpreter in PyCharm:

Select File >>> Settings >>> Project >>> Python Interpreter

Click the gear icon and select Add

Add the Python installation path

How to install plugins in PyCharm:

Select File >>> Settings >>> Plugins

Click Marketplace and enter the name of the plugin you want to install, e.g. for a translation plugin enter "translation"; for a Chinese language pack enter "Chinese".

Select the appropriate plugin and click Install.

After the installation succeeds, an option to restart PyCharm appears; click OK and the change takes effect after the restart.

Proxy IP structure

proxies_dict = {
    "http": "http://" + ip + ":" + port,
    "https": "http://" + ip + ":" + port,
}
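A dict of this shape is what gets passed to requests through its proxies parameter. Below is a minimal sketch, assuming the sample ip:port that appears at the end of this article and using httpbin.org/ip as a test URL (my choice for illustration, not part of the original code):

import requests

# Sample proxy dict; the ip:port is the example value from the end of this article
proxies_dict = {
    "http": "http://110.189.152.86:40698",
    "https": "http://110.189.152.86:40698",
}

# httpbin.org/ip echoes back the IP address the server sees, so the output
# shows whether the request really went through the proxy
response = requests.get('http://httpbin.org/ip', proxies=proxies_dict, timeout=5)
print(response.text)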

Approach

I. Analysis of data sources

Figure out what data we want and where to get it from.

II. Code Implementation Steps

Send a request to the target URL.

Get the data: the server's response (the web page source code).

Parse the data and extract the content we want.

Save the data: crawled music or video would be saved locally; here the results go to a local csv file or database.

Detect the IPs: check whether each proxy IP works, and save the ones that are usable.
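The code later in this article stops after the detection step, so as a complement, here is a minimal sketch of the save step. It assumes the working proxies end up in a list of dicts called lis_1, as in the code below; the filename proxies.csv is made up for illustration:

import csv

# Hypothetical input: the list of working proxy dicts built by the detection loop
lis_1 = [{'http': 'http://110.189.152.86:40698', 'https': 'http://110.189.152.86:40698'}]

# Write one row per proxy to a local csv file
with open('proxies.csv', mode='w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['http', 'https'])  # header row
    for proxy in lis_1:
        writer.writerow([proxy['http'], proxy['https']])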

A note on import syntax:

  • import module  # import a whole module
  • from module import method  # from which module, import which method
  • from xxx import *  # import all methods
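A quick illustration of these import forms, using the built-in re module (the patterns and strings are just examples):

import re                # import the whole module; call its methods as re.findall(...)
from re import findall  # import one method from a module; call it directly as findall(...)

print(re.findall(r'\d+', 'ip: 110.189.152.86'))  # ['110', '189', '152', '86']
print(findall(r'\d+', 'port: 40698'))            # ['40698']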

Code

# Import data request module
import requests  # Data request module Third-party module pip install requests
# Import the regular expression module
import re  # Built-in modules
# Import data parsing module
import parsel  # Data parsing module Third-party module pip install parsel >>> This is a core component of the scrapy framework.


lis = []    # holds every proxy that is scraped
lis_1 = []  # holds only the proxies that pass the check

# 1. Send a request to the target URL /free/inha/{page}/ (pages 11 to 20 of the free proxy list)
for page in range(11, 21):
    url = f'/free/inha/{page}/'  # Determine the request url address (the site's domain is omitted in the source)
    """
    headers: request headers that disguise the Python code as a normal browser request.
    """
    # Use the get method of the requests module to send a request to the url address, and receive the returned data in the response variable.
    response = requests.get(url)
    # <Response [200]> the response object returned by the request; status code 200 means the request succeeded
    # 2. Get the data: the server's response (the web page source code), i.e. the response body text
    # print(response.text)
    # 3. Parse the data and extract what we want.
    """
    Parsing data methods.
        Regular: You can extract the content of the string data directly.
    Need to get down html string data for conversion
        xpath: according to the label node to extract data content
        css selector: according to the label attributes to extract data content

        Which aspect to use that, that is preferred to use that
    """
    # Regular expressions to extract data content
    """
    # re.findall() calls the findall method inside the re module
    # .*? matches any character (except the newline character \n) as few times as possible

    ip_list = re.findall('<td data-title="IP">(.*?)</td>', response.text)
    port_list = re.findall('<td data-title="PORT">(.*?)</td>', response.text)
    print(ip_list)
    print(port_list)
    """
    # css selector
    """
    # The css selector needs the downloaded html string data (response.text) converted into a selector object first
    # If you don't know css or xpath, copy the selector from the browser's developer tools:
    # #list > table > tbody > tr > td:nth-child(1)
    # //*[@id="list"]/table/tbody/tr/td[1]
    selector = parsel.Selector(response.text)  # Convert the html string data to a selector object
    ip_list = selector.css('#list tbody tr td:nth-child(1)::text').getall()
    port_list = selector.css('#list tbody tr td:nth-child(2)::text').getall()
    print(ip_list)
    print(port_list)
    """
    # xpath extract data
    selector = parsel.Selector(response.text)  # Convert the html string data to a selector object
    ip_list = selector.xpath('//*[@id="list"]/table/tbody/tr/td[1]/text()').getall()
    port_list = selector.xpath('//*[@id="list"]/table/tbody/tr/td[2]/text()').getall()
    # print(ip_list)
    # print(port_list)
    for ip, port in zip(ip_list, port_list):
        # print(ip, port)
        proxy = ip + ':' + port
        proxies_dict = {
            "http": "http://" + proxy,
            "https": "http://" + proxy,
        }
        # print(proxies_dict)
        lis.append(proxies_dict)
        # 4. Detecting IP quality
        try:
            response = requests.get(url=url, proxies=proxies_dict, timeout=1)
            if response.status_code == 200:
                print('Current proxy IP: ', proxies_dict,  'Can be used')
                lis_1.append(proxies_dict)
        except Exception:
            print('Current proxy IP: ', proxies_dict,  'Request timed out, test failed')



print('Number of proxy IPs acquired: ', len(lis))
print('Get the number of available IP proxies: ', len(lis_1))
print('Get available IP proxies: ', lis_1)

# Example of a single proxy dict in the final format:
dit = {
    'http': 'http://110.189.152.86:40698',
    'https': 'http://110.189.152.86:40698'
}
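Once the pool is built, one simple way to use it (random.choice and the test URL are my additions, not part of the original article) is to pick a random working proxy for each request:

import random
import requests

# Assumes lis_1 was filled by the detection loop above
if lis_1:
    proxy = random.choice(lis_1)  # pick a random working proxy from the pool
    response = requests.get('http://httpbin.org/ip', proxies=proxy, timeout=5)
    print(response.text)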

This concludes this article on how to build your own IP pool with Python. For more on building IP pools with Python, please search my earlier articles or continue browsing the related articles below, and I hope you will support me more in the future!