SoFunction
Updated on 2025-03-01

Solution to Python's inability to obtain web page source code using requests

Recently, while crawling a web page, I found that requests could not retrieve all of the page's content, so I used selenium to simulate a browser opening the page, fetched the rendered source code, and then parsed it with BeautifulSoup to extract the example sentences. To keep the loop going, refresh() is called in the loop body, so that each time the browser receives a new URL the page content is updated by refreshing. Note that, to capture the content more reliably, the script waits 2 seconds after refreshing, which reduces the chance of missing the page content. To reduce the possibility of being blocked, a Chrome User-Agent string is also added; please see the following code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time, re

# Location of the chromedriver executable (adjust to your own path)
path = Service("D:\\MyDrivers\\chromedriver.exe")
# Configure the browser not to display a window (headless mode)
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36')

# Create a Chrome instance.
driver = webdriver.Chrome(service=path, options=chrome_options)
lst = ["happy", "help", "evening", "great", "think", "adapt"]

for word in lst:
    url = "/#result?lang=en&query=" + word + "&f=concordance"
    driver.get(url)
    # Refresh the web page to get new data
    driver.refresh()
    time.sleep(2)
    # page_source -> get the page source code
    resp = driver.page_source
    # Parse the source code
    soup = BeautifulSoup(resp, "html.parser")
    table = soup.find_all("td")
    with open("examples.txt", 'a+', encoding='utf-8') as f:  # output file name assumed
        f.write(f"\n{word} Examples\n")
    for i in table[0:6]:
        text = i.text
        # Collapse extra whitespace
        new = re.sub(r"\s+", " ", text)
        # Write to the txt file
        with open("examples.txt", 'a+', encoding='utf-8') as f:
            f.write(re.sub(r"^(\d+\.)", r"\n\1", new))
driver.quit()

1. To speed up access, we configure the browser not to display a window (headless mode).

2. The output format is cleaned up with regular expressions (the re module).

3. We slice table[0:6] to keep only the first three example sentences, and the final result is as follows.
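The two re.sub calls in the loop do all of the cleanup described in point 2: the first collapses runs of whitespace, the second puts each numbered example on its own line. A minimal standalone sketch (the input string is a made-up sample, not actual page content):

```python
import re

# Hypothetical cell text, with the stray newlines and doubled spaces
# that .text on a <td> element tends to produce
raw = "1.   This happy mood\nlasted  roughly until last autumn."

# Collapse all runs of whitespace (spaces, newlines, tabs) into one space
flat = re.sub(r"\s+", " ", raw)
print(flat)  # 1. This happy mood lasted roughly until last autumn.

# Start each numbered example on a fresh line
print(re.sub(r"^(\d+\.)", r"\n\1", flat))
```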

happy Examples
1. This happy mood lasted roughly until last autumn.
2. The lodging was neither convenient nor happy .
3. One big happy family "fighting communism".
help Examples
1. Applying hot moist towels may help relieve discomfort.
2. The intense light helps reproduce colors more effectively.
3. My survival route are self help books.
evening Examples
1. The evening feast costs another $10.
2. My evening hunt was pretty flat overall.
3. The area nightclubs were active during evenings .
great Examples
1. The three countries represented here are three great democracies.
2. Our three different tour guides were great .
3. Your receptionist "crew" is great !
think Examples
1. I said yes immediately without thinking everything through.
2. This book was shocking yet thought provoking.
3. He thought "disgusting" was more appropriate.
adapt Examples
1. The novel has been adapted several times.
2. There are many ways plants can adapt .
3. They must adapt quickly to changing deadlines.

Supplement: after optimization, the code crawls the example sentences faster. The optimized code is as follows:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time, re
import os

# Configure the location of the simulated browser (chromedriver)
path = Service("D:\\MyDrivers\\chromedriver.exe")
# Configure the browser not to display a window (headless mode)
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36')

def get_wordlist():
    wordlist = []
    with open("wordlist.txt", 'r', encoding='utf-8') as f:  # input file name assumed
        lines = f.readlines()
        for line in lines:
            word = line.strip()
            wordlist.append(word)
    return wordlist

def main(lst):
    # Create a Chrome instance.
    driver = webdriver.Chrome(service=path, options=chrome_options)
    for word in lst:
        url = "/#result?lang=en&query=" + word + "&f=concordance"
        driver.get(url)
        driver.refresh()
        time.sleep(2)
        # page_source -> get the page source code
        resp = driver.page_source
        # Parse the source code
        soup = BeautifulSoup(resp, "html.parser")
        table = soup.find_all("td")
        with open("examples.txt", 'a+', encoding='utf-8') as f:  # output file name assumed
            f.write(f"\n{word} Examples\n")
        for i in table[0:6]:
            text = i.text
            new = re.sub(r"\s+", " ", text)
            with open("examples.txt", 'a+', encoding='utf-8') as f:
                f.write(new)
#                f.write(re.sub(r"(\.\s)(\d+\.)", r"\1\n\2", new))
    driver.quit()

if __name__ == "__main__":
    lst = get_wordlist()
    main(lst)
    os.startfile("examples.txt")  # open the result file (Windows only)

Summary

This concludes the article about Python's inability to obtain web page source code using requests. For more on obtaining web page source code with requests, please search my previous articles or continue browsing the related articles below. I hope everyone will continue to support me!