Steps to obtain web page information using Selenium in Python

1. Why use Selenium to obtain page information

In automated web testing and web scraping, obtaining page information is a basic but important operation. With Selenium, you can easily retrieve many kinds of page information, such as the title, URL, source code, and the text and attributes of elements. This information can be used not only to verify test results but also for data analysis and processing.

2. Basic Selenium setup

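Before you begin, make sure the Selenium library is installed (pip install selenium), along with the WebDriver for your browser, such as ChromeDriver for Chrome or GeckoDriver for Firefox. Since version 4.6, Selenium also includes Selenium Manager, which can download a matching driver automatically if none is found. The basic setup is as follows: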

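from selenium import webdriver

# Create a WebDriver instance (Chrome here; webdriver.Firefox() uses GeckoDriver)
driver = webdriver.Chrome()

# Open the target page
driver.get("")  # put the URL of the page you want to open here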

3. Get the page title

The page title is often used to verify that the correct page has loaded.

title = driver.title
print(f"Page title: {title}")
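
A title check usually just reads driver.title and compares it with an expected value. The minimal sketch below wraps that comparison in a helper; the function name title_contains and the StubDriver class are hypothetical, and the stub only stands in for a real WebDriver so the code runs without a browser. When the title only appears after some loading, Selenium's expected_conditions.title_contains used together with WebDriverWait is the built-in way to wait for it.

```python
def title_contains(driver, expected_fragment):
    """Return True if the page title contains expected_fragment.

    `driver` is anything with a `.title` attribute, such as a Selenium WebDriver.
    """
    return expected_fragment in driver.title


class StubDriver:
    # Hypothetical stand-in for a WebDriver so the sketch runs without a browser.
    title = "Example Domain"


driver = StubDriver()
assert title_contains(driver, "Example"), f"unexpected title: {driver.title!r}"
print(title_contains(driver, "Login"))  # False
```

With a real browser, pass the actual driver object instead of the stub.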

4. Get the current URL

The URL of the current page can be used, for example, to verify that a redirect went to the right place.

current_url = driver.current_url
print(f"current URL: {current_url}")
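
To check a redirect, it is often more robust to compare the parsed parts of driver.current_url than the whole string, because query strings and fragments tend to change between runs. A minimal sketch using Python's urllib.parse; the helper name redirected_to and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def redirected_to(current_url, expected_host, expected_path="/"):
    """Check that current_url points at expected_host and expected_path.

    Query strings and fragments are ignored on purpose.
    """
    parts = urlparse(current_url)
    return parts.netloc == expected_host and parts.path == expected_path


# Hypothetical value of driver.current_url after a login redirect
current_url = "https://shop.example.com/account?session=abc#top"
print(redirected_to(current_url, "shop.example.com", "/account"))  # True
```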

5. Get the page source code

driver.page_source returns the complete HTML source of the current page, which can be used to analyze the page structure.

page_source = driver.page_source
print(f"Page source code: {page_source}")
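
Because page_source is a plain string, it can be handed to any HTML parser. As one lightweight option, the sketch below uses Python's standard-library html.parser to count how often each tag appears; the class TagCounter, the helper count_tags, and the sample HTML are hypothetical and stand in for a real driver.page_source value.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags in an HTML string such as driver.page_source."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        self.counts[tag] += 1


def count_tags(page_source):
    parser = TagCounter()
    parser.feed(page_source)
    parser.close()
    return parser.counts


# Hypothetical snippet standing in for driver.page_source
html = "<html><body><div><a href='/a'>A</a><a href='/b'>B</a></div></body></html>"
print(count_tags(html))
```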

6. Get the text of the element

Getting the text content of a specific element on a page is one of the most common operations.

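from selenium.webdriver.common.by import By

element = driver.find_element(By.ID, "element_id")  # find_element_by_id() was removed in Selenium 4.3
element_text = element.text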
print(f"Element text: {element_text}")
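
driver.find_element raises NoSuchElementException when nothing matches, while driver.find_elements returns an empty list. A minimal sketch that relies on that difference to read an element's text with a fallback; the helper text_or_default and the stub classes are hypothetical. With real Selenium you would pass By.ID, whose value is the string "id" used in the stub below.

```python
def text_or_default(driver, by, value, default=""):
    """Return the stripped text of the first matching element, or default.

    find_elements returns an empty list when nothing matches, so no
    try/except around NoSuchElementException is needed.
    """
    elements = driver.find_elements(by, value)
    return elements[0].text.strip() if elements else default


class StubElement:
    # Hypothetical stand-in for a WebElement.
    def __init__(self, text):
        self.text = text


class StubDriver:
    # Hypothetical stand-in for a WebDriver whose page has one element id.
    def find_elements(self, by, value):
        if by == "id" and value == "element_id":
            return [StubElement("  Hello, Selenium  ")]
        return []


driver = StubDriver()
print(text_or_default(driver, "id", "element_id"))
print(text_or_default(driver, "id", "missing", default="<none>"))
```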

7. Get the attributes of the element

Element attributes such as href or src are very useful for extracting links and image addresses.

from selenium.webdriver.common.by import By

element = driver.find_element(By.ID, "element_id")
attribute_value = element.get_attribute("attribute_name")  # e.g. "href" or "src"
print(f"Element attribute value: {attribute_value}")
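
To extract every link on a page, the same get_attribute call can be applied to all <a> elements. A minimal sketch, assuming only that get_attribute returns None for a missing attribute (as Selenium documents); the helper collect_links and the stub classes are hypothetical. In a real browser, get_attribute("href") generally returns the resolved absolute URL rather than the raw attribute text.

```python
def collect_links(driver):
    """Return the unique href values of all <a> elements, in page order."""
    seen = set()
    links = []
    # "tag name" is the value of By.TAG_NAME in Selenium
    for anchor in driver.find_elements("tag name", "a"):
        href = anchor.get_attribute("href")
        # skip anchors without an href and links already collected
        if href and href not in seen:
            seen.add(href)
            links.append(href)
    return links


class StubAnchor:
    # Hypothetical stand-in for a WebElement holding an <a> tag's attributes.
    def __init__(self, **attrs):
        self.attrs = attrs

    def get_attribute(self, name):
        return self.attrs.get(name)


class StubDriver:
    # Hypothetical stand-in for a WebDriver whose page has four <a> tags.
    def find_elements(self, by, value):
        if (by, value) != ("tag name", "a"):
            return []
        return [
            StubAnchor(href="https://example.com/a"),
            StubAnchor(name="top"),  # an anchor without href
            StubAnchor(href="https://example.com/b"),
            StubAnchor(href="https://example.com/a"),  # duplicate link
        ]


print(collect_links(StubDriver()))
```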

8. Get Cookies

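driver.get_cookies() returns all cookies visible in the current session as a list of dictionaries, which is useful for session management and verification. To read a single cookie, pass its name to driver.get_cookie(); it returns None when no cookie with that name exists.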

cookies = driver.get_cookies()
print(f"all Cookies: {cookies}")

# Get a specific cookie by name
cookie = driver.get_cookie("cookie_name")
print(f"specific Cookie: {cookie}")
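
A common use of the cookie list is reusing a logged-in browser session in another HTTP client. Each entry from get_cookies() is a dictionary that includes "name" and "value" keys, so it can be turned into a simple mapping or a Cookie header string. A minimal sketch; the helper names and the sample cookie data are hypothetical.

```python
def cookies_to_dict(cookies):
    """Map cookie names to values from the list returned by get_cookies()."""
    return {c["name"]: c["value"] for c in cookies}


def cookie_header(cookies):
    """Build a Cookie request-header value such as "a=1; b=2"."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# Hypothetical data in the shape driver.get_cookies() returns
cookies = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "theme", "value": "dark", "domain": "example.com", "path": "/"},
]
print(cookies_to_dict(cookies))
print(cookie_header(cookies))  # sessionid=abc123; theme=dark
```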

9. Screenshot

A screenshot of the current page is useful for test reports and for debugging.

driver.save_screenshot("")  # put the output file path here, ending in .png
print("Screenshot saved")
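
For reports, it helps to give each screenshot a unique, timestamped file name and to check whether saving worked: save_screenshot returns False when the file cannot be written. A minimal sketch; the helpers screenshot_path and take_screenshot and the directory and label names are hypothetical.

```python
import os
from datetime import datetime


def screenshot_path(directory, label, when=None):
    """Build a path like <directory>/<label>_20240102_030405.png."""
    when = when or datetime.now()
    return os.path.join(directory, f"{label}_{when:%Y%m%d_%H%M%S}.png")


def take_screenshot(driver, directory, label):
    """Save a screenshot of the current page; return its path, or None on failure."""
    os.makedirs(directory, exist_ok=True)
    path = screenshot_path(directory, label)
    # save_screenshot returns False when the file cannot be written
    return path if driver.save_screenshot(path) else None


# Fixed timestamp so the printed path is predictable
print(screenshot_path("screenshots", "login_page", datetime(2024, 1, 2, 3, 4, 5)))
```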

10. Sample code

Here is a comprehensive example showing how to obtain different types of page information:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("")  # put the URL of the page you want to open here

# Get the page title
title = driver.title
print(f"Page title: {title}")

# Get the current URL
current_url = driver.current_url
print(f"current URL: {current_url}")

# Get the page source code
page_source = driver.page_source
print(f"Page source code: {page_source}")

# Get the text of the element
element = driver.find_element(By.ID, "element_id")
element_text = element.text
print(f"Element text: {element_text}")

# Get the attributes of the element
attribute_value = element.get_attribute("attribute_name")
print(f"Element attribute value: {attribute_value}")

# Get all cookies
cookies = driver.get_cookies()
print(f"all Cookies: {cookies}")

# Get a specific cookie
cookie = driver.get_cookie("cookie_name")
print(f"specific Cookie: {cookie}")

# Take a screenshot of the page
driver.save_screenshot("")  # put the output file path here, ending in .png
print("Screenshot saved")

# Close the browser and end the WebDriver session
driver.quit()
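
In the example above, the browser is closed only by the final line, so an exception in any earlier step would leave it running. A minimal sketch of a safer pattern, assuming you pass in a driver factory such as webdriver.Chrome: the function gather_page_info and the StubDriver class are hypothetical, and the stub only mimics the few WebDriver members used so the code runs without a browser. Recent Selenium 4 releases also let you write with webdriver.Chrome() as driver:, which quits the driver when the block exits.

```python
def gather_page_info(make_driver, url):
    """Open url, collect basic page information, and always quit the driver.

    make_driver is any zero-argument callable returning a WebDriver, such as
    webdriver.Chrome. try/finally guarantees quit() runs even if a lookup raises.
    """
    driver = make_driver()
    try:
        driver.get(url)
        return {
            "title": driver.title,
            "url": driver.current_url,
            "cookies": driver.get_cookies(),
            "source_length": len(driver.page_source),
        }
    finally:
        driver.quit()


class StubDriver:
    # Hypothetical stand-in that mimics a few WebDriver members without a browser.
    quit_called = False

    def get(self, url):
        self.current_url = url
        self.title = "Stub page"
        self.page_source = "<html></html>"

    def get_cookies(self):
        return []

    def quit(self):
        type(self).quit_called = True


info = gather_page_info(StubDriver, "https://example.com/")
print(info)
print(StubDriver.quit_called)  # True
```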

11. Summary

Selenium makes obtaining web page information simple and efficient: the page title, URL, and source code, the text and attributes of elements, cookies, and screenshots are each available through a single call. I hope this post helps you understand Selenium better and apply it to extract page information efficiently in real projects.
