1. Why use Selenium to obtain page information
In automated web testing and web scraping, obtaining page information is a basic but important operation. With Selenium, you can easily get various kinds of information about a page, such as its title, URL, and source code, as well as the text and attributes of its elements. This information can be used not only to verify test results but also for data analysis and processing.
2. Basic Selenium setup
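Before you begin, make sure the Selenium library is installed (pip install selenium), along with the WebDriver for your browser, such as ChromeDriver for Chrome or GeckoDriver for Firefox. Selenium 4.6 and later include Selenium Manager, which can locate or download a matching driver automatically. The basic setup looks like this: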
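from selenium import webdriver

# Create a WebDriver instance (Chrome is assumed here; use webdriver.Firefox() with GeckoDriver)
driver = webdriver.Chrome()

# Open the target page (the URL is omitted in the original; pass the page you want to open)
driver.get("")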
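If an exception is raised between creating the driver and calling quit(), the browser process is left running. One way to guarantee cleanup is to wrap creation and quit() in a context manager. The sketch below is a minimal version of that idea; `managed_driver` is a hypothetical helper, not part of Selenium, and it accepts any zero-argument factory such as `webdriver.Chrome`.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Hypothetical helper: create a driver via `factory` and always quit it.

    `factory` is any zero-argument callable returning a WebDriver,
    for example selenium's webdriver.Chrome or webdriver.Firefox.
    """
    driver = factory()
    try:
        yield driver
    finally:
        # Runs even if the body of the `with` block raises,
        # so the browser process is not left behind.
        driver.quit()
```

Usage: `with managed_driver(webdriver.Chrome) as driver:` followed by the `driver.get(...)` call and the information-gathering calls shown in this article.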
3. Get the page title
Page titles are often used to verify that the page has loaded correctly.
title = driver.title
print(f"Page title: {title}")
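In a test, the title is usually compared with an expected value rather than just printed. A minimal sketch, using a hypothetical helper `assert_title_contains` that reads only `driver.title` and raises an informative error on mismatch. When the title may change after the initial load, Selenium's `WebDriverWait` combined with `expected_conditions.title_contains` is the usual tool instead.

```python
def assert_title_contains(driver, expected: str) -> str:
    """Hypothetical helper: raise AssertionError unless the page title
    contains `expected` (case-sensitive); return the title otherwise."""
    title = driver.title
    if expected not in title:
        # Explicit raise (not a bare assert) so the check survives `python -O`
        raise AssertionError(f"Expected {expected!r} in page title, got {title!r}")
    return title
```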
4. Get the current URL
Get the URL of the current page, for example to verify that a redirect went to the right place.
current_url = driver.current_url
print(f"Current URL: {current_url}")
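Comparing `current_url` with an exact string is brittle, because redirects often append query strings or fragments. The sketch below uses the standard library's `urllib.parse.urlsplit` to compare only the host and path. `redirected_to` is a hypothetical helper, and the host names in the usage note are illustrative assumptions.

```python
from urllib.parse import urlsplit


def redirected_to(current_url: str, expected_host: str, expected_path: str) -> bool:
    """Hypothetical helper: check where a redirect landed.

    Only host and path are compared, so query strings and fragments
    (for example ?tab=orders or #top) do not cause false failures.
    """
    parts = urlsplit(current_url)
    # .hostname is lower-cased and excludes any port; an empty path counts as "/"
    return parts.hostname == expected_host.lower() and (parts.path or "/") == expected_path
```

Usage: `redirected_to(driver.current_url, "shop.example.com", "/account")`.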
5. Get the page source code
Get the complete HTML source code of the page, which can be used to analyze the page structure.
page_source = driver.page_source
print(f"Page source code: {page_source}")
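The page source is often long, so printing it whole is rarely useful; parsing it is. A minimal sketch using the standard library's `html.parser` to pull out the `<title>` text and every link `href` from the string returned by `driver.page_source`. `LinkAndTitleParser` and `summarize_source` are hypothetical names; third-party parsers such as BeautifulSoup or lxml are common alternatives.

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collect the <title> text and every <a href="..."> value from an HTML string."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize_source(page_source: str) -> dict:
    """Hypothetical helper: return the title, link targets, and source length."""
    parser = LinkAndTitleParser()
    parser.feed(page_source)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links, "length": len(page_source)}
```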
6. Get the text of the element
Getting the text content of a specific element in a page is one of the most common actions.
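from selenium.webdriver.common.by import By

# find_element_by_id() was removed in Selenium 4.3; use find_element() with a By locator
element = driver.find_element(By.ID, "element_id")
element_text = element.text
print(f"Element text: {element_text}")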
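`element.text` returns only the text that is actually rendered, so it is an empty string for hidden elements. A common workaround is to read the DOM `textContent` property through `get_attribute`. The sketch below assumes that fallback is what you want; `element_text` is a hypothetical helper that uses only `.text` and `.get_attribute()`.

```python
def element_text(element) -> str:
    """Hypothetical helper: visible text, falling back to DOM textContent.

    Selenium's element.text is empty for elements that are not displayed;
    the textContent property also includes hidden text.
    """
    text = element.text
    if not text:
        # `or ""` guards against get_attribute() returning None
        text = element.get_attribute("textContent") or ""
    return text.strip()
```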
7. Get the attributes of the element
Get the attributes of an element, such as href or src; this is very useful for extracting information such as links and images.
from selenium.webdriver.common.by import By

element = driver.find_element(By.ID, "element_id")
attribute_value = element.get_attribute("attribute_name")
print(f"Element attribute value: {attribute_value}")
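To extract every link or image on a page, the same `get_attribute` call can be applied to each element returned by `driver.find_elements`. `get_attribute` returns None when the element lacks the attribute, so the sketch below filters those out and removes duplicates while keeping page order. `collect_attribute` is a hypothetical helper.

```python
def collect_attribute(elements, name: str) -> list:
    """Hypothetical helper: return the non-empty values of attribute `name`
    from each element, de-duplicated in first-seen order."""
    seen = set()
    values = []
    for element in elements:
        value = element.get_attribute(name)  # None if the attribute is absent
        if value and value not in seen:
            seen.add(value)
            values.append(value)
    return values
```

Usage, with `By` imported as above: `links = collect_attribute(driver.find_elements(By.TAG_NAME, "a"), "href")`, or `"img"` with `"src"` for image URLs.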
8. Get Cookies
Get all cookies visible to the current page, which can be used for session management and verification.
# Get all cookies
cookies = driver.get_cookies()
print(f"All cookies: {cookies}")

# Get a specific cookie by name (returns None if no such cookie exists)
cookie = driver.get_cookie("cookie_name")
print(f"Specific cookie: {cookie}")
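`get_cookies()` returns a list of dictionaries, each with at least `name` and `value` keys plus fields such as `domain` and `path`. A common next step is flattening that list into a name-to-value mapping, for example to reuse the browser session in an HTTP client. `cookies_to_dict` and `cookie_header` are hypothetical helpers; note that if two cookies share a name (different paths), the later one wins.

```python
def cookies_to_dict(cookies: list) -> dict:
    """Hypothetical helper: map cookie name -> value from driver.get_cookies() output."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def cookie_header(cookies: list) -> str:
    """Hypothetical helper: build a Cookie request-header value such as 'a=1; b=2'."""
    return "; ".join(f"{name}={value}" for name, value in cookies_to_dict(cookies).items())
```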
9. Screenshot
Screenshots of the current page can be used for report generation and debugging.
# The file name is omitted in the original; pass a path ending in .png
driver.save_screenshot("")
print("Screenshot saved")
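When screenshots are taken repeatedly (for example, one per failed test), a fixed file name overwrites earlier images. The sketch below, built around a hypothetical helper `save_timestamped_screenshot`, creates a unique `.png` path inside a directory and checks the return value of `driver.save_screenshot`, which is False when the file could not be written. The directory and prefix defaults are assumptions.

```python
from datetime import datetime
from pathlib import Path


def save_timestamped_screenshot(driver, directory="screenshots", prefix="page") -> Path:
    """Hypothetical helper: save <prefix>_<YYYYmmdd_HHMMSS_ffffff>.png in `directory`.

    Returns the path on success; raises OSError if the driver reports failure.
    """
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    path = folder / f"{prefix}_{stamp}.png"  # Selenium expects a .png extension
    if not driver.save_screenshot(str(path)):
        raise OSError(f"Could not save screenshot to {path}")
    return path
```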
10. Sample code
Here is a comprehensive example showing how to obtain different types of page information:
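from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # Chrome assumed; use webdriver.Firefox() with GeckoDriver
try:
    driver.get("")  # target URL omitted in the original; pass the page to open

    # Get the page title
    title = driver.title
    print(f"Page title: {title}")

    # Get the current URL
    current_url = driver.current_url
    print(f"Current URL: {current_url}")

    # Get the page source code
    page_source = driver.page_source
    print(f"Page source code: {page_source}")

    # Get the text of the element ("element_id" is a placeholder)
    element = driver.find_element(By.ID, "element_id")
    element_text = element.text
    print(f"Element text: {element_text}")

    # Get an attribute of the element ("attribute_name" is a placeholder)
    attribute_value = element.get_attribute("attribute_name")
    print(f"Element attribute value: {attribute_value}")

    # Get all cookies
    cookies = driver.get_cookies()
    print(f"All cookies: {cookies}")

    # Get a specific cookie (None if it does not exist)
    cookie = driver.get_cookie("cookie_name")
    print(f"Specific cookie: {cookie}")

    # Take a screenshot of the page (file name omitted in the original; use a .png path)
    driver.save_screenshot("")
    print("Screenshot saved")
finally:
    # Close the browser and end the session, even if a step above fails
    driver.quit()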
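Printing each value works for a quick check; for reports it can be handier to gather everything into one dictionary and serialize it. The sketch below uses a hypothetical `collect_page_info` that reads only `title`, `current_url`, `page_source`, and `get_cookies()`, the same WebDriver members used above, and stores the source length rather than the full HTML to keep the output small.

```python
import json


def collect_page_info(driver) -> dict:
    """Hypothetical helper: snapshot basic page information as a plain dict."""
    cookies = driver.get_cookies()
    return {
        "title": driver.title,
        "url": driver.current_url,
        # The full source can be large; store its length instead
        "source_length": len(driver.page_source),
        "cookie_names": sorted(cookie["name"] for cookie in cookies),
    }


def page_info_json(driver) -> str:
    """Serialize the snapshot as pretty-printed JSON."""
    return json.dumps(collect_page_info(driver), ensure_ascii=False, indent=2)
```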
11. Summary
Selenium makes obtaining web page information simple and efficient. Whether you need the page title, the URL, the source code, or the text and attributes of an element, Selenium handles it easily. I hope this post helps you better understand Selenium and apply it to efficient page information extraction in real projects.