Locating elements that contain specific text in Python

In Python programming, especially in web scraping and automated testing, locating elements that contain specific text is a common and important task. Whether we drive a browser with Selenium or parse HTML documents with BeautifulSoup, knowing how to find elements by their text makes the work considerably more efficient. This article explains the technique with concise language, practical code examples, and concrete cases.

1. Why do we need to locate elements by their text?

During web page automation or data scraping, we often need to interact with specific elements on the target page, such as clicking a button, filling in a form, or extracting a piece of text. These elements are often identified most easily by the text they display. Locating elements that contain specific text is therefore a basic and critical operation.

2. Use Selenium to locate elements containing text

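Selenium is a tool for automated testing of web applications. It drives a real browser and simulates the actions of a user. Selenium offers several ways to locate elements that contain specific text. Note that the find_element_by_* helpers used in this section belong to the Selenium 3 API: they were deprecated in Selenium 4.0 and removed in 4.3, where the equivalent call takes the form driver.find_element(By.LINK_TEXT, 'Click here'), with By imported from selenium.webdriver.common.by.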

1. Use find_element_by_link_text

When we need to locate a link (an <a> tag) and the full text of the link is exactly what we want to match, we can use the find_element_by_link_text method.

Code example:

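from selenium import webdriver

# Initialize the browser (Chrome is assumed here; any supported browser works)
driver = webdriver.Chrome()

# Open the web page (the URL is blank in the original; put the target page here)
driver.get('')

# Locate the link whose visible text is exactly 'Click here'
# (Selenium 4.3+: driver.find_element(By.LINK_TEXT, 'Click here'))
link = driver.find_element_by_link_text('Click here')

# Click the link
link.click()

# Close the browser
driver.quit()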

Case: Suppose we are writing a script that automatically clicks the "Add to Cart" link on an e-commerce website. If the Add to Cart link has the same text on every product page, we can use the method above to locate and click it.

2. Use find_element_by_partial_link_text

When the link text is long and we only need to match part of it, we can use the find_element_by_partial_link_text method.

Code example:

# Locate a link whose text contains the fragment 'Cart'
link = driver.find_element_by_partial_link_text('Cart')

Case: On some websites, the text of the Add to Cart link may carry extra words, such as "Add to Cart Now". In this case, we can locate it with the partial text "Cart".

3. Use XPath to locate elements containing text

XPath is a language for finding information in XML documents, and it also works for HTML documents. With XPath, we can write complex queries to locate elements containing specific text.

Code example:

# Locate a button containing specific text (assuming the button text is "Submit")
button = driver.find_element_by_xpath("//button[contains(text(), 'Submit')]")
# Click the button
button.click()

Case: On a form submission page, we need to locate and click a button whose text is "Submit". Since the button may be nested several levels deep, XPath gives us more flexibility in locating it.
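One limitation of contains(text(), ...) is worth knowing: in XPath 1.0, text() passed to contains() is reduced to the element's first text node, so a button whose label sits inside a nested <span> would not match. The sketch below builds an expression on normalize-space(.) instead and quotes the search text safely. It is only an illustration: the helper names xpath_literal and xpath_containing_text are hypothetical and are not part of Selenium.

```python
def xpath_literal(text: str) -> str:
    """Return `text` as an XPath 1.0 string literal, whichever quotes it contains."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote kinds are present: XPath 1.0 has no escape syntax,
    # so split on the single quote and glue the pieces back with concat()
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"


def xpath_containing_text(tag: str, text: str) -> str:
    """Build an XPath for `tag` elements whose full text contains `text`."""
    # "." is the element's whole string value, nested tags included;
    # normalize-space() trims and collapses whitespace before the comparison
    return f"//{tag}[contains(normalize-space(.), {xpath_literal(text)})]"


print(xpath_containing_text("button", "Submit"))
print(xpath_containing_text("a", "Don't miss"))
```

With a live driver, the resulting string can be passed to driver.find_element_by_xpath(...), or to driver.find_element(By.XPATH, ...) on Selenium 4.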

4. Use CSS selector to locate elements containing text

A CSS selector is a pattern for selecting HTML elements, and Selenium can also use it as a locator. CSS selectors cannot match on text content, however. The :contains pseudo-class is not part of standard CSS (it is a jQuery extension), so neither browsers nor Selenium support it, and it appears here only to introduce the idea. We can narrow the search with other attributes such as class or id, but to match on text we usually fall back on JavaScript or on checking each element's text in Python.

Code example (in combination with JavaScript):

# Use JavaScript to run a selector query for the text
element = driver.execute_script("""
    return document.querySelector('body:contains("Specific text")');
""")

# Note: The above code does not work, because neither browsers nor Selenium support the `:contains` pseudo-class.
# Here is a practical alternative that iterates through all elements and checks their text content:
def find_element_by_text(driver, text):
    elements = driver.find_elements_by_tag_name('*')  # Get all elements, in document order
    # An ancestor's text includes its descendants' text, so scan from the end:
    # the first hit in reverse order has no matching descendant
    for element in reversed(elements):
        if text in element.text:  # Check whether the element's text contains the specified text
            return element
    return None

# Use the custom function to locate an element containing specific text
element = find_element_by_text(driver, 'Specific Text')

Case: On a complex web page, we need to locate any element (a paragraph, a heading, or something else) that contains specific text such as "Contact Us". Since CSS selectors cannot match text directly, we write a custom function that iterates through all elements and checks their text content.

3. Use BeautifulSoup to locate elements containing text

BeautifulSoup is a Python library for parsing HTML and XML documents. It creates a parse tree from which data can be extracted. Unlike Selenium, BeautifulSoup is mainly used to parse static HTML documents rather than interacting with the browser.

1. Use find and find_all methods

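BeautifulSoup provides the find and find_all methods to locate elements containing specific text. Both accept a string argument that is matched against the element's text. A plain string is compared with the element's .string, so it must equal the element's whole text rather than a fragment of it.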

Code example:

from bs4 import BeautifulSoup

# Suppose we have an HTML document as a string
html_doc = """
<html>
<head><title>Sample page</title></head>
<body>
<p>This is a paragraph.</p>
<p>This is another paragraph containing a specific text.</p>
</body>
</html>
"""

# Parse the HTML document
soup = BeautifulSoup(html_doc, 'html.parser')

# Locate the element whose text is exactly the given string
element = soup.find('p', string='This is another paragraph containing a specific text.')

# Print the text content of the element
print(element.text)

Case: In an HTML document, we need to locate a paragraph element that contains specific text (such as "This is another paragraph containing a specific text."). With BeautifulSoup's find method, we can easily find this element.
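Exact matching with string= stops working once a paragraph contains nested tags, because the tag's .string is then None. As a hedged sketch of one way around this, a filter function can test the combined text from get_text() instead. The sample HTML and the function name paragraph_mentions_specific are made up for this illustration.

```python
from bs4 import BeautifulSoup

html_doc = """
<html><body>
<p>This is a paragraph.</p>
<p>This is another paragraph containing a <b>specific</b> text.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# string= compares against tag.string, which is None for a tag with mixed
# children (text plus <b>), so this exact lookup finds nothing
exact = soup.find("p", string="This is another paragraph containing a specific text.")


def paragraph_mentions_specific(tag):
    """Filter: keep <p> tags whose combined text contains the word 'specific'."""
    return tag.name == "p" and "specific" in tag.get_text()


# A function filter is called once per tag; tags it returns True for are kept
partial = soup.find_all(paragraph_mentions_specific)

print(exact)
print([p.get_text() for p in partial])
```

The function filter costs a little more per tag than string=, but it matches substrings and is not affected by inline markup inside the paragraph.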

2. Use regular expressions to match text

If we need to match more complex text patterns, we can use regular expressions. The find and find_all methods of BeautifulSoup accept a compiled regular expression object (created with re.compile) as the string argument.

Code example:

import re

# Compile a regular expression object to match text containing the word "specific"
pattern = re.compile(r'.*specific.*')

# Locate elements whose text matches the pattern
elements = soup.find_all('p', string=pattern)

# Print the text content of all matching elements
for element in elements:
    print(element.text)

Case: In an HTML document, we need to locate all paragraph elements that contain the word "specific". Using regular expressions and BeautifulSoup's find_all method, we can find all matching elements.
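BeautifulSoup applies a compiled pattern with its search() method, so the pattern may match anywhere in the text and the surrounding ".*" is optional. The sketch below, which uses a made-up three-paragraph document, compares a plain pattern with a case-insensitive one and a whole-word one.

```python
import re

from bs4 import BeautifulSoup

html_doc = """
<p>This is a paragraph.</p>
<p>This is another paragraph containing a specific text.</p>
<p>Specifically, this one starts with a capital letter.</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# The pattern is applied with search(), so no leading or trailing ".*" is needed
loose = soup.find_all("p", string=re.compile("specific"))

# IGNORECASE also accepts "Specifically" in the third paragraph
any_case = soup.find_all("p", string=re.compile("specific", re.IGNORECASE))

# \b limits the match to the whole word, which excludes "Specifically" again
whole_word = soup.find_all("p", string=re.compile(r"\bspecific\b", re.IGNORECASE))

for label, found in (("loose", loose), ("any_case", any_case), ("whole_word", whole_word)):
    print(label, [p.get_text() for p in found])
```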

4. Summary

In Python programming, locating elements that contain specific text is a basic and important skill. Whether we automate a browser with Selenium or parse HTML documents with BeautifulSoup, there are several ways to find these elements: link text, partial link text, XPath, iterating over elements, exact strings, and regular expressions. Mastering these techniques will noticeably improve our productivity.
