Use Python to implement automatic replacement or modification of PDF text

Introduction

When working with PDF documents, we sometimes need to update their text content. For example, when a company releases new policies or product information, the relevant content in a PDF manual or promotional document has to be revised; important documents such as financial statements and contracts also need their data and details updated regularly as the business changes. Opening PDF files manually and finding and editing text passages one by one is tedious and error-prone. For PDF documents that are updated frequently or involve a large number of text changes, automating the replacement in code is usually the better choice. This article introduces how to use Python to replace PDF text automatically.

Tools Used

To implement PDF text modification or replacement in Python applications, you can use for Python. It is a library dedicated to creating, reading, manipulating, and converting PDF documents in Python applications.

You can run the following command in the terminal to install for Python from PyPI:

pip install

Replace all instances of specific text in a PDF with Python

You can use the () method to replace all instances of specific text on a PDF page. The steps are as follows:

Create an instance of the PdfDocument class.
Use the () method to load the PDF document.
Loop through the pages in the PDF document. For each page:
- Create an instance of the PdfTextReplacer class, passing the current page object to its constructor.
- Use the () method to replace all instances of the specific text on the page with the new text.
Use the () method to save the resulting document.

Implementation code:

from  import *
from  import *
 
def replace_text_in_page(page, old_text, new_text, color=None):
    """
    Replace all instances of specific text on a given page.
    Parameters:
    page (PdfPageBase): The page in which to replace text
    old_text (str): The original text to replace
    new_text (str): The replacement text
    color (Color, optional): Provide this parameter to change the text color; otherwise leave it empty
    """
    replacer = PdfTextReplacer(page)
    if color:
        (old_text, new_text, color)
    else:
        (old_text, new_text)

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load the PDF file
("Moonlight in the lotus pond.pdf")

# Traverse every page in the document
for i in range():
    # Get the current page
    page = [i]

    # Replace all instances of specific text in the current page with new text
    replace_text_in_page(page, "Lotus Pond", "pond")

    # To replace the text and also change its color, use the following line instead
    # replace_text_in_page(page, "lotus pond", "pond", Color.get_Red())

# Save the modified PDF file
("Replace all instances.pdf")
# Close the document to free up resources
()
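
Before modifying a PDF, it can help to check on plain text how many matches a replace-all will touch. The following is a minimal sketch that works on ordinary Python strings only; it does not call the PDF library, and the helper name replace_all and the sample sentence are hypothetical. It assumes matching is case-sensitive by default, which the separate case-insensitive option described later in this article suggests but does not prove.

```python
from typing import Tuple


def replace_all(text: str, old_text: str, new_text: str) -> Tuple[str, int]:
    """Replace every occurrence of old_text and report how many were replaced."""
    # str.count and str.replace both work on non-overlapping occurrences
    count = text.count(old_text)
    return text.replace(old_text, new_text), count


# Hypothetical sample text; the lowercase "lotus pond" is left untouched
sample = "Lotus Pond at night. The Lotus Pond is quiet. A lotus pond in summer."
result, replaced = replace_all(sample, "Lotus Pond", "pond")
print(result)
print(replaced)
```

Note that the lowercase "lotus pond" is not replaced here, which is worth keeping in mind because the commented-out color example searches for "lotus pond" while the active line searches for "Lotus Pond".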

Replace the first instance of specific text in a PDF with Python

If a piece of text appears multiple times in the PDF and you only want to replace its first occurrence, you can use the () method. The steps are as follows:

Create an instance of the PdfDocument class.
Use the () method to load the PDF document.
Loop through the pages in the PDF document. For each page:
- Create an instance of the PdfTextReplacer class, passing the current page object to its constructor.
- Use the () method to replace the first instance of the specific text on the page with the new text.
Use the () method to save the resulting document.

Implementation code:

from  import *
from  import *
 
def replace_text_in_page(page, old_text, new_text):
    """
    Replace the first instance of specific text on a given page.
    Parameters:
    page (PdfPageBase): The page in which to replace text
    old_text (str): The original text to replace
    new_text (str): The replacement text
    """
    replacer = PdfTextReplacer(page)
    (old_text, new_text)

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load the PDF file
("Moonlight in the lotus pond.pdf")

# Traverse every page in the document
for i in range():
    # Get the current page
    page = [i]
    # Replace the first instance of specific text in the current page with new text
    replace_text_in_page(page, "Lotus Pond", "pond")

# Save the modified PDF file
("Replace the first instance.pdf")
# Close the document to free up resources
()
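
Because the loop above calls the replacement once per page, the first instance on every page is replaced, not only the first instance in the whole document. The sketch below models pages as plain Python strings to show the difference; it does not use the PDF library, and the function names and sample pages are hypothetical.

```python
from typing import List


def replace_first_per_page(pages: List[str], old_text: str, new_text: str) -> List[str]:
    """Replace the first occurrence on each page (what a per-page loop does)."""
    return [page.replace(old_text, new_text, 1) for page in pages]


def replace_first_in_document(pages: List[str], old_text: str, new_text: str) -> List[str]:
    """Replace only the very first occurrence in the whole document."""
    result = list(pages)
    for index, page in enumerate(result):
        if old_text in page:
            result[index] = page.replace(old_text, new_text, 1)
            break  # stop after the first page that contains a match
    return result


# Hypothetical page contents
pages = [
    "Lotus Pond, Lotus Pond",
    "No match here",
    "Another Lotus Pond and one more Lotus Pond",
]
per_page = replace_first_per_page(pages, "Lotus Pond", "pond")
whole_doc = replace_first_in_document(pages, "Lotus Pond", "pond")
print(per_page)
print(whole_doc)
```

If only the first occurrence in the entire document should change, the page loop would need to stop after the first page on which a match is found.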

Replace text in a PDF using regular expressions with Python

For Python provides a property for setting the text replacement mode. Setting this property to the regular expression mode switches text replacement to regular expression matching. The steps are as follows:

Create an instance of the PdfDocument class.
Use the () method to load the PDF document.
Loop through the pages in the PDF document. For each page:
- Create an instance of the PdfTextReplacer class, passing the current page object to its constructor.
- Set the property to change the current text replacement mode to regular expression replacement mode.
- Pass the regular expression and the new text as parameters to the () method to replace the text matched by the regular expression on the page with the new text.
Use the () method to save the resulting document.

Implementation code:

from  import *
from  import *
 
def replace_text_with_regex(page, regex, new_text):
    """
    Replace text in a page that matches a regular expression.
    Parameters:
    page (PdfPageBase): The page in which to replace text
    regex (str): The regular expression used to match the text to replace
    new_text (str): The replacement text
    """
    replacer = PdfTextReplacer(page)
     = 
    (regex, new_text)

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load the PDF file
("Template.pdf")

# Traverse every page in the document
for i in range():
    # Get the current page
    page = [i]
    # Use a regular expression to replace matching text in the current page
    replace_text_with_regex(page, r"\#\w+\b", "monitor")

# Save the modified PDF file
("regular expression replacement.pdf")
# Close the document to free up resources
()
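
To see what the pattern \#\w+\b from the example matches, it can be tried on a plain string with Python's built-in re module first. This is only a sketch: the sample text is hypothetical, and the PDF library's regular expression engine may differ in details from Python's re.

```python
import re

# The pattern from the example: a "#" followed by one or more word characters
PLACEHOLDER = re.compile(r"\#\w+\b")

# Hypothetical template text; "# 42" has a space after "#", so it does not match
sample = "Product: #name, price: #price. Order # 42."

matches = PLACEHOLDER.findall(sample)
replaced = PLACEHOLDER.sub("monitor", sample)
print(matches)
print(replaced)
```

Checking the pattern this way makes it easier to spot over-matching before every placeholder in a template PDF is overwritten with the same text.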

Other replacement condition settings

For Python also supports other replacement conditions, such as case-insensitive matching and whole-word matching. You only need to set the property to the corresponding value.

Implementation code:

from  import *
from  import *
 
def replace_text_with_options(page: PdfPageBase, old_text: str, new_text: str, ignore_case: bool = False, whole_word: bool = False):
    """
    Replace text in the page under the specified conditions.
    Parameters:
    page (PdfPageBase): The page in which to replace text
    old_text (str): The original text to replace
    new_text (str): The replacement text
    ignore_case (bool): Whether to ignore case. The default value is False
    whole_word (bool): Whether to match whole words only. The default value is False
    """
    replacer = PdfTextReplacer(page)

    # Set the text replacement mode according to the options
    if ignore_case:
         = 
    if whole_word:
         = 

    (old_text, new_text)

# Create an object of the PdfDocument class
doc = PdfDocument()
# Load the PDF file
("Test.pdf")

# Traverse every page in the document
for i in range():
    # Get the current page
    page = [i]

    # Replace text with case-insensitive and whole-word matching
    replace_text_with_options(page, "old_text", "new_text", ignore_case=True, whole_word=True)

# Save the modified PDF file
("Other alternative conditions.pdf")
# Close the document to free up resources
()
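
The effect of the two conditions can be illustrated on plain strings with Python's re module. The sketch below is an analogy rather than the PDF library's implementation: build_pattern and the sample sentence are hypothetical, and it assumes that "whole word" means the match must be bounded by word boundaries.

```python
import re


def build_pattern(old_text: str, ignore_case: bool = False, whole_word: bool = False) -> re.Pattern:
    """Build a regex that matches old_text literally under the given conditions."""
    pattern = re.escape(old_text)  # treat the search text as a literal
    if whole_word:
        pattern = r"\b" + pattern + r"\b"  # require word boundaries on both sides
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(pattern, flags)


# Hypothetical sample text
sample = "Pond, pond, ponds and POND"

default = build_pattern("pond").sub("lake", sample)
no_case = build_pattern("pond", ignore_case=True).sub("lake", sample)
whole = build_pattern("pond", whole_word=True).sub("lake", sample)
both = build_pattern("pond", ignore_case=True, whole_word=True).sub("lake", sample)
for line in (default, no_case, whole, both):
    print(line)
```

Comparing the four results shows why the two conditions are often combined: ignoring case widens the match, while whole-word matching keeps longer words such as "ponds" from being altered.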

That covers replacing or modifying text in PDF documents using Python.
