SoFunction
Updated on 2025-03-10

Python code examples for PDF operations using PyPDF

1. Why choose PyPDF?

PyPDF is a lightweight and capable PDF library that supports the following operations:

  • Merge and split PDF files
  • Extract text and metadata
  • Add or modify the metadata of a document
  • Encrypt and decrypt PDFs
  • Rotate or crop PDF pages

The following are detailed implementations of some practical scenarios.

2. Install PyPDF

First, you need to install the PyPDF library. You can use pip:

pip install pypdf

Install the latest version to get the newest features and performance improvements.

3. Merge and split PDF files

3.1 Merge PDF files

Merging multiple PDF files is useful when generating reports or organizing documents.

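from pypdf import PdfWriter

# PdfMerger was removed in pypdf 5.0; PdfWriter.append() now handles merging
merger = PdfWriter()

# Add the PDF files to merge (placeholder file names)
merger.append("file1.pdf")
merger.append("file2.pdf")

# Save the merged file
merger.write("merged.pdf")
merger.close()
print("PDF merge is complete!")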
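When the files to merge live in one folder, it can help to collect them in a predictable order first. The following is a minimal sketch under stated assumptions, not part of the original example: `collect_pdfs` and `merge_into` are hypothetical helper names, and `merge_into` assumes it is given an object with an `append(path)` method, such as a pypdf `PdfWriter`.

```python
from pathlib import Path


def collect_pdfs(folder):
    """Return the PDF files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def merge_into(writer, paths):
    """Append each path to `writer` in order and return how many were added.

    Assumption: `writer` exposes append(path), as pypdf's PdfWriter does.
    """
    count = 0
    for path in paths:
        writer.append(str(path))
        count += 1
    return count
```

With pypdf installed, one way to use this is to create `writer = PdfWriter()`, call `merge_into(writer, collect_pdfs("reports"))`, and then `writer.write("merged.pdf")`. The folder and file names here are placeholders.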

3.2 Split PDF files

Split a PDF file so that each page becomes its own file.

from pypdf import PdfReader, PdfWriter

# Read the PDF file (placeholder file name)
reader = PdfReader("example.pdf")

# Write every page to its own file
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output_file:
        writer.write(output_file)
print("PDF split is complete!")
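Splitting does not have to be one page per file. As a hedged sketch, the page indices can be grouped into fixed-size chunks and each chunk written to its own file. The names `page_chunks` and `split_in_chunks` are hypothetical; `reader` is assumed to behave like a pypdf `PdfReader` and `make_writer` like the `PdfWriter` class.

```python
def page_chunks(total_pages, chunk_size):
    """Yield (start, stop) index pairs that cover range(total_pages)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, total_pages, chunk_size):
        yield start, min(start + chunk_size, total_pages)


def split_in_chunks(reader, make_writer, chunk_size, stem="part"):
    """Write consecutive groups of pages to files named <stem>_<n>.pdf.

    Assumptions: `reader.pages` is an indexable sequence, and the object
    returned by `make_writer()` has add_page(page) and write(path).
    """
    pages = reader.pages
    names = []
    for n, (start, stop) in enumerate(page_chunks(len(pages), chunk_size), 1):
        writer = make_writer()
        for index in range(start, stop):
            writer.add_page(pages[index])
        name = f"{stem}_{n}.pdf"
        writer.write(name)
        names.append(name)
    return names
```

For a five-page document and a chunk size of 2, `page_chunks` produces the ranges (0, 2), (2, 4) and (4, 5), so the last file holds the single remaining page.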

4. Extract PDF text

Extract text content from PDF files, which can be used for data analysis or automated processing.

from pypdf import PdfReader

# Read the PDF file (placeholder file name)
reader = PdfReader("example.pdf")

# Extract text from each page
for page in reader.pages:
    print(page.extract_text())

Things to note

  • The quality of text extraction depends on how the PDF is structured. If the text is stored as an image (for example, a scanned page), it cannot be extracted directly and requires OCR.
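One rough way to spot image-only pages is to look for pages whose extracted text is empty. The sketch below is an illustration, not a reliable scan detector: `pages_without_text` is a hypothetical helper, and `reader` is assumed to behave like a pypdf `PdfReader`.

```python
def pages_without_text(reader):
    """Return the 1-based numbers of pages whose extracted text is empty.

    Assumption: `reader.pages` is iterable and each page provides
    extract_text(), as in pypdf.
    """
    empty = []
    for number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if not text.strip():
            empty.append(number)
    return empty
```

Pages reported here are candidates for OCR. A page that is genuinely blank would also be listed, so the result is only a hint.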

5. Modify PDF metadata

Modify the metadata of the PDF, such as title, author, etc.

from pypdf import PdfReader, PdfWriter

# Placeholder file names are used for input and output
reader = PdfReader("example.pdf")
writer = PdfWriter()

# Copy all pages to the new PDF
for page in reader.pages:
    writer.add_page(page)

# Set the metadata (keys are PDF names and start with "/")
writer.add_metadata({
    "/Title": "New Title",
    "/Author": "Author's name",
    "/Subject": "Subject description"
})

with open("updated_metadata.pdf", "wb") as output_file:
    writer.write(output_file)
print("Metadata update is complete!")

6. PDF encryption and decryption

6.1 Encrypt a PDF

Add password protection to PDF files.

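from pypdf import PdfWriter

writer = PdfWriter()
# Copy the pages of the source PDF (placeholder file name)
writer.append("example.pdf")

# Set the passwords
writer.encrypt(user_password="user123", owner_password="owner123")

with open("encrypted.pdf", "wb") as output_file:
    writer.write(output_file)
print("PDF encryption is complete!")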

6.2 Decrypt a PDF

Decrypt password-protected PDF files.

from pypdf import PdfReader

# Placeholder file name
reader = PdfReader("encrypted.pdf")

# Provide the password to decrypt the file
reader.decrypt("user123")

for page in reader.pages:
    print(page.extract_text())
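A wrong password does not have to go unnoticed. The sketch below shows one possible way to check the result of decryption before reading pages. `unlock` is a hypothetical helper; it assumes `reader` behaves like a pypdf `PdfReader`, with an `is_encrypted` attribute and a `decrypt()` method that returns a falsy value when the password does not match.

```python
def unlock(reader, password):
    """Decrypt `reader` if needed and return it.

    Raises ValueError when the password does not unlock the document.
    Assumption: reader.decrypt() returns a falsy value on failure.
    """
    if not reader.is_encrypted:
        return reader  # nothing to do for unprotected files
    if not reader.decrypt(password):
        raise ValueError("the supplied password did not decrypt the PDF")
    return reader
```

Calling `unlock(PdfReader("encrypted.pdf"), password)` before iterating over `reader.pages` would then fail early with a clear message; the file name is a placeholder.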

7. Page rotation and cropping

7.1 Rotate the page

Rotate the pages of a PDF, for example to turn landscape pages into portrait.

from pypdf import PdfReader, PdfWriter

# Placeholder file names are used for input and output
reader = PdfReader("example.pdf")
writer = PdfWriter()

# Rotate each page
for page in reader.pages:
    page.rotate(90)  # Rotate clockwise by 90 degrees
    writer.add_page(page)

with open("rotated.pdf", "wb") as output_file:
    writer.write(output_file)
print("Page rotation is complete!")
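To rotate only the landscape pages rather than every page, the page dimensions can be compared first. This is a simplified sketch: `rotate_landscape_pages` is a hypothetical helper, each page is assumed to behave like a pypdf page object (`mediabox.width`, `mediabox.height`, `rotate(angle)`), and any rotation already stored on a page is ignored by the width/height check.

```python
def rotate_landscape_pages(pages, angle=90):
    """Rotate pages that are wider than tall; return how many were rotated.

    Assumption: each page has mediabox.width, mediabox.height and
    rotate(angle), where angle is a multiple of 90 degrees.
    """
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90")
    rotated = 0
    for page in pages:
        box = page.mediabox
        if box.width > box.height:
            page.rotate(angle)
            rotated += 1
    return rotated
```

The pages are modified in place, so they can be added to a writer afterwards in the same way as in the example above.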

7.2 Crop the page

Crop the page margins to remove unwanted content.

from pypdf import PdfReader, PdfWriter

# Placeholder file names are used for input and output
reader = PdfReader("example.pdf")
writer = PdfWriter()

for page in reader.pages:
    # Set the crop box: lower_left is (left, bottom), upper_right is (right, top)
    page.cropbox.lower_left = (50, 50)
    page.cropbox.upper_right = (500, 700)
    writer.add_page(page)

with open("cropped.pdf", "wb") as output_file:
    writer.write(output_file)
print("Page cropping is complete!")
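Fixed coordinates such as (50, 50) and (500, 700) only suit one page size. A possible alternative, sketched here under assumptions, is to derive the crop box from each page's own media box by trimming a uniform margin, measured in PDF points (1/72 inch). `inset_box` and `crop_margins` are hypothetical helpers; `page` is assumed to expose `mediabox` and `cropbox` the way a pypdf page does.

```python
def inset_box(left, bottom, right, top, margin):
    """Shrink a rectangle by `margin` on every side.

    Returns ((left, bottom), (right, top)) for the smaller rectangle.
    """
    if margin < 0:
        raise ValueError("margin must not be negative")
    new_left, new_bottom = left + margin, bottom + margin
    new_right, new_top = right - margin, top - margin
    if new_left >= new_right or new_bottom >= new_top:
        raise ValueError("margin is too large for this page")
    return (new_left, new_bottom), (new_right, new_top)


def crop_margins(page, margin):
    """Set the crop box of `page` to its media box minus `margin` points."""
    box = page.mediabox
    lower_left, upper_right = inset_box(
        float(box.left), float(box.bottom),
        float(box.right), float(box.top), margin,
    )
    page.cropbox.lower_left = lower_left
    page.cropbox.upper_right = upper_right
    return page
```

Keeping the arithmetic in `inset_box` makes it easy to check without a PDF at hand, and an oversized margin raises an error instead of producing an inverted box.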

8. Summary of practical experience

  1. Handle exceptions: Catch the errors that can occur while reading, writing or parsing files, such as a missing file or a failed decryption.
  2. Test PDF files: PDF files vary widely in structure, so test on sample files before batch processing.
  3. Performance optimization: Process large files in batches, for example a range of pages at a time, rather than all at once.
  4. Security: Avoid hard-coding sensitive information, such as passwords, in code.
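Points 1 and 4 above can be sketched as follows. This is one possible approach under stated assumptions: `password_from_env` and `process_batch` are hypothetical helpers, and the environment variable name `PDF_PASSWORD` is only an example.

```python
import os


def password_from_env(name="PDF_PASSWORD"):
    """Read a password from an environment variable instead of source code."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


def process_batch(paths, action, handled=(OSError, ValueError)):
    """Run `action(path)` for each path and collect failures instead of stopping.

    Returns (results, failures): results maps path -> return value, and
    failures maps path -> error message for exception types in `handled`.
    """
    results, failures = {}, {}
    for path in paths:
        try:
            results[path] = action(path)
        except handled as exc:
            failures[path] = f"{type(exc).__name__}: {exc}"
    return results, failures
```

A missing file raises `FileNotFoundError`, which is a subclass of `OSError` and is therefore covered by the default. When working with pypdf, `handled` could also include `pypdf.errors.PdfReadError` so that unreadable files are reported rather than ending the batch.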
