Detailed Guide to Using PyPDF2 and ReportLab for Python

PDF processing tools in Python: PyPDF2 and ReportLab usage guide

PDF file processing is a common requirement in daily work and projects, whether it is merging reports, encrypting documents, filling forms, or generating invoices. There are many libraries in Python for manipulating PDF files, among which PyPDF2 and ReportLab are two widely used tools: the former is used for reading and modifying PDF documents, and the latter is used to generate PDF files from scratch. In this blog, we will cover how to use PyPDF2 and ReportLab to complete some common PDF processing tasks.

1. Install PyPDF2 and ReportLab

To get started, install both libraries by running the following command in a terminal or command prompt:

pip install PyPDF2 reportlab

After installation, you can use them to read, write and generate PDFs.

2. Use PyPDF2 to operate PDF files

PyPDF2 is a pure-Python library for working with existing PDF files: it can read, merge, split, encrypt, and decrypt them. Here are some common PyPDF2 operations.

1. Read PDF files

First, let's see how to open and read the contents of a PDF file with PyPDF2.

from PyPDF2 import PdfReader

# Open the PDF file ("input.pdf" is a placeholder; use your own path)
reader = PdfReader("input.pdf")

# Get the number of pages
num_pages = len(reader.pages)
print(f"Total pages: {num_pages}")

# Read the content of each page
for page_num in range(num_pages):
    page = reader.pages[page_num]
    text = page.extract_text()
    print(f"Page {page_num + 1}:\n{text}")

In this example, the PdfReader class opens the PDF file, reader.pages gives access to its pages, and each page's extract_text() method returns that page's text. This works for PDFs that contain real text, such as generated reports and documents; scanned PDFs that store each page as an image have no text layer and would need OCR instead.

2. Merge PDF files

Merging multiple PDF files is one of PyPDF2's strengths. Here is an example of combining two PDF files into one PDF file:

from PyPDF2 import PdfWriter, PdfReader

# Create a PDF writer
writer = PdfWriter()

# Read two PDF files and add their pages to the writer
# ("first.pdf" and "second.pdf" are placeholders; use your own files)
pdf_files = ["first.pdf", "second.pdf"]
for pdf_file in pdf_files:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

# Save the merged PDF file
with open("merged_output.pdf", "wb") as output_pdf:
    writer.write(output_pdf)

In this example, we create a PdfWriter instance, read each PDF file in turn, and add its pages to the writer. Finally, the merged document is saved as merged_output.pdf.

3. Split PDF file

PyPDF2 can also extract a subset of pages from a PDF file. For example, to extract pages 1 to 3:

from PyPDF2 import PdfWriter, PdfReader

reader = PdfReader("input.pdf")  # placeholder path; use your own file
writer = PdfWriter()

# Extract pages 1 to 3 (0-based page indices 0, 1 and 2)
for i in range(3):
    writer.add_page(reader.pages[i])

# Save the split file
with open("split_output.pdf", "wb") as output_pdf:
    writer.write(output_pdf)

This code extracts the first 3 pages of the input file and saves them as split_output.pdf. Note that reader.pages uses 0-based indices, so range(3) covers pages 1 to 3.

4. Encrypt and decrypt PDF files

For confidential files, PyPDF2 provides encryption and decryption. The writer's encrypt() method protects the output PDF with a password:

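from PyPDF2 import PdfReader, PdfWriter

writer = PdfWriter()
reader = PdfReader("input.pdf")  # placeholder path; use your own file

# Add all pages
for page in reader.pages:
    writer.add_page(page)

# Encrypt the output with a user password
writer.encrypt("password123")

# Save the encrypted file
with open("encrypted_output.pdf", "wb") as output_pdf:
    writer.write(output_pdf)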

In this example, encrypted_output.pdf can only be opened after entering the password "password123".

3. Use ReportLab to generate PDF files

ReportLab is another powerful PDF library, suited to generating PDF files from scratch with complex layouts and styling. Its canvas API draws content directly onto pages, so you can produce PDFs containing text, graphics, and tables.

1. Create a PDF file and add text

First, let's see how to create a simple PDF file using ReportLab and add text:

from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

# Create the PDF file
pdf_path = "generated_example.pdf"
pdf_canvas = canvas.Canvas(pdf_path, pagesize=A4)

# Add text
pdf_canvas.drawString(100, 750, "Hello, ReportLab!")
pdf_canvas.drawString(100, 730, "This is a simple PDF file created using Python.")

# Save and close the PDF
pdf_canvas.save()
print(f"PDF saved as {pdf_path}")

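In this code, drawString(x, y, text) places text at coordinates measured in points (1 pt = 1/72 inch) from the bottom-left corner of the page. An A4 page is about 595 x 842 pt, so "Hello, ReportLab!" at (100, 750) sits near the top-left of the page, and the second line at y = 730 appears 20 pt below it.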

2. Add pictures and graphics

ReportLab can insert images into PDFs and draw various shapes, which is useful for generating charts or reports that include images:

from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

# Create the PDF file
pdf_path = "pdf_with_image.pdf"
pdf_canvas = canvas.Canvas(pdf_path, pagesize=A4)

# Add an image (example_image.jpg must exist in the working directory)
pdf_canvas.drawImage("example_image.jpg", 100, 500, width=200, height=150)

# Draw a rectangle
pdf_canvas.setStrokeColorRGB(0, 0, 1)  # Blue border
pdf_canvas.setFillColorRGB(0.8, 0.8, 1)  # Light blue fill
pdf_canvas.rect(100, 450, 200, 100, fill=True)

# Save the PDF
pdf_canvas.save()
print(f"PDF with image and shapes saved as {pdf_path}")

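Here we insert an image with its bottom-left corner at (100, 500), scaled to 200 x 150 pt, and draw a light-blue rectangle with a blue border at (100, 450) measuring 200 x 100 pt. Because the rectangle (y from 450 to 550) is drawn after the image (y from 500 to 650), it covers the image's lower strip; move one of them if they should not overlap. The drawImage method accepts image files such as JPG and PNG.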

3. Add a table

ReportLab's Table class makes it easy to create and format tables. The following example shows how to insert a table containing data in a PDF:

from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from reportlab.platypus import Table, TableStyle
from reportlab.lib import colors

# Create the PDF file
pdf_path = "pdf_with_table.pdf"
pdf_canvas = canvas.Canvas(pdf_path, pagesize=A4)

# Table data
data = [
    ["Product", "Price", "Quantity"],
    ["Widget", "$25.00", "10"],
    ["Gadget", "$15.00", "30"],
    ["Doohickey", "$5.00", "50"]
]

# Create a table and style it (the colors are illustrative choices)
table = Table(data)
table.setStyle(TableStyle([
    ("BACKGROUND", (0, 0), (-1, 0), colors.grey),
    ("TEXTCOLOR", (0, 0), (-1, 0), colors.whitesmoke),
    ("ALIGN", (0, 0), (-1, -1), "CENTER"),
    ("GRID", (0, 0), (-1, -1), 0.5, colors.black),
    ("BACKGROUND", (0, 1), (-1, -1), colors.beige),
]))

# Lay out the table in a 400 x 300 pt area, then draw it with its
# bottom-left corner at (100, 600)
table.wrapOn(pdf_canvas, 400, 300)
table.drawOn(pdf_canvas, 100, 600)

# Save the PDF
pdf_canvas.save()
print(f"PDF with table saved as {pdf_path}")

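In this code, we create a table of product, price, and quantity information and style it with background colors, centered alignment, and grid lines. wrapOn() lays the table out within the given available width and height, and drawOn() then draws it with its bottom-left corner at the given position.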

4. Summary

PyPDF2 and ReportLab are two major tools for processing PDF files, each with their own strengths:

PyPDF2: Suitable for reading, merging, splitting and encrypting PDF files, mainly used to process existing PDF files.
ReportLab: Used to generate PDF files from scratch, with precise control of layout, suitable for creating invoices, reports, and other customized documents.

Used together, these two libraries cover most everyday PDF needs, from simple file merging to building documents with charts and tables. I hope this guide helps you understand how to use them to automate your PDF processing.

5. Comprehensive application: Generate invoice PDF example

Here, we combine PyPDF2 and ReportLab to generate an invoice PDF containing company information, customer information, and a list of items. This kind of scenario is very common in practice.

1. Create an invoice template

First, we use ReportLab to create an invoice template file, invoice_template.pdf, containing the invoice title, company details, a customer section, and the header row of the item table:

from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from reportlab.platypus import Table, TableStyle
from reportlab.lib import colors

def create_invoice_template():
    pdf_path = "invoice_template.pdf"
    pdf_canvas = canvas.Canvas(pdf_path, pagesize=A4)

    # Set the page title
    pdf_canvas.setFont("Helvetica-Bold", 16)
    pdf_canvas.drawString(220, 800, "Invoice")

    # Company information
    pdf_canvas.setFont("Helvetica", 12)
    pdf_canvas.drawString(50, 780, "Company Name: XYZ Ltd.")
    pdf_canvas.drawString(50, 765, "Address: 123 Example St., City")
    pdf_canvas.drawString(50, 750, "Phone: (123) 456-7890")
    pdf_canvas.drawString(50, 735, "Email: contact@")

    # Customer information section
    pdf_canvas.drawString(50, 700, "Bill To:")
    pdf_canvas.drawString(50, 685, "Customer Name:")
    pdf_canvas.drawString(50, 670, "Customer Address:")

    # Table header: five 100 pt columns starting at x = 50 line up with the
    # x positions (50, 150, 250, 350, 450) used later to fill in the items
    data = [["Item", "Description", "Quantity", "Unit Price", "Total"]]
    table = Table(data, colWidths=[100] * 5)
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.grey),  # illustrative colors
        ("TEXTCOLOR", (0, 0), (-1, 0), colors.whitesmoke),
        ("ALIGN", (0, 0), (-1, -1), "CENTER"),
        ("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
        ("FONTSIZE", (0, 0), (-1, 0), 12),
        ("BOTTOMPADDING", (0, 0), (-1, 0), 12),
        ("GRID", (0, 0), (-1, -1), 0.5, colors.black),
    ]))
    table.wrapOn(pdf_canvas, 500, 400)
    table.drawOn(pdf_canvas, 50, 600)

    # Save the template
    pdf_canvas.save()
    print(f"Invoice template saved as {pdf_path}")

# Generate the template
create_invoice_template()

This code sets up the basic structure of the invoice: the areas that show company and customer information, and a table header under which the product or service details are filled in.

2. Use PyPDF2 to fill in customer information and line items

Next, we fill in the customer information and line items on the generated template. ReportLab draws the new content onto an in-memory overlay page, and PyPDF2 merges that overlay onto the template page; the result is saved as invoice_filled.pdf.

from PyPDF2 import PdfReader, PdfWriter
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from io import BytesIO

def fill_invoice(customer_name, customer_address, items):
    # Open the template
    reader = PdfReader("invoice_template.pdf")
    writer = PdfWriter()

    # Create an in-memory buffer to draw the overlay content
    packet = BytesIO()
    pdf_canvas = canvas.Canvas(packet, pagesize=A4)

    # Fill in customer information (x = 160 clears the "Customer Address:" label)
    pdf_canvas.setFont("Helvetica", 12)
    pdf_canvas.drawString(160, 685, customer_name)
    pdf_canvas.drawString(160, 670, customer_address)

    # Fill in the line items below the table header
    y = 580
    for item in items:
        pdf_canvas.drawString(50, y, item["item"])
        pdf_canvas.drawString(150, y, item["description"])
        pdf_canvas.drawString(250, y, str(item["quantity"]))
        pdf_canvas.drawString(350, y, f"${item['unit_price']:.2f}")
        pdf_canvas.drawString(450, y, f"${item['quantity'] * item['unit_price']:.2f}")
        y -= 20  # Move down so each item starts on a new line

    # Finish the overlay page
    pdf_canvas.save()

    # Rewind the buffer, then merge the overlay onto each template page
    packet.seek(0)
    overlay = PdfReader(packet)
    for page in reader.pages:
        page.merge_page(overlay.pages[0])
        writer.add_page(page)

    # Save the filled-in invoice
    with open("invoice_filled.pdf", "wb") as output_pdf:
        writer.write(output_pdf)
    print("Invoice filled and saved as invoice_filled.pdf")

# Sample data
customer_name = "John Doe"
customer_address = "456 Example Ave., City"
items = [
    {"item": "Widget", "description": "High-quality widget", "quantity": 5, "unit_price": 20.00},
    {"item": "Gadget", "description": "Advanced gadget", "quantity": 3, "unit_price": 35.00},
    {"item": "Doohickey", "description": "Multi-purpose tool", "quantity": 2, "unit_price": 15.50},
]

# Generate an invoice
fill_invoice(customer_name, customer_address, items)

In this code, the fill_invoice function draws the customer information and line items over the invoice_template.pdf template and saves the result as invoice_filled.pdf. Each item goes on its own line with its name, description, quantity, unit price, and line total.
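The invoice above prints a total per line but no grand total, and it does the arithmetic with floats. A minimal sketch, assuming you want exact cent values plus an optional tax line: the function invoice_totals and the 8% tax rate are hypothetical, and the standard-library decimal module handles the rounding. The returned values could then be drawn with drawString() like the other fields.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def invoice_totals(items, tax_rate="0"):
    """Return per-line totals, subtotal, tax and grand total as Decimals."""
    line_totals = []
    for item in items:
        # str() keeps binary float noise (e.g. 0.1) out of the Decimal value
        unit_price = Decimal(str(item["unit_price"]))
        line_total = unit_price * item["quantity"]
        line_totals.append(line_total.quantize(CENT, ROUND_HALF_UP))
    subtotal = sum(line_totals, Decimal("0"))
    tax = (subtotal * Decimal(tax_rate)).quantize(CENT, ROUND_HALF_UP)
    return line_totals, subtotal, tax, subtotal + tax


items = [
    {"item": "Widget", "description": "High-quality widget", "quantity": 5, "unit_price": 20.00},
    {"item": "Gadget", "description": "Advanced gadget", "quantity": 3, "unit_price": 35.00},
    {"item": "Doohickey", "description": "Multi-purpose tool", "quantity": 2, "unit_price": 15.50},
]

# Hypothetical 8% tax rate, passed as a string so it converts to Decimal exactly
line_totals, subtotal, tax, total = invoice_totals(items, tax_rate="0.08")
print([str(t) for t in line_totals], subtotal, tax, total)
# ['100.00', '105.00', '31.00'] 236.00 18.88 254.88
```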

6. Summary

In this tutorial, we learned how to use PyPDF2 and ReportLab to process PDF files, from reading and merging existing files to generating and filling in custom invoices from scratch. These techniques make it practical to automate routine PDF work.

With PyPDF2 and ReportLab, you can easily create automated scripts to generate PDF reports, process encrypted files containing sensitive data, or build a batch file processing system. I hope that through this blog, you can use these two libraries flexibly to improve the efficiency of PDF file processing.
