Python's operating guide for batch processing of PDF images (insert, compress, extract, replace, paging, rotate, delete)

1. Overview

Images are one of the core elements of PDF documents. They not only enhance the visual appeal of the document, but also effectively convey information and help readers better understand content and topics. In actual operations, we often need to perform a variety of processing on the pictures in PDF, including insertion, extraction, replacement, rotation, paging, compression, deletion, etc. Automating these operations through Python programming can not only improve work efficiency, but also reduce human error rates, which are especially suitable for scenarios where large-scale document processing is performed. This article will introduce in detail how to use Python to implement image insertion, extraction, replacement, compression, pagination, rotation and deletion in PDF.

2. Use tools

To implement PDF image processing in Python, a suitable PDF processing library is required. This article will use for Python, This library is mainly used to create, read, convert and edit PDF documents in Python applications.

Install

Before you start, you need to install the library first. You can run the following command in the terminal to install:

pip install

3. Insert pictures in Python in PDF

Inserting pictures can be divided into multiple scenarios, such as inserting pictures into existing PDFs, inserting pictures into new PDFs, inserting multiple pictures into PDFs in batches, etc. The following will introduce these scenarios one by one.

3.1 Insert the picture to the existing PDF

Implementation steps

usePdfDocumentClass loads existing PDF documents.
use[index]Properties get the target page to which the image is to be inserted.
useMethod loads the image to the PdfImage object.
use()Method draws the image object to the specified position of the page.
use()Method to save the result PDF document.

Implement code

The following Python code shows how to insert an image into an existing PDF document:

from  import *
 
# Open an existing PDF documentpdf = PdfDocument("Test.pdf")
 
# Get the first pagepage = [0]
 
# Loading picturesimage = ("C:/Users/Administrator/Desktop/")
 
# Specify the coordinates and size of the drawing image (if you need to insert according to the original image size, you do not specify the width and height)x, y, width, height = 50.0, 50.0, 200.0, 200.0
 
# Draw pictures at the specified location on the first page(image, x, y, width, height)
 
# Save the document as a new PDF file("Insert the picture to existing", )
()

3.2 Insert the picture into the new PDF

Implementation steps

usePdfDocumentThe class creates a new PDF document.
use()Methods Add a page to the newly created PDF document.
useMethod loads the image to the PdfImage object.
use()Method draws the image object to the specified position of the page.
use()Method to save the result PDF document.

Implement code

The following Python code shows how to create a new PDF document and insert an image:

from  import *
 
# Create PDF documentpdf = PdfDocument()
 
# Add a pagepage = ()
 
# Loading picturesimage = ("C:/Users/Administrator/Desktop/")
 
# Specify the coordinates and size of the drawing imagex, y, width, height = 10.0, 50.0, 200.0, 100.0
 
# Draw the picture at the specified position on the first page (If you need to insert according to the original image size, you do not specify the width and height)(image, x, y, width, height)
 
# Save the modified document("Insert the picture to create new", )
()

3.3 Batch insert multiple pictures into PDF

Implementation steps

Inserting pictures into PDF in batches requires traversing the image list and then drawing them onto the PDF page in turn. The implementation steps are similar to the above steps and will not be described in detail here.

Implement code

The following Python code shows how to batch insert multiple images into a new PDF document:

from  import *
 
def batch_insert_images(image_paths, positions):
    # Create a new PDF document    pdf = PdfDocument()
 
    # traverse the image file list    for img_path, pos in zip(image_paths, positions):
        # Add a new page        page = ()
        
        # Loading pictures        image = (img_path)
        
        # Draw pictures on the page to the specified location        (image, *pos)
 
    # Save PDF document    ("Batch insert picture.pdf", )
    ()
 
# Callimage_paths = [
    "C:/Users/Administrator/Desktop/",
    "C:/Users/Administrator/Desktop/",
    "C:/Users/Administrator/Desktop/",
]
positions = [
    (0, 0),  # Location of the first image    (0, 0),  # Location of the second image    (0, 0),  # Location of the third picture]
 
batch_insert_images(image_paths, positions)

4. Python extracts PDF images and their metadata

Implementation steps

usePdfDocumentClass loads PDF documents.
Iterates through all pages of the document.
- Get the collection of picture information for the current page.
- A collection of image information that traverses the page.
  - Get the location information and size of each image.
  - Save the image to the specified path and print its metadata, such as the page number, picture number, location, size, save location, etc.

Implement code

The following Python code shows how to extract images and their metadata in PDFs, such as size, location, and page number:

from  import *
import os
 
# Open an existing PDF documentpdf = PdfDocument("Test.pdf")
 
# Create a directory that saves picturesoutput_dir = "Extracted Pictures"
(output_dir, exist_ok=True)
 
# traverse all pagesfor page_index in range():
    page = [page_index]
    imageInfo = 
 
    # Extract all pictures on the page    for i in range(len(imageInfo)):
        # Get the location information and size of the picture        bounds = imageInfo[i].Bounds
        width = 
        height = 
        
        # Build the file path to save the image        file_path = (output_dir, f"page{page_index + 1}_image_{i + 1}.png")
        
        # Save the image to the specified path        imageInfo[i].(file_path)
 
        # Print detailed information about the picture        print({
            "page": page_index + 1,
            "picture": i + 1,
            "Width and Height": f"{width}x{height}",
            "Location": (, ),
            "Save Location": file_path
        })
 
# Close the document()

5. Python replace PDF pictures

Replacing PDF images can be divided into two scenarios. One is to replace the pictures in the document with a new picture, and the other is to replace the pictures with text. The following will introduce these two replacement scenarios one by one.

5.1 Replace pictures with pictures

Implementation steps

usePdfDocumentClass loads PDF documents.
use[index]Properties get the target page to replace the image.
use()Method Replace the specified image in the page with a new image. Using this method, the replaced image will maintain the size and position of the original image.
use()Method to save the result PDF document.

Implement code

The following Python code shows how to replace a specified image in a PDF with a new image:

from  import *
 
# Open an existing PDF documentpdf = PdfDocument("Test.pdf")
 
# Get the first pagepage = [0]
 
# Load new imagenew_image = ("C:/Users/Administrator/Desktop/")
 
# Replace the first image on the page with a new image(0, new_image)
 
# Save the modified document("Picture replacement picture.pdf")
()

5.2 Replace pictures with text

Implementation steps

usePdfDocumentClass loads PDF documents.
use[index]Properties get the target page to replace the image.
useAttributes to get the picture information of the page.
Get the image stream data of the first image from the image information collection and load it intoPdfImageObject.
Gets the size and coordinate position of the image object.
Remove the image from the page.
Draw text to image location.
use()Method saves the result document.

Implementation code:

from  import *
 
# Open an existing PDF documentpdf = PdfDocument("Test.pdf")
 
# Get the first page of PDFpage = [0]
 
# Get picture information from the pageimageInfo = 
 
# Extract the first image from the pageimage = (imageInfo[0].Image)
 
# Get the size of the image (in pixels)widthInPixel = 
heightInPixel = 
 
# Convert pixel values to point valuesconvertor = PdfUnitConvertor()
width = (float(widthInPixel), )
height = (float(heightInPixel), )
 
# Get the coordinates of the imagex = imageInfo[0].
y = imageInfo[0].
 
# Delete pictures from page(0)
 
# Specify the rectangular area where the text is drawn (same position and size as the picture)rect = RectangleF(PointF(x, y), SizeF(width, height))
 
# Set text alignmentstrformat = PdfStringFormat()
 = 
 = 
 
# Draw text to image location("Replace text", PdfFont(, 18.0), PdfBrushes.get_Purple(), rect, strformat)
 
# Save the modified PDF("Text Replace Picture.pdf")
()

6. Python implements PDF image pagination layout

Implementation steps

When inserting pictures into PDF pages, you may encounter the situation where the pictures are too large and one page cannot be displayed. In this case, the endless display needs to be drawn to the next page to ensure it is fully visible. The specific implementation steps are as follows:

usePdfDocumentCreate a new PDF document in the class.
Add a page to the document.
usePdfTextLayoutClass settings pagination layout.
use()Method, draw pictures on pages in pagination layout.
use()Method to save the result PDF document.

Sample code

The following Python code shows how to add a large image to a PDF and make it paginated:

from  import *
 
# Create PDF documentpdf = PdfDocument()
 
# Add a pagepage = ()
 
# Loading picturesimage = ("")
 
# Set layout options to make pictures pagingformat = PdfTextLayout()
 = 
 = 
 
# Draw pictures on pages using paging layout(page, 20.0, 600.0, format)
 
# Save PDF document("Picture Pagination.pdf")
()

7. Python settings PDF image transparency and rotation

Implementation steps

usePdfDocumentCreate a new PDF document in the class.
use()Method Add a page to the document.
use()Method saves the initial state of the page canvas.
use()Methods Apply transparency to the page canvas.
use()Method moves the coordinate system of the page to the position where the picture is drawn and rotates a specific angle.
useMethod loads the image to the PdfImage object.
use()Method draws the image object to the specified position of the page.
use()Method to restore the status of the page canvas.
use()Method to save the result PDF document.

Implement code

The following Python code shows how to set the transparency and rotation angle of an image in a PDF:

from  import *
from  import *
 
# Create PDF documentpdf = PdfDocument()
 
# Add a pagepage = ()
 
# Specify the location and size of the picture to be drawnx, y, width, height = 50.0, 200.0, 200.0, 100.0
 
# Save the status of the page canvasstate = ()
 
# Apply transparency to page canvas(0.5)
 
# Move the coordinate system to the specific coordinates of the drawing picture and rotate the page canvas 45 degrees counterclockwise(-45.0, PointF(x, y))
 
# Loading picturesimage = ("C:/Users/Administrator/Desktop/")
 
# Draw pictures at the specified location of the page(image, x, y, width, height)
 
# Restore the status of the page canvas(state)
 
# Save PDF document("Set picture transparency and rotation.pdf")
()

8. Python compression PDF pictures

Implementation steps

usePdfCompressorClass opens PDF document.
Set image compression options.
use()Method compresses PDF and saves.

Sample code

The following Python code shows how to compress images in PDF documents:

from  import *
 
# Use the PdfCompressor class to open an existing PDFcompressor = PdfCompressor("Test.pdf")
 
# Set image compression optionscompression_options = 
compression_options.SetImageQuality()
compression_options.SetResizeImages(True)
compression_options.SetIsCompressImage(True)
 
# Compress the pictures in the PDF file and save the results to a new file("Compressed Image.pdf")

9. Python delete PDF images

Implementation steps

usePdfDocumentClass opens PDF document.
Iterate through all pages in the document.
useAttributes get the collection of picture information for the current page.
Iterates through all pictures in the collection.
- use()Method to delete the current image from the page.
use()Method saves the modified PDF document.

Sample code

The following Python code shows how to delete images from PDF documents:

from  import *
 
# Open an existing PDF documentpdf = PdfDocument("")
 
# traverse all pagesfor page_index in range():
    page = [page_index]
 
    imageInfo = 
 
    # Delete all pictures on the page    for i in range(len(imageInfo) - 1, -1, -1):
        (imageInfo[i])
 
# Save the document("Delete picture.pdf")
()

The above is all the content of using Python to process PDF images.

This is the article about this article on the operation guide (insert, compress, extract, replace, paging, rotate, delete) for batch processing of PDF images in Python. For more information about Python processing of PDF images, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!