Using Python to work with PDF files
Read a PDF - source code
import PyPDF2

# Read the PDF file
reader = PyPDF2.PdfFileReader('Sample File/')
print(reader)

# Get the specified page
page = reader.getPage(0)

# Output the text of the current page
print(page.extractText())
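Read a PDF - source code analysis
This code uses the PyPDF2 library to read a PDF file and extract text from it. It relies on the library's legacy camelCase names (PdfFileReader, getPage, extractText), which were removed in PyPDF2 3.0, so it needs an older PyPDF2 release. The following is a detailed analysis of this code: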
1. Import the library
import PyPDF2
This line of code imports the PyPDF2 library, which provides the function of processing PDF files.
2. Read PDF files
reader = PyPDF2.PdfFileReader('Sample File/')
print(reader)
PyPDF2.PdfFileReader('Sample File/'): Creates a PdfFileReader object that reads the specified PDF file. The file path here is 'Sample File/'.
print(reader): Prints the reader object. The object holds the metadata and page information of the PDF file, but printing it shows only its default representation.
3. Read the specified page
page = reader.getPage(0)
reader.getPage(0): Gets the first page of the PDF file (page indexes start at 0). The getPage method returns a PageObject representing one page of the PDF file.
4. Extract and output text data
print(page.extractText())
page.extractText(): Extracts the text from the current page (the page object). This method tries to parse the text on the page and returns it as a string.
print(page.extractText()): Prints the extracted text.
Code execution process
Import library: Import the PyPDF2 library.
Create a reader object: Use PdfFileReader to read the specified PDF file.
Get page object: Use the getPage method to get the first page of the PDF file.
Extract text: Use the extractText method to extract text data from the page object.
Output text: Print extracted text data.
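The same steps can be sketched against the newer snake_case interface that replaced the legacy names (in pypdf, `PdfReader`, `reader.pages` and `extract_text()`). The helper name `page_text` is hypothetical, and the function only assumes an object that exposes `.pages` whose items have an `extract_text()` method; it is a sketch of the flow rather than the article's own code.

```python
def page_text(reader, index=0):
    """Return the text of one page from a pypdf-style reader.

    Assumption: `reader.pages` is a sequence of page objects and each
    page object has an `extract_text()` method.
    """
    pages = reader.pages
    if not 0 <= index < len(pages):
        raise IndexError(
            f"page index {index} is out of range for {len(pages)} page(s)"
        )
    # extract_text() may yield an empty result for scanned, image-only pages
    return pages[index].extract_text() or ""
```

With pypdf installed, a call could look like `page_text(PdfReader(path), 0)`, where `path` points at a real PDF file.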
Things to note
Make sure the PDF file path is correct and the file exists.
The PyPDF2 library may not perfectly extract text from all PDF files, especially those containing complex formats or images.
If the PDF file is password protected, you need to decrypt the file before you can read the content.
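The first note can be turned into a cheap pre-flight check that runs before the file is handed to PyPDF2. The sketch below uses only the standard library; the function name `looks_like_pdf` is hypothetical, and the header test is a heuristic (it rejects files with leading junk that some readers would still accept).

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if `path` is an existing file that starts with '%PDF-'."""
    p = Path(path)
    if not p.is_file():
        # Covers both a missing file and a directory path
        return False
    with p.open("rb") as fh:
        # PDF files normally begin with "%PDF-" followed by a version number
        return fh.read(5) == b"%PDF-"
```

A script could call this first and print a clear message instead of letting the reader fail with a lower-level error.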
Sample output
Assuming the first page of the PDF file 'Sample File/' contains the text "Hello, World!", the output of the code will be:
<PyPDF2.pdf.PdfFileReader object at 0x...>
Hello, World!
Here <PyPDF2.pdf.PdfFileReader object at 0x...> is the default representation of the reader object printed by print(reader) (the exact module path depends on the PyPDF2 version), and the text after it is the extracted content.
Through this code, you can read the specified page of the PDF file and extract the text content.
Rotate and create blank PDF - source code
import PyPDF2

# Create an object that reads the PDF file
reader = PyPDF2.PdfFileReader(r'Sample File/')
# Create an object that writes a PDF file
writer = PyPDF2.PdfFileWriter()
# print(reader, writer)

# Get the total number of pages in the PDF file
# print(reader.numPages)

# Traverse all pages in the PDF file
for page_num in range(reader.numPages):
    # print(page_num)
    # Get the current page object
    current_page = reader.getPage(page_num)
    # Even index (pages 1, 3, 5, ...): rotate 90° clockwise
    if page_num % 2 == 0:
        current_page.rotateClockwise(90)
    else:
        # Odd index (pages 2, 4, 6, ...): rotate 90° counterclockwise
        current_page.rotateCounterClockwise(90)
    writer.addPage(current_page)

# Add a blank page and rotate it 90° clockwise
page = writer.addBlankPage()
page.rotateClockwise(90)

# Save the result to a new file through the write method of the writer object
with open(r'Sample File/Rotate and Create Blank PDF File.pdf', 'wb') as file:
    writer.write(file)
Rotate and create blank PDF - source code analysis
This code uses the PyPDF2 library to read a PDF file, rotate each page, add a blank page, and finally save the modified content to a new PDF file. The following is a detailed analysis of this code:
1. Import the library
import PyPDF2
This line of code imports the PyPDF2 library, which provides the function of processing PDF files.
2. Create objects that read and write PDF files
reader = PyPDF2.PdfFileReader(r'Sample File/')
writer = PyPDF2.PdfFileWriter()
reader = PyPDF2.PdfFileReader(r'Sample File/'): Creates a PdfFileReader object that reads the specified PDF file. The file path here is 'Sample File/'.
writer = PyPDF2.PdfFileWriter(): Creates a PdfFileWriter object that collects the pages of the new PDF file.
3. Get the total number of pages in the PDF file
num_pages = reader.numPages
reader.numPages: Gets the total number of pages in the PDF file.
4. Traverse all pages in the PDF file
for page_num in range(num_pages):
    current_page = reader.getPage(page_num)
    if page_num % 2 == 0:
        current_page.rotateClockwise(90)
    else:
        current_page.rotateCounterClockwise(90)
    writer.addPage(current_page)
for page_num in range(num_pages): Iterates over the zero-based index of each page in the PDF file.
current_page = reader.getPage(page_num): Gets the page object at the current index.
if page_num % 2 == 0: Checks whether the current index is even. Because indexes start at 0, even indexes are the 1st, 3rd, 5th, ... pages.
current_page.rotateClockwise(90): If the index is even, rotates the page 90 degrees clockwise.
current_page.rotateCounterClockwise(90): If the index is odd (the 2nd, 4th, ... pages), rotates the page 90 degrees counterclockwise.
writer.addPage(current_page): Adds the rotated page to the writer object.
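Because the loop variable is a zero-based index, "even" and "odd" here refer to the index, not to the page number a reader sees. A small sketch of just the parity rule, with no PDF library involved, makes the mapping explicit; the function name `rotation_for_index` and the signed-angle convention (positive for clockwise) are assumptions made for illustration.

```python
def rotation_for_index(page_num):
    """Return the signed rotation in degrees for a zero-based page index.

    Positive means clockwise, negative means counterclockwise.
    """
    return 90 if page_num % 2 == 0 else -90


# Indexes 0, 1, 2 correspond to pages 1, 2, 3 of a three-page file
plan = [rotation_for_index(i) for i in range(3)]
print(plan)  # [90, -90, 90]
```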
5. Add blank pages and rotate
page = writer.addBlankPage()
page.rotateClockwise(90)
page = writer.addBlankPage(): Appends a blank page to the writer object and returns it.
page.rotateClockwise(90): Rotates the blank page 90 degrees clockwise.
6. Save the modified PDF file
with open(r'Sample File/Rotate and Create Blank PDF File.pdf', 'wb') as file:
    writer.write(file)
with open(r'Sample File/Rotate and Create Blank PDF File.pdf', 'wb') as file: Opens a new file in binary write mode to hold the modified PDF.
writer.write(file): Writes the contents of the writer object to the new file.
Code execution process
Import library: Import the PyPDF2 library.
Create Reader and Writer Objects: Create objects for reading and writing PDF files, respectively.
Get total pages: Get the total number of pages in the PDF file.
Traverse each page: Rotate each page and add the rotated page to the writer object.
Add blank page and rotate: Add a blank page in the writer object and rotate it 90 degrees.
Save file: Save the modified content to a new PDF file.
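The flow above can also be sketched with the snake_case method names used by pypdf (`page.rotate`, `writer.add_page`, `writer.add_blank_page`). The function name `rotate_and_append` is hypothetical, and the code assumes only that the objects passed in provide those methods, so it is an illustration of the sequence rather than a drop-in replacement for the article's script.

```python
def rotate_and_append(reader, writer):
    """Rotate pages alternately, append them, then add a rotated blank page.

    Assumption: `reader.pages` is iterable, pages have `rotate(angle)`,
    and `writer` has `add_page(page)` and `add_blank_page()`.
    """
    for index, page in enumerate(reader.pages):
        # rotate() takes a clockwise angle; a negative value turns the other way
        page.rotate(90 if index % 2 == 0 else -90)
        writer.add_page(page)
    # Called without a size, add_blank_page() reuses the size of the last page
    blank = writer.add_blank_page()
    blank.rotate(90)
    return writer
```

Saving still works as in the original: open the target file in 'wb' mode and pass the file object to the writer's write method.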
Sample output
Assuming that the original PDF file 'Sample File/' has 3 pages, the code above generates a new PDF file 'Sample File/Rotate and Create Blank PDF File.pdf' with 4 pages, where:
Page 1 (formerly page 1) is rotated 90 degrees clockwise.
Page 2 (formerly page 2) is rotated 90 degrees counterclockwise.
Page 3 (formerly page 3) is rotated 90 degrees clockwise.
Page 4 is the newly added blank page, rotated 90 degrees clockwise.
Through this code, you can rotate each page of the PDF file, add a blank page, and finally save the modified content to a new PDF file.
Encrypt PDF file - source code
import PyPDF2

# Create an object that reads the PDF file
reader = PyPDF2.PdfFileReader('Sample File/Rotate and Create Blank PDF File.pdf')
# Create an object that writes a PDF file
writer = PyPDF2.PdfFileWriter()

for page_num in range(reader.numPages):
    # Append each page of the original file to the writer object
    writer.addPage(reader.getPage(page_num))

# Set a password on the writer object
writer.encrypt("123456")

# Write the encrypted content to a new file
with open(r'Sample File/Rotate and Create Blank PDF File_Encrypt.pdf', 'wb') as file:
    writer.write(file)
Encrypt PDF file - source code analysis
This code uses the PyPDF2 library to read an existing PDF file, copy its contents into a new PDF file, and set a password to the new PDF file for encryption, and finally save the encrypted file to the new file. The following is a detailed analysis of this code:
1. Import the library
import PyPDF2
This line of code imports the PyPDF2 library, which provides the function of processing PDF files.
2. Create a PDF file object
reader = PyPDF2.PdfFileReader('Sample File/Rotate and Create Blank PDF File.pdf')
reader = PyPDF2.PdfFileReader('Sample File/Rotate and Create Blank PDF File.pdf'): Creates a PdfFileReader object that reads the specified PDF file. The file path here is 'Sample File/Rotate and Create Blank PDF File.pdf'.
3. Create an object to write to a PDF file
writer = PyPDF2.PdfFileWriter()
writer = PyPDF2.PdfFileWriter(): Creates a PdfFileWriter object that collects the pages of the new PDF file.
4. Iterate through each page of the original file and append it to the writer object
for page_num in range(reader.numPages):
    writer.addPage(reader.getPage(page_num))
for page_num in range(reader.numPages): Iterates over every page index in the PDF file.
writer.addPage(reader.getPage(page_num)): Adds the page object at the current index to the writer object.
5. Set password for writer object
writer.encrypt("123456")
writer.encrypt("123456"): Sets the password "123456" on the writer object. The encrypted PDF file can only be opened with this password.
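Hard-coding "123456" keeps the demo short, but a script that is shared or committed would expose the password. One common alternative is to read it from the environment; the sketch below uses only the standard library, and both the function name `get_pdf_password` and the variable name `PDF_PASSWORD` are hypothetical.

```python
import os


def get_pdf_password(env_var="PDF_PASSWORD"):
    """Read the password from an environment variable instead of the source."""
    password = os.environ.get(env_var)
    if not password:
        # Fail early rather than encrypting with an empty password
        raise RuntimeError(f"environment variable {env_var} is not set")
    return password
```

The returned string could then be passed to the writer's encrypt call in place of the literal.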
6. Write the encrypted file to a new file
with open(r'Sample File/Rotate and Create Blank PDF File_Encrypt.pdf', 'wb') as file:
    writer.write(file)
with open(r'Sample File/Rotate and Create Blank PDF File_Encrypt.pdf', 'wb') as file: Opens a new file in binary write mode to hold the encrypted PDF.
writer.write(file): Writes the contents of the writer object to the new file.
Code execution process
Import library: Import the PyPDF2 library.
Create Reader Object: Create an object to read a PDF file.
Create Writer Object: Create an object for writing to a new PDF file.
Traversing each page: Append each page of the original text to the writer object.
Set password: Set password for the writer object.
Save file: Save the encrypted content to a new PDF file.
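The steps above amount to "copy every page, set the password once, then write". A sketch of that order using the snake_case names from pypdf (`add_page`, `encrypt`, `write`) is shown below; the function names `copy_and_encrypt` and `save` are hypothetical, and the code assumes only that the objects passed in offer those methods.

```python
def copy_and_encrypt(reader, writer, password):
    """Copy all pages from `reader` into `writer` and set a password.

    Assumption: `reader.pages` is iterable and `writer` has
    `add_page(page)` and `encrypt(password)`.
    """
    for page in reader.pages:
        writer.add_page(page)
    # The password is set once, after copying and before the file is written
    writer.encrypt(password)
    return writer


def save(writer, path):
    """Write the writer's content to `path` in binary mode."""
    with open(path, "wb") as fh:
        writer.write(fh)
```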
Sample output
Assume that the original PDF file 'Sample File/Rotate and Create Blank PDF File.pdf' has 3 pages. The code above generates a new PDF file 'Sample File/Rotate and Create Blank PDF File_Encrypt.pdf', where:
The content is the same as the original PDF file.
The file is encrypted and requires the password "123456" to open.
Through this code, you can read the contents of a PDF file, copy it to a new PDF file, set a password to encrypt the new PDF file, and finally save the encrypted file to the new file.
Add watermark to PDF file - source code
import PyPDF2

# Read the source file
reader = PyPDF2.PdfFileReader(r'Sample File/')
# Read the watermark file
water = PyPDF2.PdfFileReader(r'Sample File/')
# Create the object that writes the PDF file
writer = PyPDF2.PdfFileWriter()

# Get the watermark page
water_page = water.getPage(0)

# Add the watermark to every page of the source file
for page_num in range(reader.numPages):
    # Get the current page object
    current_page = reader.getPage(page_num)
    # Merge the watermark page onto the current page
    current_page.mergePage(water_page)
    writer.addPage(current_page)

# Write the watermarked pages to a new file
with open(r'Sample File/', 'wb') as file:
    writer.write(file)
Add watermark to PDF file - source code analysis
This code uses the PyPDF2 library to read a source PDF file and a watermark PDF file, then add the watermark to each page of the source file, and finally save the watermarked file to a new file. The following is a detailed analysis of this code:
1. Import the library
import PyPDF2
This line of code imports the PyPDF2 library, which provides the function of processing PDF files.
2. Read source files and watermark files
reader = PyPDF2.PdfFileReader(r'Sample File/')
water = PyPDF2.PdfFileReader(r'Sample File/')
reader = PyPDF2.PdfFileReader(r'Sample File/'): Creates a PdfFileReader object that reads the source PDF file. The file path here is 'Sample File/'.
water = PyPDF2.PdfFileReader(r'Sample File/'): Creates a PdfFileReader object that reads the watermark PDF file. The file path here is 'Sample File/'.
3. Create an object to write to a PDF file
writer = PyPDF2.PdfFileWriter()
writer = PyPDF2.PdfFileWriter(): Creates a PdfFileWriter object that collects the pages of the new PDF file.
4. Get the watermark page
water_page = water.getPage(0)
water_page = water.getPage(0): Gets the first page of the watermark PDF file (index 0) to use as the watermark page.
5. Add watermark to the original file by looping
for page_num in range(reader.numPages):
    current_page = reader.getPage(page_num)
    current_page.mergePage(water_page)
    writer.addPage(current_page)
for page_num in range(reader.numPages): Iterates over every page index in the source PDF file.
current_page = reader.getPage(page_num): Gets the page object at the current index.
current_page.mergePage(water_page): Merges the watermark page onto the current page.
writer.addPage(current_page): Adds the current page, now carrying the watermark, to the writer object.
6. Write the file with the added watermark page to the new file
with open(r'Sample File/', 'wb') as file:
    writer.write(file)
with open(r'Sample File/', 'wb') as file: Opens a new file in binary write mode to hold the watermarked PDF.
writer.write(file): Writes the contents of the writer object to the new file.
Code execution process
Import library: Import the PyPDF2 library.
Read source file and watermark file: Create objects for reading source PDF file and watermark PDF file respectively.
Create Writer Object: Create an object for writing to a new PDF file.
Get the watermark page: Get the first page of the watermark PDF file.
Traverse each page and add a watermark: Merge the watermark page on each page of the source PDF file and add the merged page to the writer object.
Save file: Save the watermarked file to a new file.
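The loop at the heart of these steps can be sketched with the snake_case names from pypdf (`merge_page`, `add_page`). The function name `add_watermark` is hypothetical, and the code assumes only that the page and writer objects passed in provide those two methods.

```python
def add_watermark(reader, watermark_page, writer):
    """Merge one watermark page onto every page and collect the results.

    Assumption: `reader.pages` is iterable, pages have `merge_page(other)`,
    and `writer` has `add_page(page)`.
    """
    for page in reader.pages:
        # merge_page() adds the watermark's content on top of the page's own content
        page.merge_page(watermark_page)
        writer.add_page(page)
    return writer
```

The same watermark page object is reused for every source page, as in the original loop.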
Sample output
Suppose the source PDF file 'Sample File/' has 3 pages, and the watermark PDF file 'Sample File/' has 1 page. The code above generates a new PDF file 'Sample File/', where:
Each page contains a watermark.
The location and size of the watermark depend on the content of the watermark page and the size of the source page.
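Since placement depends on both page sizes, it can help to compare them before merging. The sketch below assumes pypdf-style pages that expose `mediabox.width` and `mediabox.height` in PDF points; the function name `same_page_size` and the default tolerance are hypothetical choices for illustration.

```python
def same_page_size(page, watermark_page, tolerance=0.5):
    """Return True if both pages have the same size within `tolerance` points.

    Assumption: each page exposes `mediabox.width` and `mediabox.height`.
    """
    width_a = float(page.mediabox.width)
    height_a = float(page.mediabox.height)
    width_b = float(watermark_page.mediabox.width)
    height_b = float(watermark_page.mediabox.height)
    return (
        abs(width_a - width_b) <= tolerance
        and abs(height_a - height_b) <= tolerance
    )
```

When this returns False, the watermark may appear off-center or clipped on that page, so a warning could be printed before merging.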
Through this code, you can read a source PDF file and a watermark PDF file, add the watermark to each page of the source file, and finally save the watermarked file to a new file.