SoFunction
Updated on 2025-03-03

Summary of Word file automation operations in Python

Introduction to Python-docx

Next, we need to select and install Python libraries for Word file processing. One of the classic choices is "python-docx", which provides us with powerful features.

Python-docx is a Python library that provides read, write and modify functions to Microsoft Word (.docx files). It allows us to open, read, and edit Word documents, as well as create new Word documents using Python scripts. The Python-docx library provides a simple and easy-to-use API, making it simple and efficient to handle Word documents.

Through Python-docx, we can operate on paragraphs, headings, tables, images, and more in Word documents. It allows us to change text styles, set page layouts, add pictures, and insert tables. Combined with ordinary Python loops, it can also be used to batch process multiple documents, including merging, splitting, and replacing text and styles.

The installation of the Python-docx library is very simple, just use the pip command to install it. Once the library is installed, we can use some simple code to read, modify, and create Word documents, allowing us to flexibly process and customize Word files.

Overall, Python-docx is a powerful and easy-to-use Python library that makes processing Word files easy and fun. Whether it is batch processing or specific operations for a single document, Python-docx provides us with powerful features and flexible interfaces. As a Python programmer, using the Python-docx library can help us better manage and operate Word files and improve office efficiency.

Installing the library takes a single pip command:

pip install python-docx

OK, you have installed the python-docx library, let's start playing now!

Read and modify Word files

In this section, we will learn how to open and read Word documents using the Python library and perform a series of interesting operations such as manipulating paragraphs, text styles and formats, and adding, removing, or replacing text content. Let's start this fun adventure!

First, let me briefly introduce what we are going to do. Reading and modifying Word files lets us extract information from documents and adjust, customize, or update them. This way, we can make small changes as needed without changing the entire document structure. In short, we can do all of this with Python scripts, like a magician working on a Word file!

Read Word files

Now, let's explore how to open and read Word documents. To do this we will use the "python-docx" library, which provides many features for reading and processing Word documents. Here is a code example of how to open a document:

from docx import Document

# To read a Word document, replace '' with your local file path
document = Document('')

# Print the text of each paragraph in the document
for paragraph in document.paragraphs:
    print(paragraph.text)

In the above code, we first import the Document class from the docx module, then call Document('') to open the Word document at the given path and assign it to the document variable.

Next, we use a simple for loop to iterate through document.paragraphs and use print(paragraph.text) to print out the text content of each paragraph.

Make sure you replace the '' in the code with the file name of the Word document you are actually using. After running the code, you will see the console print out the text content of each paragraph in the document.

Well, it's very simple, right? With just a few lines of code, we can open a Word document.

Manipulate paragraphs, text styles and formats

Next, let's explore how to manipulate paragraphs, text styles, and formats. In Word documents, paragraphs are the basic text units, and we can do some interesting things by manipulating them. As an example, let's append some large, bold, red text to the first paragraph:

from docx import Document
from docx.shared import RGBColor, Pt

# Open the Word document
document = Document('')

# Get the first paragraph
paragraph = document.paragraphs[0]

# Append a run to the paragraph and set its text style
run = paragraph.add_run("Hello, World!")
run.font.size = Pt(20)
run.font.bold = True
run.font.color.rgb = RGBColor(255, 0, 0)

# Write the modified content back to the original file
document.save('')

# Display the document content
for paragraph in document.paragraphs:
    print(paragraph.text)

This is a simple example of how we manipulate paragraphs, text styles, and formats.

The code uses Python's docx library to open a Word document.

It then takes the first paragraph in the document and styles it. The code creates a run object by adding the text "Hello, World!" to the paragraph. It then sets the run's font properties, changing the font size to 20 points, making the text bold, and setting the font color to red.

Next, the code saves the modified content back to the original file. Finally, it goes through each paragraph in the document and uses the print() function to print out the text content of each paragraph.

Add, delete or replace text content

Next, let's learn how to add, delete, or replace text content. This operation can help us customize the content of each Word file and adjust it according to our needs. Let me demonstrate some code examples to you:

from docx import Document

def modify_document(file_path):
    document = Document(file_path)

    # Add text
    document.add_paragraph('This is a new paragraph one.')
    document.add_paragraph('This is a new paragraph two.')
    document.add_paragraph('This is a new paragraph three: the old text will be replaced.')

    # Delete text: clear the first paragraph
    document.paragraphs[0].text = ""

    # Replace text (assigning paragraph.text drops run-level formatting)
    for paragraph in document.paragraphs:
        paragraph.text = paragraph.text.replace("Old text", "New Text")

    # Write the modified content back to the original file
    document.save(file_path)

# Call the function to modify the document
modify_document('')

This code opens a Word document and modifies it. The specific modification operations are as follows:

  • Use the add_paragraph method to add three new paragraphs to the end of the document.
  • Delete the text of the first paragraph by setting its text content to an empty string.
  • Use the string replace method to replace "Old text" with "New Text" in all paragraphs. The match is case-sensitive.
  • Finally, save the modified content back to the original file.

Please note that the code needs the document to contain at least one paragraph when the first paragraph is cleared. Because three paragraphs are added before that step, this holds even for an empty file. The code prints nothing, so verify the modifications by opening the saved document.

Using Python libraries, reading and modifying Word files has become so easy and fun. By manipulating paragraphs, text styles and formats, as well as adding, deleting or replacing text content, we can completely customize Word files to our needs. This is the charm of Python as a magic tool!

Hopefully, these fun code samples help you learn how to read and modify Word files easily. Remember, as a magician, use Python as your magic wand to make your Word files more flexible and fun!

Create and edit Word documents

Python is equally powerful when it comes to creating and editing Word documents! We can use Python to create brand new Word documents and flexibly set the page layout, headers, and page numbers. We can also add headings, paragraphs, images and other content, and adjust the font style and format. Let's explore these together.

Create a new Word document using Python

To create a new Word document using Python, we use the python-docx library. Here is a sample code to create a new document and save it:

from docx import Document
from docx.shared import Inches

# Create a new document
document = Document()

document.add_heading('Welcome to Word automation', level=1)
document.add_paragraph('This is a new paragraph.')
document.add_picture('', width=Inches(1.25))

# Save the file
document.save('New Document.docx')

This code creates a new Word document and adds a heading, a paragraph, and an image to it. First, we import the Document class from docx and the Inches class from docx.shared. Then, we create a new document object called document by calling Document(). Next, using the add_heading method, we add a level 1 heading with the content "Welcome to Word automation". Then, using the add_paragraph method, we add a new paragraph with the content "This is a new paragraph." After that, by calling the add_picture method, we add the picture at the path '' (replace it with the path of an existing image file) and set its width to 1.25 inches. Finally, we use the save method to save the document as "New Document.docx".

Set page layout, header, footer, and page number

To set the page layout, header, footer, and page number, we can use the functions that the python-docx library provides. Here is a sample code for setting page layout, header, footer, and page number:

from docx import Document
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
from docx.shared import Inches
from docx.oxml.ns import nsdecls
from docx.oxml import parse_xml
from docx.enum.section import WD_ORIENT

def set_layout_header_footer(file_path):
    document = Document(file_path)

    # Set page layout
    sections = document.sections
    for section in sections:
        section.orientation = WD_ORIENT.LANDSCAPE
        section.page_width = Inches(11)
        section.page_height = Inches(8.5)

    # Set the header
    for section in document.sections:
        header = section.header
        header_paragraph = header.paragraphs[0]
        header_paragraph.text = "This is the header"

    # Set footer
    for section in document.sections:
        footer = section.footer
        footer_paragraph = footer.paragraphs[0]
        footer_paragraph.text = "This is the footer"

    # Set page number
    for section in document.sections:
        footer = section.footer
        footer.is_linked_to_previous = False
        footer_page_num_paragraph = footer.add_paragraph()
        footer_page_num_paragraph.text = "page number:"
        footer_page_num_paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER
        footer_page_num_run = footer_page_num_paragraph.add_run()
        # nsdecls('w') supplies the xmlns:w declaration for the fragment
        fld_simple = parse_xml(
            r'<w:fldSimple %s w:instr="PAGE"><w:r><w:t>1</w:t></w:r></w:fldSimple>' % nsdecls('w'))
        # w:fldSimple is a paragraph-level element, so place it right after the run
        footer_page_num_run._r.addnext(fld_simple)

    document.save(file_path)

# Call the function to set page layout, header, footer and page number
set_layout_header_footer('New Document.docx')

This code sets the page layout, header, footer, and page number of a Word document. Let's go through each part of it:

1. Import the required modules and classes:

  • from docx import Document: imports the Document class, used to create and manipulate Word documents.
  • from docx.enum.text import WD_PARAGRAPH_ALIGNMENT: imports the WD_PARAGRAPH_ALIGNMENT enumeration, used to set paragraph alignment.
  • from docx.shared import Inches: imports the Inches class, used to express lengths in inches.
  • from docx.oxml.ns import nsdecls: imports the nsdecls function, which builds XML namespace declarations for the given prefixes.
  • from docx.oxml import parse_xml: imports the parse_xml function, used to parse XML fragments.
  • from docx.enum.section import WD_ORIENT: imports the WD_ORIENT enumeration, used to set the page orientation.

2. Define the set_layout_header_footer function, which accepts a file path parameter, file_path.

3. Create a Document instance, which loads an existing Word document from the file path passed in.

4. Set page layout:

Iterate through all sections and set the page orientation to landscape (WD_ORIENT.LANDSCAPE), the page width to 11 inches (Inches(11)), and the page height to 8.5 inches (Inches(8.5)).

5. Set the header:

Iterate through all sections, get the first paragraph of each header, and set its text to "This is the header".

6. Set the footer:

Iterate through all sections, get the first paragraph of each footer, and set its text to "This is the footer".

7. Set page number:

  • Iterate through all sections and disconnect the footer from the previous section's footer (footer.is_linked_to_previous = False).
  • Create a new paragraph, set its text to "page number:", and center it.
  • Create a run object in the new paragraph and attach the XML fragment of the page number field at that position. The XML fragment contains a simple field (w:fldSimple) with the PAGE instruction, used to display page numbers.

8. Use document.save(file_path) to save the modified document.

Finally, calling the set_layout_header_footer function with the file path 'New Document.docx' applies the page layout, header, footer, and page number to that document. Make sure 'New Document.docx' has been created before calling the function, or provide the correct file path.

Add titles, paragraphs, images, etc.

To add content like headings, paragraphs, and images, we can use the features provided by the python-docx library. Here is a sample code that adds a heading, paragraphs, and an image:

from docx import Document
from docx.shared import Pt, Inches

def add_content(file_path):
    document = Document(file_path)

    # Add a title
    title = document.add_heading('This is a title', level=1)

    # Add paragraphs
    paragraph1 = document.add_paragraph('This is the first paragraph.')
    paragraph2 = document.add_paragraph('This is the second paragraph.')

    # Set font style and format
    title_run = title.runs[0]
    title_run.bold = True
    title_run.italic = True
    title_run.font.size = Pt(18)

    paragraph1_run = paragraph1.runs[0]
    paragraph1_run.bold = True
    paragraph1_run.italic = False
    paragraph1_run.font.size = Pt(12)

    paragraph2_run = paragraph2.runs[0]
    paragraph2_run.bold = False
    paragraph2_run.italic = True
    paragraph2_run.font.size = Pt(10)

    # Add an image
    document.add_picture('', width=Inches(4), height=Inches(3))

    document.save(file_path)

# Call the function to add the title, paragraphs, and image
add_content('New Document.docx')

This code adds a heading, paragraphs, and an image to a Word document. Let's go through each part of it:

1. Import the required modules and classes:

  • from docx import Document: imports the Document class, used to create and manipulate Word documents.
  • from docx.shared import Pt, Inches: imports the Pt and Inches classes, used to set the font size and image size respectively.

2. Define the add_content function, which accepts a file path parameter, file_path.

3. Create a Document instance, which loads an existing Word document from the file path passed in.

4. Use the document.add_heading method to add a heading with the text "This is a title" at level 1.

5. Use the document.add_paragraph method to add two paragraphs, namely "This is the first paragraph." and "This is the second paragraph."

6. Set the font style and format of the heading and paragraphs:

  • Get the first run of the heading, make it bold and italic, and set the font size to 18 points.
  • Get the first run of the first paragraph, make it bold but not italic, and set the font size to 12 points.
  • Get the first run of the second paragraph, make it italic but not bold, and set the font size to 10 points.

7. Use the document.add_picture method to add an image. The image file path is '' (replace it with the path of an existing image), the width is set to 4 inches, and the height is set to 3 inches.

8. Use document.save(file_path) to save the modified document to the specified file path.

Finally, calling the add_content function with the file path 'New Document.docx' adds the heading, paragraphs, and image to that document. Make sure 'New Document.docx' has been created before calling the function, or provide the correct file path.

Adjust font style and format

To adjust the style and format of the font, we can use the Font object in the python-docx library. Here is a sample code to adjust the font style and format:

from docx import Document
from docx.shared import Pt
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT

def adjust_font(file_path):
    document = Document(file_path)

    paragraph = document.add_paragraph('This is a paragraph.')

    run = paragraph.runs[0]
    run.text = 'This is an example of adjusting font style and formatting.'
    run.font.name = 'Song style'  # use the name of a font installed on your system
    run.font.size = Pt(12)
    run.font.bold = True
    run.font.italic = True
    run.font.underline = True

    paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER

    document.save(file_path)

# Call the function to adjust font style and format
adjust_font('New Document.docx')

In this code, we use the add_paragraph method to add a paragraph and use paragraph.runs[0] to get the first Run object in the paragraph.

Then, we modify the Run object's text and its font properties to adjust the style and format of the font. For example, we set the font name to 'Song style' (use the name of a font installed on your system), the font size to 12 points, and make the text bold, italic, and underlined.

We also set the paragraph's alignment property to center the paragraph.

These are just some basic examples; you can further explore and adjust other font styles and formats as needed.

Batch processing of Word files

With the help of Python, we can also batch process Word files. We can batch read and process multiple Word documents, merge and split files, and even batch replace text and styles. The examples in this part use the python-docx module installed earlier (pip install python-docx).

Batch reading and processing of multiple Word documents

  • First, we need to get a list of file paths for the Word documents. We assume that these files are stored in the same folder.
  • Use the Document class to read and process each Word document one by one. You can perform custom processing operations on each document inside the loop.

This is a sample code that shows how to read and process multiple Word documents in batches. In this example, we simply print the first heading and the paragraph content of each document.

from docx import Document

folder_path = 'path/to/your/folder'  # Replace with the actual folder path

def process_document(file_path):
    document = Document(file_path)
    print(f"Processing document: {file_path}")

    # Print title
    for title in document.paragraphs:
        if title.style.name.lower().startswith('heading'):
            print(f"title: {title.text}")
            break  # Just print the first title

    # Print paragraphs
    for paragraph in document.paragraphs:
        print(f"paragraph: {paragraph.text}")

    print()  # Used to distinguish output between different documents

# Traverse all Word documents in the folder and process them
import os

for file_name in os.listdir(folder_path):
    if file_name.endswith('.docx'):
        file_path = os.path.join(folder_path, file_name)
        process_document(file_path)

The purpose of this code is to batch process multiple Word documents in a specified folder. Let me explain the code step by step:

1. Import the required modules:

from docx import Document: imports the Document class, used to create and manipulate Word documents.

2. Define the process_document function, which accepts a file path parameter, file_path.

3. Create a Document instance, loading the Word document from the file path passed in.

4. Print the file path of the document currently being processed.

5. Use document.paragraphs to iterate through all paragraphs in the document.

  • Use title.style.name.lower().startswith('heading') to check whether the paragraph's style name begins with "heading", which determines whether the paragraph is a heading.
  • If it is a heading, print its text and use the break statement to stop looking for other headings.

6. Use document.paragraphs to iterate through all paragraphs in the document again.

Print the text of each paragraph.

7. Print an empty line to distinguish the output between different documents.

8. Import the necessary modules:

import os: imports the os module for working with files and folders.

9. Use the os.listdir function to traverse all files and folders in the given folder path.

10. Use file_name.endswith('.docx') to check that the file name ends with ".docx", which determines whether it is a Word document file.

11. Use the os.path.join function to join the folder path and the file name into the complete file path.

12. Call the process_document function, passing the file path as an argument to process the document.

The logic of this code is that it traverses all Word documents in the specified folder and processes each one. The processing here is to print the first heading and the paragraph content of each document.

Please make sure to replace 'path/to/your/folder' with your actual folder path, and make sure that the folder contains the Word documents to be processed in the correct format.
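The file-collection step can also be sketched with pathlib. The version below is one possible variant, not the article's method: it matches the extension case-insensitively and skips the "~$" lock files that Word creates next to open documents. The helper name find_docx_files and the sample file names are hypothetical; the demo uses empty placeholder files, so only the standard library is needed.

```python
import tempfile
from pathlib import Path


def find_docx_files(folder):
    """Return the sorted .docx paths in `folder`, skipping Word lock files."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file()
        and path.suffix.lower() == ".docx"
        and not path.name.startswith("~$")
    )


# Demonstrate the filter on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("b.docx", "a.DOCX", "~$a.docx", "notes.txt"):
        (Path(tmp_dir) / name).touch()
    found = [path.name for path in find_docx_files(tmp_dir)]

print(found)
```

Each returned path can then be passed to a per-document function such as process_document.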

Merge and split multiple documents

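  • Merge multiple Word documents: use the Document class to read each document, then copy its body elements into a new document. python-docx has no built-in merge method, so the copy is done on the underlying XML.
  • Split a Word document into multiple documents: the Document class loads the original document, the body is divided into parts (here, one part per section), and each part is saved as a separate file.

Here is a sample code to demonstrate how to merge and split multiple Word documents:

import copy

from docx import Document
from docx.oxml.ns import qn

SECT_PR = qn('w:sectPr')

def append_copies(target_document, elements):
    """Copy body elements into target_document, ahead of its final sectPr."""
    body = target_document.element.body
    for element in elements:
        if element.tag == SECT_PR:
            continue  # keep the target's own section settings
        new_element = copy.deepcopy(element)
        # Drop copied section breaks; they may reference missing headers/footers
        for sect_pr in new_element.findall('.//' + SECT_PR):
            sect_pr.getparent().remove(sect_pr)
        if len(body) and body[-1].tag == SECT_PR:
            body[-1].addprevious(new_element)
        else:
            body.append(new_element)

# Merge multiple documents (text and tables only; images and styles that
# exist only in a source file are not carried over)
def merge_documents(file_paths, output_path):
    merged_document = Document()
    for file_path in file_paths:
        document = Document(file_path)
        append_copies(merged_document, document.element.body)
    merged_document.save(output_path)
    print("Document merge is completed!")

# Split a document into one file per section
def split_document(source_path):
    document = Document(source_path)
    # A section ends at the paragraph whose properties hold a w:sectPr
    parts, current = [], []
    for element in document.element.body:
        if element.tag == SECT_PR:
            continue
        current.append(element)
        if element.find(qn('w:pPr') + '/' + SECT_PR) is not None:
            parts.append(current)
            current = []
    if current:
        parts.append(current)

    for i, elements in enumerate(parts):
        new_document = Document()
        append_copies(new_document, elements)
        output_path = f"output_{i}.docx"
        new_document.save(output_path)
        print(f"Saved split document: {output_path}")

# Merge document example
file_paths = ['path/to/your/', 'path/to/your/', 'path/to/your/']
merge_documents(file_paths, 'merged_documents.docx')

# Split document example
split_document('path/to/your/source_document.docx')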

Batch replacement text and styles

  • Use the Document class to load each Word document, then use a replacement function (for example, replace()) to replace text in the document.
  • Use the runs property to reach each run object and change its style, which implements batch replacement of styles.

Here is a sample code that shows how to batch replace text and styles in Word documents:

from docx import Document

def replace_text_and_style(file_path, old_text, new_text, old_style, new_style):
    document = Document(file_path)

    for paragraph in document.paragraphs:
        for run in paragraph.runs:
            # Replace inside each run so its formatting is kept; text that is
            # split across several runs is not matched
            if old_text in run.text:
                run.text = run.text.replace(old_text, new_text)
            # Run styles are character styles and must exist in the document
            if run.style and old_style in run.style.name:
                run.style = new_style

    document.save(file_path)
    print(f"Replacement is completed: {file_path}")

# Replace text and style example ('Strong' and 'Emphasis' are character styles)
replace_text_and_style('path/to/your/', 'Old title', 'New Title', 'Strong', 'Emphasis')

Hopefully these code samples are helpful to you!

Friends, it's amazing, right?

Process tables and images in Word documents

Let's take a look at working with tables and images in Word documents. Python can help here too! You can read and edit table data in documents, add and delete tables, and insert, resize, and delete images.

Working with tables and images is one of the important tasks when operating on Word documents. Below is a short description and a code example for each operation:

Add and delete tables

In Word documents, you can add and delete tables to suit your needs. This allows easy organization and presentation of data.

from docx import Document

# Add a table
def add_table(file_path, rows, cols):
    document = Document(file_path)
    table = document.add_table(rows, cols)

    # Add table content
    for row in table.rows:
        for cell in row.cells:
            cell.text = "Cell"

    document.save(file_path)
    print("The table was added successfully!")

# Delete a table
def delete_table(file_path, table_index):
    document = Document(file_path)
    tables = document.tables

    if 0 <= table_index < len(tables):
        table = tables[table_index]
        table._element.getparent().remove(table._element)
        document.save(file_path)
        print("The table was deleted successfully!")
    else:
        print("Table index is out of range!")

# Example: add a table with 2 rows and 3 columns to the document
add_table('', 2, 3)

# Example: delete the first table in the document
delete_table('', 0)

Read and edit table data

Tables are ordered collections used to organize and render data. You can read and edit data in a table, such as modifying cell content, adjusting table styles, etc.

from docx import Document

# Read table data
def read_table_data(file_path, table_index, row_index, col_index):
    document = Document(file_path)
    tables = document.tables

    if 0 <= table_index < len(tables):
        table = tables[table_index]
        cell_value = table.cell(row_index, col_index).text
        return cell_value
    else:
        print("Table index is out of range!")
        return None

# Edit table data
def edit_table_data(file_path, table_index, row_index, col_index, new_value):
    document = Document(file_path)
    tables = document.tables

    if 0 <= table_index < len(tables):
        table = tables[table_index]
        table.cell(row_index, col_index).text = new_value
        document.save(file_path)
        print("Table data has been updated!")
    else:
        print("Table index is out of range!")

# Example: read the data in the second row and third column of the first table
data = read_table_data('', 0, 1, 2)
if data is not None:
    print("Data in the table:", data)

# Example: change the data in the second row and third column of the first table to "New Data"
edit_table_data('', 0, 1, 2, "New Data")

Insert, adjust and delete images

In Word documents, you can insert, adjust, and delete images to make the document more vivid and attractive.

from docx import Document
from docx.shared import Inches

# Insert an image
def insert_image(file_path, image_path):
    document = Document(file_path)
    document.add_picture(image_path, width=Inches(2), height=Inches(3))
    document.save(file_path)
    print("Image insertion succeeded!")

# Resize an image
def resize_image(file_path, image_index, width, height):
    document = Document(file_path)
    images = document.inline_shapes

    if 0 <= image_index < len(images):
        image = images[image_index]
        image.width = Inches(width)
        image.height = Inches(height)
        document.save(file_path)
        print("Image size has been adjusted!")
    else:
        print("Image index is out of range!")

# Delete an image
def delete_image(file_path, image_index):
    document = Document(file_path)
    images = document.inline_shapes

    if 0 <= image_index < len(images):
        image = images[image_index]
        # Remove the whole w:drawing element that wraps the inline picture
        drawing = image._inline.getparent()
        drawing.getparent().remove(drawing)
        document.save(file_path)
        print("Image deletion succeeded!")
    else:
        print("Image index is out of range!")

# Example: insert an image into the document
insert_image('', '')

# Example: resize the second image in the document to 3 inches wide and 4 inches high
resize_image('', 1, 3, 4)

# Example: delete the third image in the document
delete_image('', 2)

Hopefully, the code examples above help you understand how to handle tables and images in Word documents.

Isn't that useful?

Advanced features and extensions

We can use templates to create custom Word documents, such as report templates. We can also add bookmarks and hyperlinks to add interactivity to the document. In addition, a table of contents page can also be produced with Python. The code example below builds a simple one by writing the entries by hand:

from docx import Document

document = Document()

# Start the table of contents on a new page
document.add_page_break()

# Write the table of contents entries by hand
document.add_heading("Table of contents", level=1)
document.add_paragraph("1. Introduction")
document.add_paragraph("2. Method")
document.add_paragraph("3. Results")

document.save('')

To close this article, here is a summary of the key points: Word file automation can improve office efficiency and reduce repetitive work. With Python as the automation tool, the python-docx library is a natural choice. Using it, we can read and modify Word files, create and edit Word documents, batch process files, and work with tables and images. With more advanced techniques, we can use templates and add hyperlinks and table of contents pages. Working through real cases and exercises is the best way to master these skills in practice.
