SoFunction
Updated on 2025-03-04

Use Python to automatically write Word documents

Preface

My work involves a lot of report writing. After endless copying and pasting, I wondered whether this highly repetitive work could be handled by a program. A search turned up the term RPA (robotic process automation), which describes exactly what I wanted to achieve, although the term is mostly used in fields such as finance. So can Python do RPA? Further searching showed that many packages can help. This article introduces python-docx, a package for generating Word documents. Official documentation: python-docx

Install python-docx

You can install it with pip. If the download is slow, switch pip to a domestic mirror source:

pip install python-docx

Using python-docx

Create a Word document

from docx import Document
document = Document()
document.save("Report.docx")

Document() creates the Word document; I named the variable document. This step is equivalent to right-clicking in a folder and creating an empty Word document.

document is essential. Think of it as a large pool that has not yet been filled with water: everything we insert is poured into it. More precisely, document is a newly created object, and every operation goes through it (document, a dot, then the method to call). When the document is finished, remember to call save() as the last step. The path can be relative or absolute. A relative path is resolved against the current working directory, which is usually the directory the script is run from.

All the code below should be placed between document = Document() and document.save("Report.docx"); this will not be repeated.

Set the paper orientation, size, and margins

If you are familiar with Word, you will know about section breaks. Headers, footers, margins and so on are uniform within a section. A newly created document contains one section by default. To set the page orientation and margins of the first section, you first need to get its section object. In the code below, section is that object. If there are several sections, index 0 is the first one, and so on.

from docx.enum.section import WD_ORIENTATION
from docx.shared import Cm

# Get the first section
section = document.sections[0]
# Set landscape orientation
section.orientation = WD_ORIENTATION.LANDSCAPE
# Read the page size with the names swapped, so width and height trade places
page_h, page_w = section.page_width, section.page_height
# Set the width of the landscape page
section.page_width = page_w
# Set the height of the landscape page
section.page_height = page_h
# Set the left, right, top and bottom margins
section.left_margin = Cm(2)
section.right_margin = Cm(2)
section.top_margin = Cm(2)
section.bottom_margin = Cm(2)

Word documents can be landscape or portrait. In python-docx, landscape is set with section.orientation = WD_ORIENTATION.LANDSCAPE and portrait with section.orientation = WD_ORIENTATION.PORTRAIT. Portrait is the default, so in practice only the landscape case needs code.

If you set the orientation without also swapping the page height and width, the document still looks portrait when opened, even though Layout > Orientation in Word reports landscape. What Word records and what we see do not agree. To make the appearance match the orientation, we read the page height and width and swap them.

Of course, if you only need portrait documents, none of the above steps are needed.

The top, bottom, left and right margins are separate properties of the section. python-docx represents every length as an integer number of English Metric Units (EMU; 914,400 per inch, or 360,000 per centimetre), so a bare number such as section.left_margin = 2 means 2 EMU, which is effectively zero. I am used to centimetres, so the margin code wraps each value in Cm() from docx.shared, which converts 2 cm into the internal unit. Pt(), Inches() and Mm() do the same for points, inches and millimetres.
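The arithmetic behind these conversions is simple enough to write out by hand. The sketch below does not use python-docx at all; the constants are the standard EMU ratios, and the function names are my own.

```python
EMU_PER_INCH = 914_400
EMU_PER_CM = 360_000   # 914,400 / 2.54
EMU_PER_PT = 12_700    # 914,400 / 72


def cm_to_emu(cm):
    """Convert centimetres to whole EMU."""
    return int(cm * EMU_PER_CM)


def emu_to_pt(emu):
    """Convert EMU to points."""
    return emu / EMU_PER_PT


def emu_to_cm(emu):
    """Convert EMU to centimetres."""
    return emu / EMU_PER_CM


margin_emu = cm_to_emu(2)           # a 2 cm margin expressed in EMU
margin_pt = emu_to_pt(margin_emu)   # the same margin in points, about 56.7
```

So a 2 cm margin is 720,000 EMU, or roughly 56.7 pt, which shows why passing the bare number 2 gives a margin that is far too small to see.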

To add a new section:

from docx.enum.section import WD_SECTION_START
section_new = document.add_section(start_type=WD_SECTION_START.NEW_PAGE)

The method names in python-docx are easy to understand: even without explanation you can tell that the code above calls a method on document to add a new section. The WD_SECTION_START value selects the type of section break. NEW_PAGE starts the new section on the next page, CONTINUOUS starts it on the same page, and there are a few other options.

Unified formatting

Text, tables and other content added with python-docx can be formatted after each add call. Doing it that way means setting the line spacing, font, indentation and so on after every call, which is tedious. python-docx supports styles just as Word does: after adding text, apply a style to it. You can create a new style or modify an existing one.

# Create a body-text style
from docx.oxml.ns import qn
from docx.enum.style import WD_STYLE_TYPE
from docx.shared import Pt, Cm
style_normal = document.styles.add_style('NORMAL STYLE', WD_STYLE_TYPE.PARAGRAPH)
style_normal.base_style = document.styles['Normal']  # Base style
style_normal.font.name = 'Times New Roman'  # Latin font
style_normal.element.rPr.rFonts.set(qn('w:eastAsia'), 'SimSun')  # East Asian font (宋体)
style_normal.paragraph_format.space_before = Pt(0)  # Spacing before the paragraph
style_normal.paragraph_format.space_after = Pt(0)  # Spacing after the paragraph
style_normal.font.size = Pt(14)  # Font size
style_normal.paragraph_format.line_spacing = Pt(28)  # Line spacing
style_normal.paragraph_format.first_line_indent = Pt(28)  # First-line indent

The code above creates a new style, which I named NORMAL STYLE. It is based on the built-in Normal style and then sets its own font, spacing before and after the paragraph, line spacing and first-line indent. python-docx has no "character" unit, so a first-line indent of two characters has to be worked out by hand: at a font size of 14 pt one character is 14 pt wide, so an indent of 28 pt is two characters.

document.styles['Normal'].font.name = 'Times New Roman'  # Latin font
document.styles['Normal'].element.rPr.rFonts.set(qn('w:eastAsia'), 'SimSun')  # East Asian font (宋体)
document.styles['Normal'].paragraph_format.space_before = Pt(0)
document.styles['Normal'].paragraph_format.space_after = Pt(0)
document.styles['Normal'].font.size = Pt(14)
document.styles['Normal'].paragraph_format.line_spacing = Pt(28)
document.styles['Normal'].paragraph_format.first_line_indent = Pt(28)

The code above modifies an existing style, here the built-in Normal style. All text added afterwards follows the modified Normal style.

If you create a new style, you have to apply it after every add call. If you modify the Normal style, you do not, because text is displayed in Normal by default. If you modify any other built-in style, you still have to apply it after each add call. Modifying Normal saves you from setting a style every time, but I ran into a problem with it. When a table in the document contains text, that text is still displayed in the Normal style no matter what style I set on it. Normal seems to take priority there. Text outside tables does not have this problem, and other styles apply to it as expected. So if the document contains tables, I recommend creating a new style rather than modifying Normal.

Word comes with many built-in styles, but unless you have to, do not base your style on one you are not familiar with. For example, some styles carry an underline that cannot be removed no matter what you set in code.

Insert text

A Word document is made up of paragraphs, and python-docx has the same concept. In python-docx, each paragraph in turn contains one or more run objects.

On a paragraph you set paragraph-level formatting such as line spacing and first-line indent. Inside the paragraph you add runs and set the text format of each run, which is how one paragraph can contain text in several different font formats.

paragraph = document.add_paragraph("Test paragraph")
paragraph.style = "NORMAL STYLE"

Adding a paragraph is simple, and once it is added you can apply the style created earlier. Each call to add_paragraph is equivalent to pressing Enter once and then typing the text. As mentioned above, if parts of a paragraph need different formats, add multiple runs.

paragraph = document.add_paragraph(style="NORMAL STYLE")
run = paragraph.add_run("Dear")
run.font.name = "SimHei"  # 黑体
run = paragraph.add_run("Ms. XX/Mr.")
run.font.name = "SimSun"  # 宋体

As with styles, paragraphs and runs expose the font size and other properties, so the details can be adjusted individually.

Insert a table

table = document.add_table(rows=5, cols=5, style='Table Grid')
table.cell(0, 0).text = 'test'
table.cell(0, 0).paragraphs[0].style = 'NORMAL STYLE'
table.rows[1].cells[0].merge(table.rows[1].cells[1])

Adding a table is simple. When creating it you set the number of rows and columns and, optionally, a style. Note that the style argument of add_table expects a table style such as Table Grid; python-docx rejects a paragraph style like NORMAL STYLE there with a ValueError. To format the text in a cell, apply the paragraph style to the cell's paragraphs instead. As mentioned earlier, if you have modified Word's built-in Normal style, styles set on table text may not take effect. Use table.cell(row, column) to get a specific cell and assign to .text to change its content. Use merge to merge cells. The code above merges the cell in row 2, column 1 with the cell in row 2, column 2 (Python counts from 0). merge spans the rectangular region between the two cells, so a rectangular block can be merged in one call; for several separate merges I usually write a loop. If the cells being merged already contain text, python-docx moves all of it into the merged cell as separate paragraphs, so clear the cells you do not want to keep before merging.

Insert a picture

Adding a picture is just as straightforward: add_picture does it. As with text, some properties of the picture can be set, and in practice that mostly means its size. Below I read the width and height of the picture (python-docx reports them in EMU, 914,400 per inch), work out the scale factor that makes the height 7 cm, and apply it to both dimensions so the aspect ratio is kept. Replace figurepath with the path to your own image.

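from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
run = paragraph.add_run()
pic = run.add_picture(figurepath)
original_width, original_height = pic.width, pic.height
# 7 cm expressed in EMU, divided by the current height, gives the scale factor
change_ratio = (7/2.54*914400) / original_height
scaled_width = int(original_width * change_ratio)
scaled_height = int(original_height * change_ratio)  # Scale to a height of 7 cm
pic.width = scaled_width      # Apply the scaled width
pic.height = scaled_height    # Apply the scaled height
paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER  # Center the picture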
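The scaling arithmetic can be pulled out into a function of its own, which makes it easy to check without an image file. This is a plain-Python sketch; scale_to_height is a name I chose, and it only assumes that widths and heights are given in EMU.

```python
EMU_PER_CM = 360_000  # 914,400 EMU per inch / 2.54 cm per inch


def scale_to_height(width_emu, height_emu, target_height_cm):
    """Return (width, height) in EMU, scaled so the height equals
    target_height_cm while the aspect ratio is preserved."""
    if height_emu <= 0:
        raise ValueError("height_emu must be positive")
    ratio = (target_height_cm * EMU_PER_CM) / height_emu
    return int(round(width_emu * ratio)), int(round(height_emu * ratio))


# A picture 2 inches wide and 1 inch tall, scaled to a height of 7 cm.
new_width, new_height = scale_to_height(1_828_800, 914_400, 7)
```

The two returned values can be assigned to the width and height of the picture object returned by add_picture.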

Conclusion

python-docx is very convenient, but note that it is mainly meant for generating new Word documents from code (it can also read existing documents, but that side is weaker). If you already have Word documents and want to read them and modify specific locations, you need lower-level packages such as pywin32. python-docx essentially writes a new file that follows the Word file format, so the document is generated normally even on a computer without Word installed; Word is only needed on the machine where the file is opened and read. Packages such as pywin32, by contrast, load existing documents by running the Word program, so Word must be installed. In essence the program operates Word on your behalf. The advantage is finer control over formatting than python-docx offers; the cost is lower-level, more complex code.

Some may ask whether inserting text, pictures and tables with python-docx is much the same as using a Word template. Looking at python-docx alone, the functionality is somewhat similar, but python-docx can be combined with other packages to process large amounts of data, draw charts and place the results in a Word document. Word templates cannot match that. One run generates a report dozens of pages long. You only really appreciate it once you use it in your own work O(∩_∩)O.
