SoFunction
Updated on 2025-03-04

Complete implementation steps for converting Word documents (.doc) to .docx format in Python in batches

Preface

In daily office, we often encounter.docand.docxFormat Word files. although.docIt is the format used by the old version of Word, but for compatibility and functional integrity, modern office needs are more inclined to use.docxFormat. This article will introduce how to use Python to automatically batch.docConvert the file in format to.docxFormat, which facilitates us to quickly convert large amounts of files.

1. Environmental preparation

1. Install Python and necessary libraries

First, we need a Python environment and install itcomtypesLibrary for interacting with COM components of Windows (such as Word):

pip install comtypes

2. Make sure Microsoft Word is installed

becausecomtypesThe COM interface of Word is called, so you need to make sure that Microsoft Word (for Windows systems) is installed on the system.

2. Code implementation

1. Import the required library

We first importosand, used for file path operation and interaction with Word, respectively:

import os
import 

2. Specify the folder path

In the code, we specified include.docFolder path to the file. In the example, set the path toD:\1, please replace the path you actually store the file as needed:

folder_path = r'D:\1'

3. Write conversion functions

Functionconvert_doc_to_docx(input_doc_path)Will single.docConvert file to.docxdocument:

  • use('')Create a Word application instance.
  • Open.docFile and specify to save as.docxFormat.
  • Close documents and Word apps and free up resources.
def convert_doc_to_docx(input_doc_path):
    word = ('')
     = False
    doc = (input_doc_path)
    output_docx_path = input_doc_path.replace('.doc', '.docx')  # Generate output path    (output_docx_path, FileFormat=16)  #16 means docx format    ()
    ()
    return output_docx_path  # Return the newly generated .docx file path

4. Batch conversion of files

Next, we traverse the folder.docFile, callconvert_doc_to_docxFunctions are converted one by one. After each conversion is completed, the file name and path that was successfully converted will be output.

# Iterate through all .doc files in the folder and convert them to .docxfor filename in (folder_path):
    if ('.doc'):
        doc_path = (folder_path, filename)
        docx_path = convert_doc_to_docx(doc_path)  # Use different variable names to receive the return value        print(f"{filename} Converted successfully to {docx_path}")

print("All files have been converted!")

3. Complete code

The following is to.docBatch conversion of files to.docxThe complete code of  including comments for easy understanding:

import os
import 

# Specify the folder pathfolder_path = r'D:\1'

# Function converts .doc to .docxdef convert_doc_to_docx(input_doc_path):
    word = ('')
     = False  # Set Word invisible    doc = (input_doc_path)  # Open the .doc file    output_docx_path = input_doc_path.replace('.doc', '.docx')  # Set the output file name    (output_docx_path, FileFormat=16)  # Save as .docx format    ()  # Close the document    ()  # Exit Word app    return output_docx_path  # Return the newly generated .docx path
# Iterate through all .doc files in the folder and convert them to .docxfor filename in (folder_path):
    if ('.doc'):
        doc_path = (folder_path, filename)
        docx_path = convert_doc_to_docx(doc_path)  # Call the conversion function        print(f"{filename} Converted successfully to {docx_path}")

print("All files have been converted!")

4. Code description

  • File traversal and judgment

    • (folder_path)Travel all files in the folder.
    • if ('.doc')Make sure to handle only.docFiles, avoid mishandling other file formats.
  • Word app is not visible

    • = FalseHide Word window to avoid popping up and affecting user operations.
  • File path replacement

    • output_docx_path = input_doc_path.replace('.doc', '.docx')Replace the extension of the original file path from.docChange to.docx, generate a new save path.
  • File format settings

    • FileFormat=16Specify to save as.docxFormat.FileFormat Parameter value16Corresponding.docxFile type.
  • Resource release

    • ()Close the current document,()Exit the Word application to ensure that resources are not occupied.
  • Output result

    • After each successful conversion, useprint()Displays the file name that has been converted to facilitate tracking progress.

5. Things to note

  • Make sure Microsoft Word is installed on the system: The code depends on Word's COM components, and Word needs to be installed in the system to run normally.
  • File format and path: Please confirm.docThe path to the file to avoid specifying the wrong folder.
  • Resource Management: When converting a large number of files, make sure()Being executed to prevent the Word process from occupying system resources.
  • Running environment: This code is suitable for Windows systems because it relies on COM components to interact with Word.

6. Summary

By using Python withcomtypeslibrary, we can implement batch.docConvert file to.docxDocument requirements. This method not only saves time, but also effectively solves the inefficiency and error rate problems caused by manual operation. I hope this article can help you in your daily office work!

This is the article about converting Word documents (.doc) into .docx format in Python batch conversion. For more related content related to Python batch conversion docx format, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!