SoFunction
Updated on 2025-04-13

Common ways to easily read Word document content in Java

Updated: March 19, 2025 08:47:31 Author: Five Elements Stars
This article introduces common ways to read Word document content in Java. For the .doc format, use the HWPFDocument and WordExtractor classes from the Apache POI library. For the .docx format, use the XWPFDocument class and extract the text by iterating over the paragraphs and their text runs. Readers who need this can use it as a reference.

Preface

In Java development, we sometimes need to read the contents of Word documents, which is particularly useful when handling contracts, reports, and similar files. Which library classes to use depends on the format of the Word document. The sections below cover how to read the two common formats, .doc and .docx, in detail.
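Since the two formats need different classes, a program that accepts arbitrary files may first have to tell them apart. The following is a minimal sketch, assuming that inspecting the leading bytes is good enough: a .doc file is an OLE2 compound file and a .docx file is a ZIP package. The class WordFormatSniffer and its detect method are hypothetical names for illustration and are not part of Apache POI. The signature only identifies the container type, so other OLE2 or ZIP files would match as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Apache POI): guesses the container format
// of a file from its first bytes.
class WordFormatSniffer {
    enum Format { OLE2, ZIP, UNKNOWN }

    // OLE2 compound file signature, used by legacy .doc files
    private static final int[] OLE2_SIGNATURE =
            {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature ("PK" 0x03 0x04), used by .docx packages
    private static final int[] ZIP_SIGNATURE = {0x50, 0x4B, 0x03, 0x04};

    // Expects the first bytes of the file (8 bytes are enough).
    static Format detect(byte[] head) {
        if (startsWith(head, OLE2_SIGNATURE)) return Format.OLE2;
        if (startsWith(head, ZIP_SIGNATURE)) return Format.ZIP;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int[] signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            // Mask to compare the byte as an unsigned value
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java WordFormatSniffer <file>");
            return;
        }
        // readNBytes(int) requires Java 11 or later
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            System.out.println(detect(in.readNBytes(8)));
        }
    }
}
```

A result of OLE2 suggests trying the .doc approach from section 1, and ZIP suggests trying the .docx approach from section 2. Either attempt can still fail if the file is not actually a Word document.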

1. Read Word documents in .doc format

Introduce dependencies

If you use Maven to manage your project, add the Apache POI dependency to pom.xml:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>5.2.3</version>
</dependency>

Code Example

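import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

import java.io.FileInputStream;
import java.io.IOException;

public class ReadDocFile {
    public static void main(String[] args) {
        // The file path is empty in the source; put the path to your .doc file here
        try (FileInputStream fis = new FileInputStream("")) {
            // Create an HWPFDocument object to represent the .doc document
            HWPFDocument document = new HWPFDocument(fis);
            // Create a WordExtractor object to extract the document content
            WordExtractor extractor = new WordExtractor(document);
            // Get the text content of the document
            String content = extractor.getText();
            System.out.println(content);
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("Reading .doc file failed: " + e.getMessage());
        }
    }
}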

Code explanation

  • FileInputStream fis = new FileInputStream(""): creates a file input stream for reading the .doc file.

  • HWPFDocument document = new HWPFDocument(fis): uses the HWPFDocument class to create a document object that can handle documents in .doc format.

  • WordExtractor extractor = new WordExtractor(document): creates a WordExtractor object, which can extract the text content from the document object.

  • String content = extractor.getText(): calls the getText() method to obtain all the text content of the document, which is then printed.

2. Read Word documents in .docx format

Introduce dependencies

Again in pom.xml, add the Apache POI dependency:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.3</version>
</dependency>

Code Example

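import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileInputStream;
import java.io.IOException;

public class ReadDocxFile {
    public static void main(String[] args) {
        // The file path is empty in the source; put the path to your .docx file here
        try (FileInputStream fis = new FileInputStream("")) {
            // Create an XWPFDocument object to represent the .docx document
            XWPFDocument document = new XWPFDocument(fis);
            StringBuilder content = new StringBuilder();
            // Iterate over each paragraph in the document
            for (XWPFParagraph paragraph : document.getParagraphs()) {
                // Iterate over each text run in the paragraph
                for (XWPFRun run : paragraph.getRuns()) {
                    // getText(0) returns null for a run that has no text
                    String text = run.getText(0);
                    if (text != null) {
                        content.append(text);
                    }
                }
                content.append("\n");
            }
            System.out.println(content.toString());
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("Reading .docx file failed: " + e.getMessage());
        }
    }
}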

Code explanation

  • FileInputStream fis = new FileInputStream(""): creates a file input stream for reading the .docx file.

  • XWPFDocument document = new XWPFDocument(fis): uses the XWPFDocument class to create a document object; this class is dedicated to handling documents in .docx format.

  • Two nested loops do the extraction: the outer loop iterates over each paragraph in the document, and the inner loop iterates over each text run in the paragraph and appends its text to a StringBuilder. The result is printed at the end.
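To make the paragraph and run structure more concrete, here is a rough sketch that does not use Apache POI at all. It relies on the fact that a .docx file is a ZIP package whose main text lives in word/document.xml, where paragraphs are w:p elements, runs are w:r elements, and text is stored in w:t elements. The class DocxStructureSketch and its methods are hypothetical names, and buildSample only creates a tiny in-memory stand-in for a real file. Note that this sketch visits every w:p element it finds, so it is an illustration of the file layout and not a replacement for XWPFDocument.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration class, JDK only (no Apache POI involved).
class DocxStructureSketch {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of word/document.xml, one line per w:p paragraph.
    static String extractText(byte[] docxBytes) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return textOf(zip.readAllBytes()); // reads the current entry only
                }
            }
            throw new IllegalArgumentException("word/document.xml not found");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String textOf(byte[] documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Reject DOCTYPE declarations to avoid XML external entity attacks
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document xml = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(documentXml));
            StringBuilder content = new StringBuilder();
            NodeList paragraphs = xml.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {       // outer loop: paragraphs
                NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "r");
                for (int j = 0; j < runs.getLength(); j++) {         // inner loop: runs
                    NodeList texts = ((Element) runs.item(j)).getElementsByTagNameNS(W_NS, "t");
                    for (int k = 0; k < texts.getLength(); k++) {
                        content.append(texts.item(k).getTextContent());
                    }
                }
                content.append("\n");
            }
            return content.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse word/document.xml", e);
        }
    }

    // Builds a minimal ZIP that contains only word/document.xml (demo data, not a full .docx).
    static byte[] buildSample(String documentXml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(documentXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>Word</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        System.out.print(extractText(buildSample(xml)));
    }
}
```

The first sample paragraph is split across two runs, which is why the extraction has to concatenate run text before ending the line. In real documents Word often splits a single sentence into several runs whenever the formatting changes.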

Hey, friends! With the methods above, a Java program can read the content of Word documents in both formats. Try them out so that your program can "communicate" with Word documents too!

Summary

This concludes the introduction to common ways of reading Word document content in Java.
