Updated on 2025-03-08

Reading and processing PDF files in Java with PDFBox

Preface

Hello everyone. The 2022 Spring Festival holiday is coming to an end, and people everywhere are gradually returning to work. A friend recently worked on a small project that needed to read information from PDF files in Java, so I am recording the process here.

PDFBox introduction

PDFBox is an open-source, Java-based library for working with PDF documents. It can create new PDF documents, modify existing ones, and extract content from them. Apache PDFBox also ships with several command-line tools.

At the data level, a PDF file is a collection of basic objects: arrays, booleans, dictionaries, numbers, strings, and binary streams.
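On disk, those objects follow a header line that starts with `%PDF-` and a version number. As a rough illustration, the sketch below checks for that marker using only the JDK, which is a slightly stronger test than looking at the file extension. The class and method names (`PdfHeaderCheck`, `startsWithPdfMarker`, `looksLikePdf`) are hypothetical, and the check assumes the marker sits at the very first byte, which some real-world files do not guarantee.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: a plain-JDK check for the PDF header.
class PdfHeaderCheck {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True when the given bytes start with the "%PDF-" marker.
    static boolean startsWithPdfMarker(byte[] head) {
        if (head == null || head.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first few bytes of a file and checks them for the marker.
    static boolean looksLikePdf(Path file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break; // file is shorter than the marker
                }
                filled += n;
            }
        }
        return filled == head.length && startsWithPdfMarker(head);
    }

    public static void main(String[] args) throws IOException {
        // Pass one or more file paths on the command line.
        for (String arg : args) {
            System.out.println(arg + ": " + looksLikePdf(Paths.get(arg)));
        }
    }
}
```

A check like this only says that a file claims to be a PDF; parsing the objects inside it is still the job of a library such as PDFBox.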

Development Environment

The versions used for this PDFBox-based example are as follows:

JDK1.8

SpringBoot 2.3.

PDFbox 1.8.13

PDFBox dependencies

Before using PDFBox, add it to the project as a dependency. The Maven dependency used here is:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>1.8.13</version>
</dependency>

Quick start

This example reads the text from each PDF file in a specified directory and writes it to a new .txt file at a specified path.

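import java.io.*;
import java.util.*;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

class PdfTest {

    public static void main(String[] args) throws Exception {
        // Directory to scan; the trailing separator is needed because paths are built by concatenation
        String basePath = "C:\\Users\\Admin\\Desktop\\";

        List<String> list = getFiles(basePath);
        for (String filePath : list) {
            long ltime = System.currentTimeMillis();
            // File name without directory and extension
            String substring = filePath.substring(filePath.lastIndexOf("\\") + 1, filePath.lastIndexOf("."));
            String project = "()";
            String textFromPdf = getTextFromPdf(filePath);
            // First pass: dump the raw text; second pass: write the reformatted text
            String s = writterTxt(textFromPdf, substring + "--", ltime, basePath);
            StringBuffer stringBuffer = readerText(s, project);
            writterTxt(stringBuffer.toString(), substring + "-", ltime, basePath);
        }
        System.out.println("******************** end ************************");
    }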

    public static List<String> getFiles(String path) {
        List<String> files = new ArrayList<String>();
        File file = new File(path);
        File[] tempList = file.listFiles();
        if (tempList == null) {
            // Not a directory, or it could not be read
            return files;
        }

        for (int i = 0; i < tempList.length; i++) {
            if (tempList[i].isFile()) {
                if (tempList[i].toString().contains(".pdf") || tempList[i].toString().contains(".PDF")) {
                    files.add(tempList[i].toString());
                }
                // File name, not including path:
                // String fileName = tempList[i].getName();
            }
            if (tempList[i].isDirectory()) {
                // Subdirectories are not searched recursively here.
            }
        }
        return files;
    }

    public static String getTextFromPdf(String filePath) throws Exception {
        String result = null;
        FileInputStream is = null;
        PDDocument document = null;
        try {
            is = new FileInputStream(filePath);
            // PDFBox 1.8.x: parse the stream, then obtain the document model
            PDFParser parser = new PDFParser(is);
            parser.parse();
            document = parser.getPDDocument();
            PDFTextStripper stripper = new PDFTextStripper();
            result = stripper.getText(document);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (document != null) {
                try {
                    document.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return result;
    }


    public static String writterTxt(String data, String text, long l, String basePath) {
        String fileName = null;
        try {
            if (text == null) {
                fileName = basePath + "javaio-" + l + ".txt";
            } else {
                fileName = basePath + text + l + ".txt";
            }

            File file = new File(fileName);
            // If the file doesn't exist, create it
            if (!file.exists()) {
                file.createNewFile();
            }
            // No append flag is passed, so an existing file is overwritten
            OutputStream outputStream = new FileOutputStream(file);
            OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream);
            outputStreamWriter.write(data);
            outputStreamWriter.flush();
            // Closing the writer also closes the underlying stream
            outputStreamWriter.close();
            System.out.println("Done");
        } catch (IOException e) {
            e.printStackTrace();
        }

        return fileName;
    }

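    public static StringBuffer readerText(String name, String project) {
        // Accumulate the processed lines in a StringBuffer
        StringBuffer stringBuffer = new StringBuffer();
        try {
            FileReader fr = new FileReader(name);
            BufferedReader bf = new BufferedReader(fr);
            String str;
            // Read the file line by line
            while ((str = bf.readLine()) != null) {
                str = replaceAll(str);
                if (str.contains("D、") || str.contains("D.")) {
                    // Last option: close the question and leave room for the answer
                    stringBuffer.append(str);
                    stringBuffer.append("\n");
                    stringBuffer.append("Reference: \n");
                    stringBuffer.append("Reference: \n");
                    stringBuffer.append("\n\n\n\n");
                } else if (str.contains("A、") || str.contains("A.")) {
                    // First option: drop the last character of the question text, then tag it
                    if (stringBuffer.length() > 0) {
                        stringBuffer.deleteCharAt(stringBuffer.length() - 1);
                    }
                    stringBuffer.append("。" + project + "\n");
                    stringBuffer.append(str + "\n");
                } else if (str.contains("B、") || str.contains("C、") || str.contains("B.") || str.contains("C.")) {
                    stringBuffer.append(str + "\n");
                } else {
                    // Question text: join wrapped lines without a line break
                    stringBuffer.append(str);
                }

            }
            bf.close();
            fr.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return stringBuffer;
    }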

    public static String replaceAll(String str) {
        // Strip unwanted text from the line
        return str.replaceAll("net", "");
    }
}
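The listing above finds PDFs with `contains(".pdf")`, splits paths on `\\`, and writes with the platform default charset. A minimal sketch of the same file handling with `java.nio.file` is shown below. It assumes a Java 8 or later runtime, and the class and method names (`PdfPaths`, `isPdf`, `baseName`, `listPdfs`, `writeText`) are hypothetical. It does not call PDFBox at all; it only covers listing, naming, and writing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical helper: platform-independent file handling for the PDF-to-text example.
class PdfPaths {

    // True only when the file name ends with ".pdf", ignoring case.
    static boolean isPdf(Path path) {
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".pdf");
    }

    // File name without its directory and without the final extension.
    static String baseName(Path path) {
        String name = path.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Regular files directly inside dir that look like PDFs (no recursion).
    static List<Path> listPdfs(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .filter(PdfPaths::isPdf)
                   .forEach(result::add);
        }
        return result;
    }

    // Writes text to <dir>/<base>-<timestamp>.txt as UTF-8 and returns the target path.
    static Path writeText(Path dir, String base, long timestamp, String text) throws IOException {
        Path target = dir.resolve(base + "-" + timestamp + ".txt");
        return Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : listPdfs(dir)) {
            System.out.println(pdf + " -> " + baseName(pdf));
        }
    }
}
```

With an `endsWith` check, a file such as `notes.pdf.bak` is skipped, whereas a `contains(".pdf")` test would pick it up. Using `Path.getFileName()` also avoids depending on the Windows path separator.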

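To make the reformatting step in `readerText` easier to follow, here is a simplified, in-memory sketch of the same branch structure. It is not the listing's exact behavior: it assumes option lines start with `A.` to `D.`, it omits the "Reference" placeholder lines and the trailing-character deletion, and the class name `QuizFormatter`, the `(Java)` tag, and the sample question are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified variant of the line-processing step.
class QuizFormatter {

    // Joins wrapped question text into one line, tags it, and puts each option on its own line.
    static String format(List<String> lines, String project) {
        StringBuilder out = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith("D.")) {
                // Last option: end the question with a blank line
                out.append(line).append("\n\n");
            } else if (line.startsWith("A.")) {
                // First option: the question text is complete, so append the tag
                out.append(project).append("\n").append(line).append("\n");
            } else if (line.startsWith("B.") || line.startsWith("C.")) {
                out.append(line).append("\n");
            } else {
                // Question text: join fragments without a line break
                out.append(line);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Which tool", " reads PDF text?",
                "A. PDFBox", "B. JUnit", "C. Log4j", "D. Maven");
        System.out.print(format(lines, "(Java)"));
    }
}
```

For the sample input, the two fragments of the question are joined into a single line ending in `(Java)`, followed by the four options on separate lines and one blank line.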
Conclusion

That concludes this introduction to reading and processing PDF files in Java with PDFBox. Thank you for reading. I hope you found it useful; if anything is lacking, please leave a comment with corrections.
