SoFunction
Updated on 2025-03-04

Writing a Go tool to convert Word/Excel/PPT files to PDF

Requirements

The company's customers need to convert doc files to PDF while keeping the formatting completely unchanged.

The engineers tried various Java libraries and services (doc4j, POI, LibreOffice components, and several online APIs), but none of them produced satisfactory conversion results.

So I took over the job.

Research

In practice, the native export function of Microsoft Office Word on Windows matches the customers' needs best.

To drive the Office programs on Windows, the tool has to access their system components directly, which means going through a COM/OLE library.

1. The operations team set up a relatively low-spec EC2 machine running Windows 10.

2. I first tried some Python libraries. They worked when run standalone, but failed once I wrapped them in a service and called them from a web thread. It looked like a threading problem, and I did not pursue it further (I did not want to write this in Python anyway).

3. I then quickly found a corresponding OLE library for Go, read its documentation, and wrote the tool directly.
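
The threading problem from step 2 is worth keeping in mind in Go as well: COM is initialized per OS thread, while goroutines can move between threads. A common precaution is to run all conversions on one goroutine pinned with runtime.LockOSThread. The sketch below shows only that pattern; the job type and the convert callback are hypothetical placeholders, not part of the original tool.

```go
package main

import (
	"fmt"
	"runtime"
)

// job describes one conversion request; done receives its result.
// (Hypothetical type, introduced only for this sketch.)
type job struct {
	in, out string
	done    chan error
}

// startWorker runs every submitted job on a single dedicated OS thread.
// convert stands in for the real COM-based conversion.
func startWorker(convert func(in, out string) error) chan<- job {
	jobs := make(chan job)
	go func() {
		// Pin this goroutine to its OS thread so that COM initialization
		// and all later COM calls would happen on the same thread.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		for j := range jobs {
			j.done <- convert(j.in, j.out)
		}
	}()
	return jobs
}

// submit sends one job to the worker and waits for its result.
func submit(jobs chan<- job, in, out string) error {
	done := make(chan error, 1)
	jobs <- job{in: in, out: out, done: done}
	return <-done
}

func main() {
	var order []string
	jobs := startWorker(func(in, out string) error {
		order = append(order, in) // only the worker goroutine appends
		return nil
	})
	for _, name := range []string{"a.docx", "b.xlsx"} {
		if err := submit(jobs, name, name+".pdf"); err != nil {
			fmt.Println("convert failed:", err)
		}
	}
	close(jobs)
	fmt.Println(order)
}
```

Because submit blocks until the worker answers, conversions are also serialized, which avoids two requests driving the same Office application at once.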

Implementation

Without further ado, here is the core code.

The basic flow below simply automates the following four steps:

1. Open the file in the corresponding Office program (Word/Excel/PPT)

2. Export as PDF file

3. Close the file

4. Quit the Office program

Basic logic

package office

import (
	ole "github.com/go-ole/go-ole"
	"github.com/go-ole/go-ole/oleutil"
	log "github.com/sirupsen/logrus"
)

// For more details, see the official COM documentation: /zh-cn/office/vba/api/
// Operation is one COM call: a method or property name plus its arguments.
type Operation struct {
	OpType    string
	Arguments []interface{}
}

// Some applications, such as PPT, cannot be hidden, so Visible has to be set explicitly.
type ConvertHandler struct {
	FileInPath      string
	FileOutPath     string
	ApplicationName string
	WorkspaceName   string
	Visible         bool
	DisplayAlerts   int
	OpenFileOp      Operation
	ExportOp        Operation
	CloseOp         Operation
	QuitOp          Operation
}

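// DomConvertObject holds the COM objects used during one conversion.
type DomConvertObject struct {
	Application *ole.IDispatch // the Office application
	Workspace   *ole.IDispatch // the collection that opens files, e.g. Documents
	SingleFile  *ole.IDispatch // the opened file
}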

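// Convert runs the four steps: open, export, close, quit.
func (handler ConvertHandler) Convert() {
	ole.CoInitialize(0)
	defer ole.CoUninitialize()

	log.Println("handle open start")
	dom := handler.Open()
	log.Println("handle open end")
	log.Println("handler in file path is " + handler.FileInPath)
	log.Println("handler out file path is " + handler.FileOutPath)

	// Release the COM references once the conversion is finished.
	defer dom.Application.Release()
	defer dom.Workspace.Release()
	defer dom.SingleFile.Release()

	handler.Export(dom)
	log.Println("handle export end")

	handler.Close(dom)
	log.Println("handle close end")

	handler.Quit(dom)
	log.Println("handle quit end")
}
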
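// Open starts the Office application and opens the input file.
func (handler ConvertHandler) Open() DomConvertObject {
	var dom DomConvertObject
	unknown, err := oleutil.CreateObject(handler.ApplicationName)
	if err != nil {
		panic(err)
	}
	dom.Application = unknown.MustQueryInterface(ole.IID_IDispatch)

	oleutil.MustPutProperty(dom.Application, "Visible", handler.Visible)
	oleutil.MustPutProperty(dom.Application, "DisplayAlerts", handler.DisplayAlerts)

	dom.Workspace = oleutil.MustGetProperty(dom.Application, handler.WorkspaceName).ToIDispatch()

	dom.SingleFile = oleutil.MustCallMethod(dom.Workspace, handler.OpenFileOp.OpType, handler.OpenFileOp.Arguments...).ToIDispatch()
	return dom
}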

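// Export calls the export method on the opened file.
func (handler ConvertHandler) Export(dom DomConvertObject) {
	oleutil.MustCallMethod(dom.SingleFile, handler.ExportOp.OpType, handler.ExportOp.Arguments...)
}

// Close closes the opened file.
func (handler ConvertHandler) Close(dom DomConvertObject) {
	// The compared string was lost in the source listing. "PowerPoint.Application"
	// is assumed here, since PowerPoint closes the presentation itself rather
	// than the collection.
	if handler.ApplicationName == "PowerPoint.Application" {
		oleutil.MustCallMethod(dom.SingleFile, handler.CloseOp.OpType, handler.CloseOp.Arguments...)
	} else {
		oleutil.MustCallMethod(dom.Workspace, handler.CloseOp.OpType, handler.CloseOp.Arguments...)
	}
}

// Quit exits the Office application.
func (handler ConvertHandler) Quit(dom DomConvertObject) {
	oleutil.MustCallMethod(dom.Application, handler.QuitOp.OpType, handler.QuitOp.Arguments...)
}
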
Adaptations for different formats

Word, Excel, and PPT files can all be converted to PDF. The following is the code for Word to PDF:

package office

func ConvertDoc2Pdf(fileInputPath string, fileOutputPath string) {

	openArgs := []interface{}{fileInputPath}

	// See /zh-cn/office/vba/api/ for the export arguments.
	// 17 is wdExportFormatPDF in Word's WdExportFormat enumeration.
	exportArgs := []interface{}{fileOutputPath, 17}

	closeArgs := []interface{}{}

	quitArgs := []interface{}{}

	convertHandler := ConvertHandler{
		FileInPath:      fileInputPath,
		FileOutPath:     fileOutputPath,
		ApplicationName: "Word.Application",
		WorkspaceName:   "Documents",
		Visible:         false,
		DisplayAlerts:   0,
		OpenFileOp: Operation{
			OpType:    "Open",
			Arguments: openArgs,
		},
		ExportOp: Operation{
			OpType:    "ExportAsFixedFormat",
			Arguments: exportArgs,
		},
		CloseOp: Operation{

			OpType:    "Close",
			Arguments: closeArgs,
		},
		QuitOp: Operation{

			OpType:    "Quit",
			Arguments: quitArgs,
		},
	}
	convertHandler.Convert()
}

Provide web service interface

package web

import (
	"encoding/json"
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"net/url"
	"office-convert/office"
	"os"
	"path"
	"path/filepath"
	"runtime/debug"
	"strconv"

	log "github.com/sirupsen/logrus"
)

const PORT = 10000
const SAVED_DIR = "files"

type ConvertRequestInfo struct {
	FileInUrl  string `json:"file_in_url"`
	SourceType string `json:"source_type"`
	TargetType string `json:"target_type"`
}

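// logStackTrace logs the error together with the current stack trace.
func logStackTrace(err ...interface{}) {
	log.Println(err)
	stack := string(debug.Stack())
	log.Println(stack)
}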

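func convertHandler(w http.ResponseWriter, r *http.Request) {
	// Any panic below is reported to the client as a 503 with the error text.
	defer func() {
		if r := recover(); r != nil {
			w.WriteHeader(503)
			fmt.Fprintln(w, r)
			logStackTrace(r)
		}
	}()
	if r.Method != "POST" {
		w.WriteHeader(400)
		fmt.Fprintf(w, "Method not support")
		return
	}

	var convertRequestInfo ConvertRequestInfo
	reqBody, err := ioutil.ReadAll(r.Body)
	if err != nil {
		panic(err)
	}
	if err := json.Unmarshal(reqBody, &convertRequestInfo); err != nil {
		panic(err)
	}

	log.Println(convertRequestInfo)
	log.Println(convertRequestInfo.FileInUrl)

	downloadFile(convertRequestInfo.FileInUrl)

	fileOutAbsPath := getFileOutAbsPath(convertRequestInfo.FileInUrl, convertRequestInfo.TargetType)
	convert(convertRequestInfo)

	// If the file is too large, consider using io.Copy for a streaming copy.
	outFileBytes, err := ioutil.ReadFile(fileOutAbsPath)
	if err != nil {
		panic(err)
	}
	// Headers must be set before WriteHeader, otherwise they are not sent.
	w.Header().Set("Content-Type", "application/octet-stream")
	w.WriteHeader(200)
	w.Write(outFileBytes)
}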

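// convert dispatches to the converter that matches the source type.
func convert(convertRequestInfo ConvertRequestInfo) {
	fileInAbsPath := getFileInAbsPath(convertRequestInfo.FileInUrl)
	fileOutAbsPath := getFileOutAbsPath(convertRequestInfo.FileInUrl, convertRequestInfo.TargetType)
	switch convertRequestInfo.SourceType {
	case "doc", "docx":
		office.ConvertDoc2Pdf(fileInAbsPath, fileOutAbsPath)
	case "xls", "xlsx":
		office.ConvertXsl2Pdf(fileInAbsPath, fileOutAbsPath)
	case "ppt", "pptx":
		office.ConvertPPT2Pdf(fileInAbsPath, fileOutAbsPath)
	}
}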

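// getNameFromUrl returns the last path element of the URL as the file name.
func getNameFromUrl(inputUrl string) string {
	u, err := url.Parse(inputUrl)
	if err != nil {
		panic(err)
	}
	return path.Base(u.Path)
}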

func getCurrentWorkDirectory() string {
	cwd, err := os.Getwd()
	if err != nil {
		panic(err)
	}
	return cwd
}

func getFileInAbsPath(url string) string {
	fileName := getNameFromUrl(url)
	currentWorkDirectory := getCurrentWorkDirectory()
	absPath := filepath.Join(currentWorkDirectory, SAVED_DIR, fileName)
	return absPath
}

func getFileOutAbsPath(fileInUrl string, targetType string) string {
	return getFileInAbsPath(fileInUrl) + "." + targetType
}

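// downloadFile saves the document at url into the SAVED_DIR directory.
func downloadFile(url string) {
	log.Println("Start download file url :", url)
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fileInAbsPath := getFileInAbsPath(url)
	dir := filepath.Dir(fileInAbsPath)
	// log.Println("dir is " + dir)
	if _, err := os.Stat(dir); os.IsNotExist(err) {
		log.Println("dir is not exists")
		// 0755 instead of 0644: a directory needs the execute bit to be entered.
		os.MkdirAll(dir, 0755)
	}
	out, err := os.Create(fileInAbsPath)
	log.Println("save file to " + fileInAbsPath)
	if err != nil {
		panic(err)
	}

	defer out.Close()

	_, err = io.Copy(out, resp.Body)
	if err != nil {
		panic(err)
	}

	log.Println("Download file end url :", url)
}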

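// StartServer registers the /convert handler and listens on 127.0.0.1:PORT.
func StartServer() {
	log.Println("start service ...")
	http.HandleFunc("/convert", convertHandler)
	http.ListenAndServe("127.0.0.1:"+strconv.Itoa(PORT), nil)
}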

Deployment/Use

Compile (optional)

To build the exe from source, run go build -ldflags "-H windowsgui". If you do not want to compile, use the prebuilt exe under prebuilt.

Run

Method 1: Run directly

Double-click the exe to run it. If the program crashes or the computer shuts down unexpectedly, it will not restart on its own.

Method 2: Run in the background (started by a scheduled task, so it recovers automatically)

Setting up scheduled start and automatic recovery on Windows is rather tedious.

1. Copy the files

Copy the two files under prebuilt to the C:\Users\Administrator\OfficeConvert\ directory

2. Modify COM access permissions

When the program is started as a service or by a scheduled task, it fails with a null pointer error.

The reason is that Microsoft restricts the use of COM components in non-UI sessions (to guard against malware and the like). To allow it, do the following:

  • Open Component Services (Start -> Run, type in dcomcnfg)
  • Drill down to Component Services -> Computers -> My Computer and click on DCOM Config
  • Right-click on Microsoft Excel Application and choose Properties
  • In the Identity tab select This User and enter the ID and password of an interactive user account (domain or local) and click Ok

Note: enter the Administrator account and password of this machine.

3. Scheduled task

Create a Windows scheduled task that calls check_start.bat every minute. The script checks whether the program is running and starts it if it is not.

Note: set the task's program path to C:\Users\Administrator\OfficeConvert\check_start.bat

Web Deployment

Use nginx as the reverse proxy. It is installed under C:\Users\Administrator\nginx-1.20.2\nginx-1.20.2; edit the configuration file under conf/ so that it proxies to 127.0.0.1:10000.
If the machine has a public IP, configure DNS so that your domain resolves to it.

server {
        listen       80;
        server_name  ;

        #charset koi8-r;

        #access_log  logs/  main;

        location / {
            root   html;
            index   ;
            proxy_pass http://127.0.0.1:10000;
        }
        # ...Other settings
}

Requests

After deploying to the Windows machine, the service listens at http://127.0.0.1:10000, with the conversion endpoint at /convert (if a domain name is configured as above, use /convert on that domain).

Request format

Method: POST

Content-Type: application/json

Body:

{
    "file_in_url":"https://your_docx_file_url",
    "source_type":"docx",
    "target_type":"pdf"
}

Parameter   | Required | Allowed values                                   | Description
file_in_url | yes      | URL of a document whose type matches source_type | Network address of the document to convert
source_type | yes      | doc, docx, xls, xlsx, ppt, pptx                  | Document type
target_type | yes      | pdf                                              | Only PDF is supported for now; more formats may be added later

Response

Check the HTTP status code:

200: OK

Any other code: something went wrong

Body:

The binary stream of the converted file.

If the status code is not 200, the body contains the corresponding error message.
