Requirements
Our customers need to convert doc files to PDF with the formatting left completely unchanged.
Our engineers tried various Java libraries and tools (docx4j, POI, LibreOffice components, and several online API services), but none of them produced satisfactory conversion results.
So I took over the job.
Research
In practice, the export function of native Office Word on Windows matches the customers' needs most closely.
To drive Word programmatically you need direct access to its system components, which means using a library that talks to the COM/OLE layer.
1. Ops provisioned a fairly low-spec EC2 machine running Windows 10.
2. I first found some Python libraries, but ran into a problem: the conversion worked when run on its own, yet failed once I wrapped it in a service and called it from a web thread. I looked into it and it appeared to be a threading problem. After some thought I dropped this approach (mostly because I didn't want to write it in Python).
3. I then looked for an OLE library in Go, quickly found one, read its documentation, and wrote the tool directly.
Implementation
Without further ado, here is the core code.
The basic flow simply simulates the following four steps:
1. Open the corresponding Office program (Word/Excel/PPT)
2. Export as PDF file
3. Close the file
4. Quit the Office program
Basic logic
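package office

import (
	ole "github.com/go-ole/go-ole"
	"github.com/go-ole/go-ole/oleutil"
	log "github.com/sirupsen/logrus"
)

// Operation describes one COM method call: the method name and its arguments.
// For details, see the official COM documentation: /zh-cn/office/vba/api/
type Operation struct {
	OpType    string
	Arguments []interface{}
}

// ConvertHandler holds everything needed to convert one file.
// Some applications, such as PPT, cannot be hidden, so Visible has to be set per application.
type ConvertHandler struct {
	FileInPath      string
	FileOutPath     string
	ApplicationName string
	WorkspaceName   string
	Visible         bool
	DisplayAlerts   int
	OpenFileOp      Operation
	ExportOp        Operation
	CloseOp         Operation
	QuitOp          Operation
}

// DomConvertObject keeps the COM objects for the application,
// its document collection, and the opened file.
type DomConvertObject struct {
	Application *ole.IDispatch
	Workspace   *ole.IDispatch
	SingleFile  *ole.IDispatch
}

// Convert runs the four steps: open, export, close, quit.
func (handler ConvertHandler) Convert() {
	ole.CoInitialize(0)
	defer ole.CoUninitialize()

	log.Println("handle open start")
	dom := handler.Open()
	log.Println("handle open end")
	log.Println("handler in file path is " + handler.FileInPath)
	log.Println("handler out file path is " + handler.FileOutPath)

	defer dom.Application.Release()
	defer dom.Workspace.Release()
	defer dom.SingleFile.Release()

	handler.Export(dom)
	log.Println("handle export end")

	handler.Close(dom)
	log.Println("handle close end")

	handler.Quit(dom)
	log.Println("handle quit end")
}

// Open starts the Office application and opens the input file.
func (handler ConvertHandler) Open() DomConvertObject {
	var dom DomConvertObject
	unknown, err := oleutil.CreateObject(handler.ApplicationName)
	if err != nil {
		panic(err)
	}
	dom.Application = unknown.MustQueryInterface(ole.IID_IDispatch)

	oleutil.MustPutProperty(dom.Application, "Visible", handler.Visible)
	oleutil.MustPutProperty(dom.Application, "DisplayAlerts", handler.DisplayAlerts)

	dom.Workspace = oleutil.MustGetProperty(dom.Application, handler.WorkspaceName).ToIDispatch()
	dom.SingleFile = oleutil.MustCallMethod(dom.Workspace, handler.OpenFileOp.OpType, handler.OpenFileOp.Arguments...).ToIDispatch()
	return dom
}

// Export writes the opened file out as PDF.
func (handler ConvertHandler) Export(dom DomConvertObject) {
	oleutil.MustCallMethod(dom.SingleFile, handler.ExportOp.OpType, handler.ExportOp.Arguments...)
}

// Close closes the opened file.
func (handler ConvertHandler) Close(dom DomConvertObject) {
	// The ProgID compared here was lost in the source text; "PowerPoint.Application"
	// is assumed, since PowerPoint closes the presentation itself rather than the collection.
	if handler.ApplicationName == "PowerPoint.Application" {
		oleutil.MustCallMethod(dom.SingleFile, handler.CloseOp.OpType, handler.CloseOp.Arguments...)
	} else {
		oleutil.MustCallMethod(dom.Workspace, handler.CloseOp.OpType, handler.CloseOp.Arguments...)
	}
}

// Quit shuts down the Office application.
func (handler ConvertHandler) Quit(dom DomConvertObject) {
	oleutil.MustCallMethod(dom.Application, handler.QuitOp.OpType, handler.QuitOp.Arguments...)
}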
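COM is initialised per OS thread, while the Go scheduler is free to move a goroutine from one thread to another. One common way to keep every COM call on the same thread is to funnel all conversions through a single worker goroutine pinned with `runtime.LockOSThread`. The sketch below assumes that design; `job`, `startWorker`, `safeRun` and `submit` are hypothetical names, and the conversion is passed in as a plain function so that the example runs without Office installed.

```go
package main

import (
	"fmt"
	"runtime"
)

// job is a hypothetical unit of work: one conversion plus a channel for its result.
type job struct {
	run  func() error
	done chan error
}

// startWorker starts one goroutine pinned to a single OS thread
// and returns the queue that feeds it.
func startWorker() chan<- job {
	jobs := make(chan job)
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		// In the real service, COM would be initialised here once
		// (ole.CoInitialize) and uninitialised when the loop ends.
		for j := range jobs {
			j.done <- safeRun(j.run)
		}
	}()
	return jobs
}

// safeRun turns a panic into an error, because the Must* helpers
// in oleutil panic when a COM call fails.
func safeRun(f func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("conversion panicked: %v", r)
		}
	}()
	return f()
}

// submit queues one conversion and waits for it to finish.
func submit(jobs chan<- job, run func() error) error {
	done := make(chan error, 1)
	jobs <- job{run: run, done: done}
	return <-done
}

func main() {
	jobs := startWorker()
	fmt.Println(submit(jobs, func() error { return nil }))
	fmt.Println(submit(jobs, func() error { panic("null pointer") }))
}
```

A side effect of this design is that conversions are serialised, so only one Office instance is driven at a time.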
Adaptations for different formats
Word, Excel and PPT files can all be converted to PDF. The following is the code for Word to PDF:
package office

// ConvertDoc2Pdf converts a Word document to PDF.
func ConvertDoc2Pdf(fileInputPath string, fileOutputPath string) {
	openArgs := []interface{}{fileInputPath}

	// See /zh-cn/office/vba/api/
	// 17 is the export format value for PDF (wdExportFormatPDF).
	exportArgs := []interface{}{fileOutputPath, 17}

	closeArgs := []interface{}{}
	quitArgs := []interface{}{}

	convertHandler := ConvertHandler{
		FileInPath:      fileInputPath,
		FileOutPath:     fileOutputPath,
		ApplicationName: "Word.Application",
		WorkspaceName:   "Documents",
		Visible:         false,
		DisplayAlerts:   0,
		OpenFileOp: Operation{
			OpType:    "Open",
			Arguments: openArgs,
		},
		ExportOp: Operation{
			OpType:    "ExportAsFixedFormat",
			Arguments: exportArgs,
		},
		CloseOp: Operation{
			OpType:    "Close",
			Arguments: closeArgs,
		},
		QuitOp: Operation{
			OpType:    "Quit",
			Arguments: quitArgs,
		},
	}
	convertHandler.Convert()
}
Web service interface
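package web

import (
	"encoding/json"
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"net/url"
	"office-convert/office"
	"os"
	"path"
	"path/filepath"
	"runtime/debug"
	"strconv"

	log "github.com/sirupsen/logrus"
)

const PORT = 10000
const SAVED_DIR = "files"

type ConvertRequestInfo struct {
	FileInUrl  string `json:"file_in_url"`
	SourceType string `json:"source_type"`
	TargetType string `json:"target_type"`
}

// logStackTrace logs the error together with the current stack trace.
func logStackTrace(err ...interface{}) {
	log.Println(err...)
	stack := string(debug.Stack())
	log.Println(stack)
}

func convertHandler(w http.ResponseWriter, r *http.Request) {
	// Turn any panic raised during download or conversion into a 503 response.
	defer func() {
		if rec := recover(); rec != nil {
			w.WriteHeader(503)
			fmt.Fprintln(w, rec)
			logStackTrace(rec)
		}
	}()
	if r.Method != "POST" {
		w.WriteHeader(400)
		fmt.Fprint(w, "Method not support")
		return
	}

	var convertRequestInfo ConvertRequestInfo
	reqBody, err := ioutil.ReadAll(r.Body)
	if err != nil {
		log.Println(err)
	}
	if err := json.Unmarshal(reqBody, &convertRequestInfo); err != nil {
		w.WriteHeader(400)
		fmt.Fprintln(w, err)
		return
	}

	log.Println(convertRequestInfo)
	log.Println(convertRequestInfo.FileInUrl)

	downloadFile(convertRequestInfo.FileInUrl)

	fileOutAbsPath := getFileOutAbsPath(convertRequestInfo.FileInUrl, convertRequestInfo.TargetType)
	convert(convertRequestInfo)

	// If the file is too large, consider using io.Copy for a streaming copy.
	outFileBytes, err := ioutil.ReadFile(fileOutAbsPath)
	if err != nil {
		panic(err)
	}
	// Headers must be set before WriteHeader, otherwise they are ignored.
	w.Header().Set("Content-Type", "application/octet-stream")
	w.WriteHeader(200)
	w.Write(outFileBytes)
}

// convert dispatches to the converter that matches the source type.
func convert(convertRequestInfo ConvertRequestInfo) {
	fileOutAbsPath := getFileOutAbsPath(convertRequestInfo.FileInUrl, convertRequestInfo.TargetType)
	switch convertRequestInfo.SourceType {
	case "doc", "docx":
		office.ConvertDoc2Pdf(getFileInAbsPath(convertRequestInfo.FileInUrl), fileOutAbsPath)
	case "xls", "xlsx":
		office.ConvertXsl2Pdf(getFileInAbsPath(convertRequestInfo.FileInUrl), fileOutAbsPath)
	case "ppt", "pptx":
		office.ConvertPPT2Pdf(getFileInAbsPath(convertRequestInfo.FileInUrl), fileOutAbsPath)
	}
}

// getNameFromUrl returns the last path element of the URL, used as the local file name.
func getNameFromUrl(inputUrl string) string {
	u, err := url.Parse(inputUrl)
	if err != nil {
		panic(err)
	}
	return path.Base(u.Path)
}

func getCurrentWorkDirectory() string {
	cwd, err := os.Getwd()
	if err != nil {
		panic(err)
	}
	return cwd
}

func getFileInAbsPath(url string) string {
	fileName := getNameFromUrl(url)
	currentWorkDirectory := getCurrentWorkDirectory()
	absPath := filepath.Join(currentWorkDirectory, SAVED_DIR, fileName)
	return absPath
}

func getFileOutAbsPath(fileInUrl string, targetType string) string {
	return getFileInAbsPath(fileInUrl) + "." + targetType
}

// downloadFile saves the document at url into the local files directory.
func downloadFile(url string) {
	log.Println("Start download file url :", url)
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fileInAbsPath := getFileInAbsPath(url)
	dir := filepath.Dir(fileInAbsPath)
	// log.Println("dir is " + dir)
	if _, err := os.Stat(dir); os.IsNotExist(err) {
		log.Println("dir is not exists")
		// Directories need the execute bit to be traversable, hence 0755 rather than 0644.
		os.MkdirAll(dir, 0755)
	}
	out, err := os.Create(fileInAbsPath)
	log.Println("save file to " + fileInAbsPath)
	if err != nil {
		panic(err)
	}
	defer out.Close()

	_, err = io.Copy(out, resp.Body)
	if err != nil {
		panic(err)
	}
	log.Println("Download file end url :", url)
}

func StartServer() {
	log.Println("start service ...")
	http.HandleFunc("/convert", convertHandler)
	http.ListenAndServe("127.0.0.1:"+strconv.Itoa(PORT), nil)
}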
Deployment/Use
Compile (optional)
To build the exe from source, run go build -ldflags "-H windowsgui" (the flag builds a GUI binary, so no console window is shown). If you would rather not compile, use the ready-made exe under prebuilt.
Run
Method 1: Normal operation
Double-click the exe to run it. With this method the service is not restarted if the program crashes or the machine shuts down unexpectedly.
Method 2: Run in the background (started by a scheduled task, recovers automatically)
Setting up scheduled start and automatic recovery on Windows is fairly tedious.
1. Copy the files
Copy the two files under prebuilt to the C:\Users\Administrator\OfficeConvert\ directory.
2. Modify COM access permissions
When the program is started as a service or from a scheduled task, it fails with a null pointer error.
The reason is that Microsoft restricts the use of COM components in non-UI sessions (to guard against malware and the like). To allow it, do the following:
- Open Component Services (Start -> Run, type in dcomcnfg)
- Drill down to Component Services -> Computers -> My Computer and click on DCOM Config
- Right-click on Microsoft Excel Application and choose Properties
- In the Identity tab select This User and enter the ID and password of an interactive user account (domain or local) and click Ok
Note: enter the Administrator account and password of this machine.
3. Scheduled task
Create a Windows scheduled task that runs check_start.bat every minute. The script checks whether the service is running and starts it if it is not.
Note: set the program path of the task to C:\Users\Administrator\OfficeConvert\check_start.bat
Web Deployment
Use nginx as the reverse proxy. It is installed under C:\Users\Administrator\nginx-1.20.2\nginx-1.20.2; edit the configuration file under conf/ so that it proxies to 127.0.0.1:10000.
If the machine has a public IP, configure DNS so that your domain resolves to it.
server {
    listen 80;
    server_name ;
    #charset koi8-r;
    #access_log logs/ main;
    location / {
        root html;
        index ;
        proxy_pass http://127.0.0.1:10000;
    }
    # ...Other settings
}
Making requests
Once deployed on the Windows machine, the service listens on http://127.0.0.1:10000 and the conversion endpoint is /convert (if a domain name was configured above, access /convert under that domain).
Request
Method: POST
Content-Type: application/json
Body:
{ "file_in_url":"https://your_docx_file_url", "source_type":"docx", "target_type":"pdf" }
Parameter | Required | Allowed values | Description |
---|---|---|---|
file_in_url | yes | URL of any document whose type is one of the source_type values below | Network address of the document to be converted |
source_type | yes | [doc,docx,xls,xlsx,ppt,pptx] | Document type |
target_type | yes | [pdf] | Only PDF is supported for now; more formats are planned |
Response
Check the HTTP status code:
200: OK
Anything else: the conversion failed
Body:
On success, the binary stream of the converted file
If the status code is not 200, the corresponding error message