SoFunction
Updated on 2025-02-28

Complete implementation steps for parsing excel files using JS

Preface

Let's talk about how to use it todayJSLet's analyzeexcelFiles, of course, are not used directlyexceljssheetjsSuch libraries are boring, but mainly talk about themJSAnalysisexcelHow tables are implemented.

Pay attention to the main discussion in this articlexlsxFormatexcelThe table, other formats are not clearly explored.

What is an excel table file

First, parseexcelFiles, you have to first understand how it stores data. After my searches, I finally found itGGThe answer was found in:excelThe file is actually azipBag! So I quickly built a new onexlsxTwo new files were created in themsheetTable, twosheetThe table data is as follows:

This issheet 1:

A B C
1   2
1   2
     
     
     
1   2
1   2

This issheet 2:

A B
q a
q a
q a

Then usezipDecompress:

unzip  -d test

Then passtreeWe'll get a directory structure like this:

test
├── [Content_Types].xml
├── _rels
├── docProps
│   ├──
│   ├──
│   └──
└── xl
    ├── _rels
    │   └──
    ├──
    ├──
    ├── theme
    │   └──
    ├──
    └── worksheets
        ├──
        └──

Ahh, beautifully done, it's allxmldocument.

We're openingxmlAfter exploring the details, you can see that there are several documents that are very conspicuous, that isworksheetsNextand,besides, othersstylesthemeIt's obviously related to the style._relsIt feels like an internal reference. Let's take a look at two first.sheetofxmlFile, see if the guess is correct, post it

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="/spreadsheetml/2006/main"
    xmlns:r="/officeDocument/2006/relationships"
    xmlns:xdr="/drawingml/2006/spreadsheetDrawing"
    xmlns:x14="/office/spreadsheetml/2009/9/main"
    xmlns:mc="/markup-compatibility/2006"
    xmlns:etc="/officeDocument/2017/etCustomData">
    <sheetPr/>
    <dimension ref="A1:C7"/>
    <sheetViews>
        <sheetView workbookViewId="0">
            <selection activeCell="D5" sqref="A3:D5"/>
        </sheetView>
    </sheetViews>
    <sheetFormatPr defaultColWidth="9.23076923076923" defaultRowHeight="16.8" outlineLevelRow="6" outlineLevelCol="2"/>
    <sheetData>
        <row r="1" spans="1:3">
            <c r="A1">
                <v>1</v>
            </c>
            <c r="C1">
                <v>2</v>
            </c>
        </row>
        <row r="2" spans="1:3">
            <c r="A2">
                <v>1</v>
            </c>
            <c r="C2">
                <v>2</v>
            </c>
        </row>
        <row r="6" spans="1:3">
            <c r="A6">
                <v>1</v>
            </c>
            <c r="C6">
                <v>2</v>
            </c>
        </row>
        <row r="7" spans="1:3">
            <c r="A7">
                <v>1</v>
            </c>
            <c r="C7">
                <v>2</v>
            </c>
        </row>
    </sheetData>
    <pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
    <headerFooter/>
</worksheet>

😂 I believe everyone has seen it.sheetDatathat isexcelThe data in the table is here,<row>Represents row,rIt is the index of the row number,rowIn-house<c>It should becellIt's one of them<v>Correspondingcellvalue inrYescelllocation, such asA7Represents inAColumn 7 rows.

There are also several obvious properties such asdimensionIt can be seen that it is the size range of the table, fromA1 cellarriveC7 cellForm a box.<sheetViews>The information stored in the page should be the information on the page.<selection>What represents the selected form content.

andworkbookThe stored in thesheetInformation:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="/spreadsheetml/2006/main"
    xmlns:r="/officeDocument/2006/relationships">
    <fileVersion appName="xl" lastEdited="3" lowestEdited="5" rupBuild="9302"/>
    <workbookPr/>
    <bookViews>
        <workbookView windowHeight="16360" activeTab="1"/>
    </bookViews>
    <sheets>
        <sheet name="Sheet1" sheetId="1" r:/>
        <sheet name="Sheet2" sheetId="2" r:/>
    </sheets>
    <calcPr calcId="144525"/>
</workbook>

The remaining fewxml, after a look, the stored information is quite clear, such as:

  • appThe file program information is stored, and it seems that the file name is also
  • coreThe author's information and creation and modification time are saved
  • relsSo are the filesxmlFormat, some other storagexmlQuotation
  • themeThe colors and fonts defined in the table are stored
  • [Content_Types]There are references to all files, guesses are estimated to be parsed entry files

JS implementation steps

knewexcelHow do files store data, how do we use itjsIt will be clear to parse it, and it is mainly divided into three steps:

  • usejsDecompressexceldocument
  • Obtain the one in itsheetFile contents, thenxmlData parsed
  • Convert data to the shape we want

Let's do it right away, let's do it:

ZIP Decompression

aboutJSHow to implement itZIPUnzipped, as mentioned in the previous article, we will not explain it in detail here, just use it directlyjszipGet it done:

('#file').addEventListener('change', async e => {
    const file = [0];
    if (!file) return;
    const zip = await (file);
    const sheetXML = await ['xl/worksheets/'].async('string');
});

Get it done quickly nowsheetXMLThat's what we just sawThe data in it is already there.

XML parsing

Then we can analyzeXMLThe content takes out the data,xmlThe analysis principle is very simple, andhtml parseSimilarly, if you understand the principle, we can just create an open source library to help you solve it:

import convert from 'xml-js';

const result = convert.xml2json(sheetXML, { compact: true, spaces: 4 });

Then we got this stringJSON(Some content has been deleted):

{
    "_declaration": {
        "_attributes": {}
    },
    "worksheet": {
        "_attributes": {},
        "sheetPr": {},
        "dimension": {
            "_attributes": {
                "ref": "A1:C7"
            }
        },
        "sheetData": {
            "row": [
                {
                    "_attributes": {
                        "r": "1",
                        "spans": "1:3"
                    },
                    "c": [
                        {
                            "_attributes": {
                                "r": "A1"
                            },
                            "v": {
                                "_text": "1"
                            }
                        },
                        {
                            "_attributes": {
                                "r": "C1"
                            },
                            "v": {
                                "_text": "2"
                            }
                        }
                    ]
                },
                {
                    "_attributes": {
                        "r": "7",
                        "spans": "1:3"
                    },
                    "c": [
                        {
                            "_attributes": {
                                "r": "A7"
                            },
                            "v": {
                                "_text": "1"
                            }
                        },
                        {
                            "_attributes": {
                                "r": "C7"
                            },
                            "v": {
                                "_text": "2"
                            }
                        }
                    ]
                }
            ]
        }
    }
}

Next, we just need tosheetDataTake out the data in it, and then generate the data format you want according to the internal properties.

Summarize

excelThe essence of a file is onezipWe only need to passzipUnzip,xmlThe three steps of parsing and data processing can be usedJSThere are still many details to read the data, but if it is just simpleexcelTemplate, why not try it yourself.

This is the end of this article about parsing excel files using JS. For more related JS parsing excel files, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!