Preface
A few days ago at work I needed to parse and process XML files. Because the logic was fairly complex, I reached for Java without much thought. However, the requirement changes often, and every change means tracking down the source code for the jar, modifying it, and replacing the original jar. That makes the code inconvenient to modify, inconvenient to keep in one place, and makes it hard to see what the jar actually does.
For this kind of flexible task, the most convenient and efficient approach is a scripting language such as Python or Ruby, which is quick to develop in and can still handle fairly complex logic. However, for various reasons, some machines at work do not have interpreters for these languages installed. So, as a last resort, I looked into ways of parsing XML from shell scripts.
The shell is not well suited to complex logic, but for simple search-and-replace requirements it is quite convenient.
I mainly use the following three tools:
- xmllint
- xpath
- xml2
The following are the usages of these three tools for easy reference later.
xmllint
Brief description
xmllint is a small utility built on libxml2, a library written in C. As a result it is fairly efficient, runs on many systems, and is reasonably feature-complete. It usually ships in the libxml2-utils package, so it can be installed with a command such as:
sudo apt install libxml2-utils
Function
xmllint supports at least the following commonly used functions:
- XPath queries
- An interactive shell mode for queries
- XML well-formedness checking
- Validation of XML against a DTD or XSD
- Encoding conversion
- XML formatting (pretty-printing)
- Removal of insignificant whitespace
- Timing statistics
In practice, the four most commonly used are XPath queries, whitespace removal, formatting, and validation.
For example, suppose we have the following file:
<books>
  <book id="1">
    <name>book1</name>
    <price>100</price>
  </book>
  <book id="2">
    <name>book2</name>
    <price>200</price>
  </book>
  <book id="3"><name>book3</name><price>300</price>
  </book>
</books>
Execute xpath query:
myths@business:~$ xmllint --xpath "//book[@id=2]/name/text()"
book2
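In a script, the query result usually needs to end up in a variable. The following is a minimal sketch of that, not a definitive recipe: the file name books.xml, the temporary directory, and the variable names are assumptions made for illustration, and the sample data mirrors the document above.

```shell
#!/usr/bin/env bash
# Hypothetical working file; the name books.xml is an assumption for this sketch.
workdir=$(mktemp -d)
cat > "$workdir/books.xml" <<'EOF'
<books>
  <book id="1"><name>book1</name><price>100</price></book>
  <book id="2"><name>book2</name><price>200</price></book>
  <book id="3"><name>book3</name><price>300</price></book>
</books>
EOF

# string() yields the text value of the first matching node.
name=$(xmllint --xpath 'string(//book[@id=2]/name)' "$workdir/books.xml")

# count() yields a number, which is handy for loops and sanity checks.
total=$(xmllint --xpath 'count(//book)' "$workdir/books.xml")

echo "name=$name total=$total"
rm -r "$workdir"
```

Wrapping the expression in string() rather than selecting text() nodes keeps the result a single plain value, which is usually what a shell variable should hold.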
Remove whitespace:
myths@business:~$ xmllint --noblanks
<?xml version="1.0"?>
<books><book id="1"><name>book1</name><price>100</price><license/></book><book id="2"><name>book2</name><price>200</price></book><book id="3"><name>book3</name><price>300</price></book></books>
Format:
myths@business:~$ xmllint --format
<?xml version="1.0"?>
<books>
  <book id="1">
    <name>book1</name>
    <price>100</price>
    <license/>
  </book>
  <book id="2">
    <name>book2</name>
    <price>200</price>
  </book>
  <book id="3">
    <name>book3</name>
    <price>300</price>
  </book>
</books>
XSD validation:
myths@business:~$ cat
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns="" xmlns:xs="http:///2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="books" msdata:IsDataSet="true" msdata:Locale="en-US">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="book">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string" minOccurs="0" msdata:Ordinal="0" />
              <xs:element name="price" type="xs:string" minOccurs="0" msdata:Ordinal="1" />
            </xs:sequence>
            <xs:attribute name="id" type="xs:string" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
myths@business:~$ xmllint --noout --schema
validates
Note: the validation result is written to stderr. By default the tool also echoes the original file to stdout; add the --noout option to suppress that.
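Since the validation message goes to stderr, a script is often better off branching on the exit status, which is non-zero when validation fails. The sketch below illustrates this under stated assumptions: the file names books.xsd, good.xml and bad.xml, the simplified schema, and the check function are all made up for the example, and the schema declares the standard XML Schema namespace.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# A simplified, hypothetical schema for the books document.
cat > "$workdir/books.xsd" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="books">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="price" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="id" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
EOF

# One document that matches the schema and one that does not.
printf '%s\n' '<books><book id="1"><name>book1</name><price>100</price></book></books>' > "$workdir/good.xml"
printf '%s\n' '<books><book id="1"><title>book1</title></book></books>' > "$workdir/bad.xml"

# Print "valid" or "invalid" based on xmllint's exit status.
check() {
    if xmllint --noout --schema "$workdir/books.xsd" "$1" 2>/dev/null; then
        echo "valid"
    else
        echo "invalid"
    fi
}

good_result=$(check "$workdir/good.xml")
bad_result=$(check "$workdir/bad.xml")
echo "good.xml: $good_result, bad.xml: $bad_result"
rm -r "$workdir"
```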
Reading from a pipe:
By default xmllint takes file names as arguments. To feed it data through a pipe instead, pass - as the file name:
myths@business:~$ cat |xmllint --format -
<?xml version="1.0"?>
<books>
  <book id="1">
    <name>book1</name>
    <price>100</price>
    <license/>
  </book>
  <book id="2">
    <name>book2</name>
    <price>200</price>
  </book>
  <book id="3">
    <name>book3</name>
    <price>300</price>
  </book>
</books>
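The same - argument works when the XML lives in a shell variable or comes from another command rather than a file. A minimal sketch, where the variable names and the inline sample data are assumptions made for illustration:

```shell
#!/usr/bin/env bash
# XML held in a variable instead of a file (sample data for this sketch only).
xml='<books><book id="1"><name>book1</name><price>100</price></book></books>'

# "-" makes xmllint read the document from stdin.
first_name=$(printf '%s' "$xml" | xmllint --xpath 'string(//book[1]/name)' -)

# Strip insignificant whitespace from piped input; tail drops the XML declaration line.
compact=$(printf '<a>\n  <b>1</b>\n</a>\n' | xmllint --noblanks - | tail -n 1)

echo "$first_name"
echo "$compact"
```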
xpath
Brief description
The xpath tool is a packaged Perl script of only about 200 lines. It is more specialized: all it provides is XPath querying. It usually ships in the libxml-xpath-perl package, so it can be installed with a command such as:
sudo apt install libxml-xpath-perl
Some systems, such as SUSE, ship it by default.
Function
The versions installed in different systems may be different, but the basic functions are similar:
myths@business:~$ xpath -e '//book/name/text()'
Found 3 nodes in :
-- NODE --
book1
-- NODE --
book2
-- NODE --
book3
By default the query results go to stdout and the descriptive messages go to stderr. To make the results easy to collect, redirect stderr to /dev/null or add the -q option:
myths@business:~$ xpath -e '//book/name/text()' 2>/dev/null
book1
book2
book3
myths@business:~$ xpath -q -e '//book/name/text()'
book1
book2
book3
There is one important difference between xpath and xmllint's XPath support. When a query matches multiple results, xpath prints each one on its own line, while xmllint (at least in the version used here) runs them together on a single line:
myths@business:~$ xmllint --xpath "//book/name/text()"
book1book2book3
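One way to get one value per result from xmllint regardless of how it joins node sets is to ask for the number of matches first and then query each position separately. The following is a sketch of that idea, not the only approach; the temporary file, the sample data, and the names array are assumptions for illustration. Note that it runs xmllint once per node, so it suits small documents.

```shell
#!/usr/bin/env bash
# Hypothetical sample file for this sketch.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<books>
  <book id="1"><name>book1</name><price>100</price></book>
  <book id="2"><name>book2</name><price>200</price></book>
  <book id="3"><name>book3</name><price>300</price></book>
</books>
EOF

# count() returns how many nodes matched.
count=$(xmllint --xpath 'count(//book)' "$xml_file")

# Query each book by position; string() returns a single plain value.
names=()
for ((i = 1; i <= count; i++)); do
    names+=("$(xmllint --xpath "string((//book)[$i]/name)" "$xml_file")")
done

printf '%s\n' "${names[@]}"
rm "$xml_file"
```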
xml2
Brief description
The xml2 tool is probably not widely known, but in some scenarios it works wonders in combination with other commands. The developer's blog seems to be gone, but the tool is presumably written in C on top of libxml2. It usually ships in the xml2 package, so it can be installed with a command such as sudo apt install xml2.
Function
The package contains six commands: xml2, 2xml, html2, 2html, csv2, and 2csv. Its purpose is very much in the Unix spirit: converting XML, HTML, and CSV to and from what the author calls "flat format". For example:
myths@business:~$ cat |xml2
/books/book/@id=1
/books/book/name=book1
/books/book/price=100
/books/book
/books/book/@id=2
/books/book/name=book2
/books/book/price=200
/books/book
/books/book/@id=3
/books/book/name=book3
/books/book/price=300
myths@business:~$ cat |xml2|2xml
<books><book id="1"><name>book1</name><price>100</price></book><book id="2"><name>book2</name><price>200</price></book><book id="3"><name>book3</name><price>300</price></book></books>
This custom format is simple and clever. Some lines start a new node (/books/book), some assign a value to a node (/books/book/name=book1), and some assign a value to a node attribute (/books/book/@id=1). The notation resembles XPath but is not identical. Chaining the two corresponding commands reproduces the original document.
So what is this conversion good for? We often need to create XML files, but generating XML syntax dynamically is tedious. Using the flat format as an intermediate step makes this much easier:
#!/usr/bin/env bash
# Collect flat-format lines in a temporary file, then convert them to XML.
tempFile=$(mktemp)

# Append one book record in flat format.
function addBook() {
    local id=$1 name=$2 price=$3
    {
        echo "/books/book"
        echo "/books/book/@id=$id"
        echo "/books/book/name=$name"
        echo "/books/book/price=$price"
    } >> "$tempFile"
}

function main() {
    addBook 1 book1 100
    addBook 2 book2 200
    addBook 3 book3 300
    # 2xml turns the flat format back into XML; xmllint pretty-prints it.
    2xml < "$tempFile" | xmllint --format --output new_sample.xml -
    rm "$tempFile"
}

main "$@"
The code above generates new_sample.xml, with the same content as the sample file used earlier.
Summary
The above is the entire content of this article. I hope that the content of this article has certain reference value for everyone's study or work. If you have any questions, you can leave a message to communicate. Thank you for your support.