SoFunction
Updated on 2025-03-10

Efficiently outputting large XML files with libxml2 in C/C++

Preface

libxml2 is an XML parser written in C. It was originally developed for the GNOME project and is free, open source software released under the MIT License. Besides C, bindings exist for languages such as C++, PHP, Pascal, Ruby, and Tcl, and the library runs on Windows, Linux, Solaris, macOS, and other platforms. It is powerful enough to meet the needs of most users.

Common data types for libxml2

xmlChar is the character type in libxml2 (a typedef for unsigned char that holds UTF-8 encoded text). All strings in the library are based on this type.

xmlChar* is the corresponding pointer type. Many functions return an xmlChar* that points to dynamically allocated memory, so remember to release it with xmlFree once you are done with it; otherwise the memory leaks. For example:

xmlChar *name = xmlNodeGetContent(CurNode);
strcpy(, name);
xmlFree(name);

Other commonly used types:

  • xmlDoc, xmlDocPtr // document structure and its pointer
  • xmlNode, xmlNodePtr // node structure and its pointer
  • xmlAttr, xmlAttrPtr // node attribute structure and its pointer
  • xmlNs, xmlNsPtr // node namespace structure and its pointer
  • BAD_CAST // a macro that casts its argument to xmlChar*, typically applied to string literals

Background

1. libxml2 is effectively the standard C/C++ library for reading and writing XML. It is available by default on Linux and macOS. Windows, unfortunately, has its own proprietary MSXML and does not ship libxml2. What makes this worse is that MSXML is not fully standard and has to be downloaded and installed separately. The cross-platform libxml2 is therefore still the preferred XML library on Windows.

2. expat, a SAX-style reading library, is also a very good choice, but unfortunately it does not support writing.

3. The usual way to write XML with a library is to build the whole DOM tree and then serialize that tree to XML text, using either the library's built-in write functions or the standard I/O functions. The drawback is that a large DOM tree makes memory usage surge while it is being built. Serializing it to an in-memory buffer then makes memory surge a second time, and only after that is the buffer written from memory to the file.

Explanation

1. Keeping the whole structure in memory is wasteful. With a large amount of data, the parent-child links between elements, the text values, the attribute values, and so on all take up memory. It is better to output one element at a time and free that element's memory right after it has been written, so that memory is used as economically as possible.

2. Outputting element by element also leaves you in control of where each piece goes, which helps when the I/O has to pass through functions with permission restrictions, or when the output goes to a user interface rather than a file.

Example

The following example uses libxml2 compiled with MinGW on Windows; _wfopen is used to open a file whose path is Unicode-encoded.

#include ""
#include <libxml/>
#include <libxml/>
#include <libxml/>
#include <cstdio>   // FILE, fwrite, fclose
#include <cstring>  // strlen
#include <iostream>
#include <memory>

void TestStandardIOForXml()
{
    xmlDocPtr doc = NULL; /* document pointer */
    xmlNodePtr one_node = NULL, node = NULL, node1 = NULL; /* node pointers */

    doc = xmlNewDoc(BAD_CAST "1.0");
    // Free the document automatically when the function returns.
    std::shared_ptr<void> sp_doc(doc, [](void* doc1) {
        xmlDocPtr doc = (xmlDocPtr)doc1;
        xmlFreeDoc(doc);
    });

    FILE* file = _wfopen(L"", L"wb");
    if (!file)
        return;

    // Close the file automatically when the function returns.
    std::shared_ptr<FILE> sp_file(file, [](FILE* file) {
        fclose(file);
    });

    // Write the XML declaration: dumping the still-empty document
    // produces only the declaration line.
    xmlChar* doc_buf = NULL;
    int size = 0;
    xmlDocDumpMemoryEnc(doc, &doc_buf, &size, "UTF-8");
    std::shared_ptr<xmlChar> sp_xc(doc_buf, [](xmlChar* doc_buf) {
        xmlFree(doc_buf);
    });
    if (doc_buf && size > 0)
        fwrite(doc_buf, 1, size, file);

    // Reusable buffer that holds one serialized element at a time.
    xmlBufferPtr buf = xmlBufferCreate();
    std::shared_ptr<void> sp_buf(buf, [](void* buf1) {
        xmlBufferPtr buf = (xmlBufferPtr)buf1;
        xmlBufferFree(buf);
    });

    // The root element is written by hand so that it never has to hold children.
    const char* kRootBegin = "<ROOT>";
    fwrite(kRootBegin, strlen(kRootBegin), 1, file);
    for (int i = 0; i < 10; ++i) {
        one_node = xmlNewNode(NULL, BAD_CAST "one");
        xmlNewChild(one_node, NULL, BAD_CAST "node1",
                    BAD_CAST "content of node 1");
        xmlNewChild(one_node, NULL, BAD_CAST "node2", NULL);
        node = xmlNewChild(one_node, NULL, BAD_CAST "node3",
                           BAD_CAST "this node has attributes");
        xmlNewProp(node, BAD_CAST "attribute", BAD_CAST "yes");
        xmlNewProp(node, BAD_CAST "foo", BAD_CAST "bar");

        node = xmlNewNode(NULL, BAD_CAST "node4");
        node1 = xmlNewText(BAD_CAST "other way to create content (which is also a node)");
        xmlAddChild(node, node1);
        xmlAddChild(one_node, node);

        // Serialize only this element and write it out immediately.
        xmlNodeDump(buf, doc, one_node, 1, 1);
        fwrite(buf->content, buf->use, 1, file);

        // Release the element and reset the buffer before the next iteration.
        xmlUnlinkNode(one_node);
        xmlFreeNode(one_node);
        xmlBufferEmpty(buf);
    }

    const char* kRootEnd = "</ROOT>";
    fwrite(kRootEnd, strlen(kRootEnd), 1, file);
}

Output file:

<?xml version="1.0" encoding="UTF-8"?>
<ROOT><one>
 <node1>content of node 1</node1>
 <node2/>
 <node3 attribute="yes" foo="bar">this node has attributes</node3>
 <node4>other way to create content (which is also a node)</node4>
 </one><one>
 <node1>content of node 1</node1>
 <node2/>
 <node3 attribute="yes" foo="bar">this node has attributes</node3>
 <node4>other way to create content (which is also a node)</node4>
 </one><one>
 <node1>content of node 1</node1>
 <node2/>
 <node3 attribute="yes" foo="bar">this node has attributes</node3>
 <node4>other way to create content (which is also a node)</node4>
 </one><one>
 <node1>content of node 1</node1>
 <node2/>
 <node3 attribute="yes" foo="bar">this node has attributes</node3>
 <node4>other way to create content (which is also a node)</node4>
 </one><one>
 <node1>content of node 1</node1>
 <node2/>
 <node3 attribute="yes" foo="bar">this node has attributes</node3>
 <node4>other way to create content (which is also a node)</node4>
 </one><one>
 <node1>content of node 1</node1>
 <node2/>
 <node3 attribute="yes" foo="bar">this node has attributes</node3>
 <node4>other way to create content (which is also a node)</node4>
 </one><one>
 <node1>content of node 1</node1>
 <node2/>
 <node3 attribute="yes" foo="bar">this node has attributes</node3>
 <node4>other way to create content (which is also a node)</node4>
 </one><one>
 <node1>content of node 1</node1>
 <node2/>
 <node3 attribute="yes" foo="bar">this node has attributes</node3>
 <node4>other way to create content (which is also a node)</node4>
 </one><one>
 <node1>content of node 1</node1>
 <node2/>
 <node3 attribute="yes" foo="bar">this node has attributes</node3>
 <node4>other way to create content (which is also a node)</node4>
 </one><one>
 <node1>content of node 1</node1>
 <node2/>
 <node3 attribute="yes" foo="bar">this node has attributes</node3>
 <node4>other way to create content (which is also a node)</node4>
 </one></ROOT>

Summary

The above is the entire content of this article. I hope that the content of this article has certain reference value for everyone's study or work. If you have any questions, you can leave a message to communicate. Thank you for your support.