Common ways to update XML documents in Java programming

This article briefly discusses four common methods for updating XML documents in Java programs and analyzes the advantages and disadvantages of each. It also discusses how to control the format of the XML documents that Java programs output.

JAXP stands for Java API for XML Processing, a Java programming interface for processing XML documents. JAXP supports DOM, SAX, XSLT and other standards. To make JAXP more flexible, its designers gave it a Pluggability Layer. With this layer, JAXP can work with any XML parser that implements the DOM and SAX APIs (such as Apache Xerces) and with any XSLT processor that implements the XSLT standard (such as Apache Xalan). The advantage of the Pluggability Layer is that we only need to know the JAXP interfaces, without a deep understanding of the particular XML parser or XSLT processor in use. For example, suppose a Java program calls the XML parser Apache Crimson through JAXP to process XML documents. If we want to switch to another XML parser (such as Apache Xerces) to improve performance, the program code may not need any changes at all: all you need to do is add the jar file containing Apache Xerces to the CLASSPATH environment variable and remove the jar file containing Apache Crimson from it.
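The pluggability idea can be seen in a short sketch, assuming only a current JDK (which bundles JAXP): the application code names only the abstract JAXP factories, and the concrete classes are whatever the JAXP lookup mechanism selects. The class name `JaxpImplementations` is illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;

/**
 * Shows which DOM parser and XSLT processor JAXP selected at run time.
 * The code only names the abstract JAXP factories; the concrete classes come
 * from the pluggability lookup (a system property, the jaxp.properties file,
 * the ServiceLoader mechanism, then the platform default).
 */
public class JaxpImplementations {

    /** Class name of the DocumentBuilderFactory implementation in use. */
    static String parserFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Class name of the TransformerFactory implementation in use. */
    static String transformerFactoryName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM parser factory: " + parserFactoryName());
        System.out.println("XSLT factory:       " + transformerFactoryName());
    }
}
```

Running it with, for example, `-Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl` and a Xerces jar on the CLASSPATH should change the first line of output without any change to the code.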

JAXP is now widely used and can be regarded as the standard API for processing XML documents in Java. Beginners learning JAXP often ask: "My program updated the DOM Tree, but after it exits, the original XML document is unchanged. How do I keep the original XML document in sync with the DOM Tree?" The reason is that parsing produces an independent in-memory copy of the document, and nothing writes that copy back unless the program does so explicitly. At first glance JAXP seems to offer no interface, method, or class for this, which confuses many beginners. The main purpose of this article is to answer this question and briefly introduce several common ways to keep the original XML document in sync with the DOM Tree. To narrow the scope, the only XML parsers considered are Apache Crimson and Apache Xerces, and the only XSLT processor is Apache Xalan.
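A minimal sketch of the behaviour described above, assuming a current JDK; the temporary file and the element name `fancy` are illustrative. Editing the DOM Tree leaves the file on disk untouched, because `parse()` builds an in-memory copy.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomIsInMemory {

    /** Parses xmlFile, changes the DOM Tree, and returns the file's text afterwards. */
    static String editTreeOnly(File xmlFile) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xmlFile);
        // This updates only the in-memory DOM Tree ...
        doc.getDocumentElement().appendChild(doc.createElement("fancy"));
        // ... and nothing writes it back, so the file still has its old content.
        return new String(Files.readAllBytes(xmlFile.toPath()), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("links", ".xml");
        f.deleteOnExit();
        Files.write(f.toPath(), "<links/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(editTreeOnly(f)); // prints <links/>
    }
}
```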

Method 1: Read and write XML documents directly

This is probably the crudest and most primitive method. After the program obtains the DOM Tree, it updates the tree using the methods of the DOM Node interface. The next step is to update the original XML document: traverse the entire DOM Tree, either recursively or with a TreeWalker (from the DOM Level 2 Traversal module), and write each node in turn into the original XML document, which has been opened for writing beforehand. When the traversal is complete, the original XML document matches the DOM Tree. In practice this method is rarely used, but it may still be worth considering if you are implementing your own XML parser.
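A minimal sketch of Method 1, assuming a current JDK. It walks the tree recursively (a TreeWalker could drive the same loop) and handles only elements, attributes, text, CDATA sections, and comments; the class and method names are hypothetical. Even this small version has to escape special characters by hand, which hints at why the approach is error-prone.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ManualDomWriter {

    /** Recursively writes node and all of its descendants to out as XML text. */
    static void write(Node node, Writer out) throws IOException {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                // The declaration must match the encoding of the Writer used.
                out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
                writeChildren(node, out);
                break;
            case Node.ELEMENT_NODE:
                out.write("<" + node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.write(" " + a.getNodeName() + "=\"" + escape(a.getNodeValue()) + "\"");
                }
                if (node.hasChildNodes()) {
                    out.write(">");
                    writeChildren(node, out);
                    out.write("</" + node.getNodeName() + ">");
                } else {
                    out.write("/>");
                }
                break;
            case Node.TEXT_NODE:
                out.write(escape(node.getNodeValue()));
                break;
            case Node.CDATA_SECTION_NODE:
                out.write("<![CDATA[" + node.getNodeValue() + "]]>");
                break;
            case Node.COMMENT_NODE:
                out.write("<!--" + node.getNodeValue() + "-->");
                break;
            default:
                // Doctype, processing instructions, entity references, etc. are
                // skipped in this sketch; a complete writer must handle them too.
                break;
        }
    }

    private static void writeChildren(Node parent, Writer out) throws IOException {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, out);
        }
    }

    /** Escapes the characters that may not appear literally in text or attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Convenience wrapper that serializes into a String. */
    static String toXml(Node node) {
        StringWriter out = new StringWriter();
        try {
            write(node, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<links><link href=\"a&amp;b\">x</link></links>");
        doc.getDocumentElement().appendChild(doc.createElement("fancy"));
        System.out.println(toXml(doc));
    }
}
```

To update the original document rather than print a string, the Writer passed to `write()` would be an `OutputStreamWriter` opened on the original file with the UTF-8 charset, so that the bytes match the encoding named in the declaration.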

Method 2: Use the XmlDocument class

Use the XmlDocument class? There is clearly no such class in JAXP! Has the author made a mistake? No. We really do use the XmlDocument class, or more precisely, its write() method.

As mentioned above, JAXP can be used together with various XML parsers. This time we choose Apache Crimson. XmlDocument (org.apache.crimson.tree.XmlDocument) is a class of Apache Crimson and is not part of standard JAXP, which is why it cannot be found in the JAXP documentation. Now the question is: how can the XmlDocument class be used to update an XML document? According to the latest version of Crimson, Apache Crimson 1.1.3, the XmlDocument class provides the following three write() methods:

public void write (OutputStream out) throws IOException
public void write (Writer out) throws IOException
public void write (Writer out, String encoding) throws IOException

The main function of the above three write() methods is to output the contents in the DOM Tree to a specific output medium, such as file output streams, application consoles, etc. So how do you use the above three write() methods? Please see the following Java program code snippet:

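import java.io.File;
import java.io.FileOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.crimson.tree.XmlDocument;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

String name = "fancy";
DocumentBuilder parser;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try
{
    parser = factory.newDocumentBuilder();
    Document doc = parser.parse("");
    Element newlink = doc.createElement(name);
    doc.getDocumentElement().appendChild(newlink);
    // The cast works only when Crimson is the parser in use.
    FileOutputStream out = new FileOutputStream(new File(""));
    ((XmlDocument) doc).write(out);
    out.close();
}
catch (Exception e)
{
    // to log it
}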

The code above first creates a Document object, doc, containing the complete DOM Tree. It then uses the appendChild() method of the Node interface to append a new element (fancy) as the last child of the root element, and finally calls the write(OutputStream out) method of the XmlDocument class to write the contents of the DOM Tree to a file. (The output could equally overwrite the original XML document; a separate file is used here only to make comparison easier.) Note that write() cannot be called directly on doc: the JAXP Document interface defines no write() method, so doc must first be cast to XmlDocument. The cast succeeds only when Crimson is the parser actually in use, because only then is the object returned by parse() an XmlDocument. The write(OutputStream out) method outputs the DOM Tree in the default UTF-8 encoding. If the DOM Tree contains Chinese characters, the output may then look garbled in tools that expect a different encoding such as GB2312; this is the so-called "Chinese character problem". The solution is to use the write(Writer out, String encoding) method and specify the encoding explicitly, for example by passing "GB2312" as the second parameter. The output then displays Chinese characters normally.
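Crimson is not shipped with current JDKs, so the following sketch swaps in a different, standard technique: DOM Level 3 Load and Save (`org.w3c.dom.ls`). It is not the Crimson API; it is offered, under the assumption of a current JDK, as a comparable one-call save in which `LSOutput.setEncoding()` plays the role of the encoding parameter of `write(Writer out, String encoding)`. The class and helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class SaveWithEncoding {

    /** Serializes doc to bytes in the given encoding using DOM Level 3 Load and Save. */
    static byte[] save(Document doc, String encoding) {
        // The JDK's built-in DOM implementation also implements DOMImplementationLS.
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setEncoding(encoding);   // the explicit output encoding
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);    // a FileOutputStream would write to a file instead
        serializer.write(doc, output);
        return bytes.toByteArray();
    }

    /** Builds a small document whose new element contains Chinese text. */
    static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("links");
        doc.appendChild(root);
        Element fancy = doc.createElement("fancy");
        fancy.setTextContent("\u4e2d\u6587"); // the two characters for "Chinese"
        root.appendChild(fancy);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        byte[] gb = save(sampleDoc(), "GB2312");
        // Parse the bytes back: the XML declaration tells the parser how to decode them.
        Document back = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(gb));
        System.out.println(back.getDocumentElement().getTextContent().equals("\u4e2d\u6587"));
    }
}
```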

For the complete example, please refer to the following files: (see attachment), (see attachment). The example was run on Windows XP Professional with JDK 1.3.1. To compile and run it, download Apache Crimson from the URL /dist/crimson and add the downloaded jar file to the CLASSPATH environment variable.

Notice:

The predecessor of Apache Crimson was Sun's Project X Parser, which later evolved into Apache Crimson. Much of Apache Crimson's code is still ported directly from X Parser. For example, the XmlDocument class used above derives from a class in X Parser; most of the code is the same, differing mainly in the package statements, import statements, and the license at the top of each file. Early versions of JAXP were bundled with X Parser, so some old programs use X Parser's packages; if such programs now fail to recompile, this is almost certainly the reason. Later, JAXP was bundled with Apache Crimson, as in JAXP 1.1. If you use JAXP 1.1, you do not need to download Apache Crimson separately, and the example above compiles and runs normally. The latest JAXP 1.2 EA (Early Access) has changed this: it uses the better-performing Apache Xalan as its XSLT processor and Apache Xerces as its XML parser, and does not directly support Apache Crimson. Therefore, if your development environment uses JAXP 1.2 EA or the Java XML Pack (which includes JAXP 1.2 EA), the example above will not compile and run as-is; you need to download and install Apache Crimson.

Method 3: Use TransformerFactory and Transformer class

The standard way to update the original XML document provided in JAXP is to call the XSLT engine, that is, use the TransformerFactory and Transformer classes. Please see the following Java code snippet:

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

// First create a DOMSource object; the constructor argument can be a Document object.
// doc stands for the modified DOM Tree.
DOMSource doms = new DOMSource(doc);

// Create a File object for the output medium of the DOM Tree's data, here an XML file.
File f = new File("");

// Create a StreamResult object; the constructor argument can be a File object.
StreamResult sr = new StreamResult(f);

// Call JAXP's XSLT engine to write the data in the DOM Tree to the XML file.
// The engine's input is a DOMSource object and its output is a StreamResult object.
try
{
    // First create a TransformerFactory object, then a Transformer object.
    // The Transformer class is effectively an XSLT engine. It is usually used to apply
    // XSL stylesheets, but here we use it to output an XML document.
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();

    // The key step: call the transform() method of the Transformer (the XSLT engine).
    // Its first argument is the DOMSource object and its second is the StreamResult object.
    t.transform(doms, sr);
}
catch (TransformerConfigurationException tce)
{
    System.out.println("Transformer Configuration Exception -----");
    tce.printStackTrace();
}
catch (TransformerException te)
{
    System.out.println("Transformer Exception ---------");
    te.printStackTrace();
}

In a real application, we can use the ordinary DOM API to build a DOM Tree from an XML document, manipulate the tree as required, and obtain the final Document object. We then create a DOMSource object from that Document object, and the rest is exactly the code above. When the program finishes, the output file contains the result you need. (Of course, you can pass a different argument to the StreamResult constructor to send the output to another medium instead of the original XML document.)
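Putting these steps together, here is a minimal end-to-end sketch, assuming a current JDK: parse a file, change the tree, and write it back to the same file with an identity Transformer. The method name `appendAndSave` is hypothetical. The output stream is opened only after parsing has finished, so the original file is not truncated while it is still being read.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class UpdateInPlace {

    /** Parses xmlFile, appends an element to its root, and writes the tree back to the same file. */
    static void appendAndSave(File xmlFile, String elementName) throws Exception {
        // 1. Build the DOM Tree from the original document (parsing completes before writing).
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmlFile);

        // 2. Change the tree as required.
        doc.getDocumentElement().appendChild(doc.createElement(elementName));

        // 3. Identity transform: DOMSource in, StreamResult over the same file out.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        try (OutputStream out = new FileOutputStream(xmlFile)) {
            t.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("links", ".xml");
        f.deleteOnExit();
        Files.write(f.toPath(), "<links><link>a</link></links>".getBytes(StandardCharsets.UTF_8));
        appendAndSave(f, "fancy");
        System.out.println(new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8));
    }
}
```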

The biggest advantage of this method is that it lets you control, as you wish, the format in which the DOM Tree is written to the output medium. The TransformerFactory and Transformer classes cannot do this on their own, however; they also need the OutputKeys class. For the complete example, please refer to the following files: (see attachment), (see attachment). The example was run on Windows XP Professional with JDK 1.3.1. To compile and run it, download and install JAXP 1.1 or the Java XML Pack (which already contains JAXP).

OutputKeys class

Used together with the javax.xml.transform.Transformer class, the javax.xml.transform.OutputKeys class lets you control the format of the XML document output by JAXP's XSLT engine (the Transformer class). Please see the following code snippet:

import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

// First create a TransformerFactory object, then a Transformer object.
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();

// Get the output properties of the Transformer object, i.e. the XSLT engine's
// default output properties, as a java.util.Properties object.
Properties properties = t.getOutputProperties();

// Set a new output property: the output character encoding is GB2312, which supports
// Chinese characters, so Chinese text in the XML document output by the XSLT engine
// displays normally and the so-called "Chinese character problem" does not arise.
// Note the use of the string constants of the OutputKeys class.
properties.setProperty(OutputKeys.ENCODING, "GB2312");

// Update the output properties of the XSLT engine.
t.setOutputProperties(properties);

// Call the XSLT engine to write the DOM Tree to the output medium according to the output properties.
t.transform(DOMSource_Object, StreamResult_Object);

From the code above it is easy to see that setting the output properties of the XSLT engine (the Transformer class) controls the format in which the DOM Tree is output, which is very helpful for customizing the output. So which output properties does JAXP's XSLT engine support? The javax.xml.transform.OutputKeys class defines a number of string constants, each naming an output property that can be set. The commonly used ones are:

public static final String METHOD

Can be set to values such as "xml", "html", or "text".

public static final String VERSION

The version of the output method's specification. If METHOD is "xml", the value should be "1.0"; if METHOD is "html", it should be "4.0"; if METHOD is "text", this property is ignored.

public static final String ENCODING

The character encoding used for the output, such as "GB2312" or "UTF-8". Setting it to "GB2312" solves the so-called "Chinese character problem".

public static final String OMIT_XML_DECLARATION

Sets whether the XML declaration is omitted when outputting an XML document, that is, a line such as:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

Its possible values are "yes" and "no".

public static final String INDENT

INDENT sets whether the XSLT engine may add extra whitespace (indentation) when outputting the XML document. Its possible values are "yes" and "no".

public static final String MEDIA_TYPE

MEDIA_TYPE sets the MIME type of the output document.
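A sketch of a few of these keys in action, assuming a current JDK. It uses `Transformer.setOutputProperty()`, which sets a single key, as a shorthand for the Properties-based calls shown earlier; the helper name `transform` and the sample document are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputKeysDemo {

    /** Runs the identity transform on xml with one output property set. */
    static String transform(String xml, String key, String value) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(key, value);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<links><link>a</link><link>b</link></links>";
        // No XML declaration at the top of the output.
        System.out.println(transform(xml, OutputKeys.OMIT_XML_DECLARATION, "yes"));
        // Indented output; the amount of whitespace is processor-specific.
        System.out.println(transform(xml, OutputKeys.INDENT, "yes"));
        // The "text" method writes only the character data, here "ab".
        System.out.println(transform(xml, OutputKeys.METHOD, "text"));
    }
}
```

Because the exact whitespace produced by INDENT="yes" differs between processors, code that compares output should not depend on it.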

How, then, are the output properties of the XSLT engine set? Let us summarize:

First, get the current set of output properties of the XSLT engine (the Transformer class), which initially holds its defaults. This uses the getOutputProperties() method of the Transformer class, which returns a java.util.Properties object:

Properties properties = t.getOutputProperties();

Then set the new output properties, for example:

properties.setProperty(OutputKeys.ENCODING, "GB2312");
properties.setProperty(OutputKeys.METHOD, "html");
properties.setProperty(OutputKeys.VERSION, "4.0");
// ...

Finally, update the output properties of the XSLT engine (the Transformer class). This uses the setOutputProperties() method of the Transformer class, which takes a java.util.Properties object as its parameter.
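The three steps can be wrapped in a small helper, sketched here under the assumption of a current JDK; the name `newGb2312Transformer` is hypothetical. Because `getOutputProperties()` returns a copy, the final `setOutputProperties()` call is what actually makes the new settings take effect.

```java
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

public class ConfigureOutput {

    /** Creates an identity Transformer configured for GB2312-encoded, indented XML output. */
    static Transformer newGb2312Transformer() throws TransformerConfigurationException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Step 1: get a copy of the engine's current (initially default) output properties.
        Properties properties = t.getOutputProperties();
        // Step 2: set the new output properties.
        properties.setProperty(OutputKeys.METHOD, "xml");
        properties.setProperty(OutputKeys.ENCODING, "GB2312");
        properties.setProperty(OutputKeys.INDENT, "yes");
        // Step 3: hand the modified set back; without this call the changes
        // above would affect only the copy, not the Transformer.
        t.setOutputProperties(properties);
        return t;
    }

    public static void main(String[] args) throws TransformerConfigurationException {
        Transformer t = newGb2312Transformer();
        System.out.println(t.getOutputProperty(OutputKeys.ENCODING)); // GB2312
        System.out.println(t.getOutputProperty(OutputKeys.INDENT));   // yes
    }
}
```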

We have written a new program that uses the OutputKeys class to control the output properties of the XSLT engine. Its structure is roughly the same as the previous program, but the output differs slightly. For the complete code, please refer to the following files: (see attachment), (see attachment). The example was run on Windows XP Professional with JDK 1.3.1. To compile and run it, download and install JAXP 1.1 or the Java XML Pack (which contains JAXP).

Method 4: Use Xalan XML Serializer

Method 4 is actually a variant of Method 3. It requires the support of Apache Xalan and Apache Xerces to run. The example code is as follows:

import java.io.File;
import java.io.FileOutputStream;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.apache.xalan.serialize.Serializer;
import org.apache.xalan.serialize.SerializerFactory;
import org.apache.xalan.templates.OutputProperties;

// First create a DOMSource object; the constructor argument can be a Document object.
// doc stands for the modified DOM Tree.
DOMSource domSource = new DOMSource(doc);

// Create a DOMResult object to hold the XSLT engine's output temporarily.
DOMResult domResult = new DOMResult();

// Call JAXP's XSLT engine to transfer the data in the DOM Tree.
// The engine's input is a DOMSource object and its output is a DOMResult object.
try
{
    // First create a TransformerFactory object, then a Transformer object.
    // The Transformer class is effectively an XSLT engine. It is usually used to apply
    // XSL stylesheets, but here we use it to output an XML document.
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer t = tf.newTransformer();

    // Set the output properties of the XSLT engine (essential, otherwise the
    // "Chinese character problem" occurs).
    Properties properties = t.getOutputProperties();
    properties.setProperty(OutputKeys.ENCODING, "GB2312");
    t.setOutputProperties(properties);

    // The key step: call transform() on the Transformer (the XSLT engine). Its first
    // argument is the DOMSource object and its second is the DOMResult object.
    t.transform(domSource, domResult);

    // Create the default Xalan XML Serializer and use it to write the content held in
    // the DOMResult object (domResult) to the output medium as an output stream.
    Serializer serializer =
        SerializerFactory.getSerializer(OutputProperties.getDefaultMethodProperties("xml"));

    // Set the output properties of the Xalan XML Serializer. This step is essential too,
    // otherwise the so-called "Chinese character problem" may also occur.
    Properties prop = serializer.getOutputFormat();
    prop.setProperty("encoding", "GB2312");
    serializer.setOutputFormat(prop);

    // Create a File object for the output medium of the DOM Tree's data, here an XML file.
    File f = new File("");

    // Create the file output stream fos; note the constructor argument.
    FileOutputStream fos = new FileOutputStream(f);

    // Set the output stream of the Xalan XML Serializer.
    serializer.setOutputStream(fos);

    // Serialize the result, then close the stream.
    serializer.asDOMSerializer().serialize(domResult.getNode());
    fos.close();
}
catch (Exception tce)
{
    tce.printStackTrace();
}

This method is not commonly used and seems somewhat heavyweight, so we will not discuss it further. For the complete example, please refer to the following files: (see attachment), (see attachment). The example was run on Windows XP Professional with JDK 1.3.1. To compile and run it, download and install Apache Xalan and Apache Xerces from the URL /dist/.

Alternatively, download and install the Java XML Pack from the URL /xml/, because the latest Java XML Pack (Winter 01 release) includes Apache Xalan and Apache Xerces.
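Xalan's `org.apache.xalan.serialize` package is not part of current JDKs, so the following sketch shows only the shape of Method 4 using standard classes, under the assumption of a current JDK: a first Transformer writes into a DOMResult, and a second identity Transformer stands in for the Xalan XML Serializer. The first stage is an identity transform where a real program would typically apply a stylesheet; the class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TwoStageOutput {

    static String twoStage(Document doc) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();

        // Stage 1: XSLT engine with DOMSource in and DOMResult out.
        DOMResult domResult = new DOMResult();
        tf.newTransformer().transform(new DOMSource(doc), domResult);
        Node resultNode = domResult.getNode(); // a new Document created by the engine

        // Stage 2: serialize the in-memory result. A second identity Transformer
        // stands in here for the Xalan XML Serializer used in the article.
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(resultNode), new StreamResult(out));
        return out.toString();
    }

    /** Builds <links><fancy/></links> for the demonstration. */
    static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("links");
        doc.appendChild(root);
        root.appendChild(doc.createElement("fancy"));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(twoStage(sampleDoc()));
    }
}
```

To write GB2312 bytes to a file instead of a string, the second stage would also set `OutputKeys.ENCODING` and use a `StreamResult` over a `FileOutputStream`.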

In conclusion:

This article has briefly discussed four ways to update XML documents in Java programs. The first method reads and writes the XML file directly. It is cumbersome and error-prone, and is rarely used unless you are developing your own XML parser. The second method uses Apache Crimson's XmlDocument class. It is extremely simple and convenient; if you have chosen Apache Crimson as your XML parser, you may as well use it. However, it appears to be inefficient (a consequence of Apache Crimson's own inefficiency), and newer versions of JAXP, the Java XML Pack, and JWSDP do not directly support Apache Crimson, so this method is not universal. The third method uses JAXP's XSLT engine (the Transformer class) to output the XML document. This is arguably the standard method; it is very flexible, in particular giving free control over the output format, and we recommend it. The fourth method is a variant of the third: it uses the Xalan XML Serializer and adds a serialization step, which may be an advantage when modifying and outputting large numbers of documents. Unfortunately, having to set both the XSLT engine's properties and the XML Serializer's output properties is cumbersome, and its dependence on Apache Xalan and Apache Xerces makes it less portable.

Besides the four methods discussed above, XML documents can also be updated with other APIs (such as JDOM, Castor, XML4J, or Oracle XML Parser V2). For reasons of space, they are not discussed here.