SoFunction
Updated on 2025-04-08

Frequently Asked Questions about Getting Started with XML (IV)

How do I deal with whitespace characters in the XML object model?

Sometimes the XML object model displays TEXT nodes that contain only whitespace characters. When that whitespace is trimmed for display, such a node looks empty, which can be confusing. Consider the following XML example:


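<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person (#PCDATA|lastname|firstname)*>
  <!ELEMENT lastname (#PCDATA)>
  <!ELEMENT firstname (#PCDATA)>
]>
<person>
  <lastname>Smith</lastname>
  <firstname>John</firstname>
</person>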

Generate the following tree:


Processing Instruction: xml
DocType: person
ELEMENT: person
  TEXT:
  ELEMENT: lastname
  TEXT:
  ELEMENT: firstname
  TEXT:


The lastname and firstname elements are surrounded by TEXT nodes that contain only whitespace, because the content model of the "person" element is MIXED: its declaration contains the #PCDATA keyword. A MIXED content model allows text to appear between the child elements. Therefore, the following is also valid:


<person>
  My last name is <lastname>Smith</lastname> and my first name is
  <firstname>John</firstname>
</person>


The result is similar to the following tree:


ELEMENT: person
  TEXT: My last name is
  ELEMENT: lastname
  TEXT: and my first name is
  ELEMENT: firstname
  TEXT:


Without the whitespace after the word "is" and on both sides of the word "and", the words would run together and the sentence could not be read. Therefore, in a MIXED content model, text, whitespace, and elements all belong together and are significant. This is not the case for non-MIXED content models.

To make the whitespace-only TEXT nodes disappear, remove the #PCDATA keyword from the "person" element declaration:

<!ELEMENT person (lastname|firstname)*>

The result is the following, cleaner tree:


Processing Instruction: xml
DocType: person
ELEMENT: person
  ELEMENT: lastname
  ELEMENT: firstname
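The same behavior can be reproduced outside MSXML. The following is a minimal sketch using the JDK's standard JAXP DOM parser (javax.xml.parsers), not the MSXML object model this article describes; the class and method names are illustrative. It builds the person document with either a mixed declaration, (#PCDATA|lastname|firstname)*, or an element-only one, (lastname|firstname)*, and counts the whitespace-only TEXT nodes under person. In JAXP, setIgnoringElementContentWhitespace only takes effect when the parser validates against the DTD, so validation is switched on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceDemo {
    // Builds the person document with the given content model for <person>
    public static String personXml(String personModel) {
        return "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE person [\n"
            + "  <!ELEMENT person " + personModel + ">\n"
            + "  <!ELEMENT lastname (#PCDATA)>\n"
            + "  <!ELEMENT firstname (#PCDATA)>\n"
            + "]>\n"
            + "<person>\n"
            + "  <lastname>Smith</lastname>\n"
            + "  <firstname>John</firstname>\n"
            + "</person>\n";
    }

    // Counts whitespace-only TEXT nodes that are direct children of <person>
    public static int whitespaceTextNodes(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);                       // the parser must apply the DTD content model
        factory.setIgnoringElementContentWhitespace(true); // drops whitespace in element-only content
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler());     // keeps validation messages off stderr
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        int count = 0;
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // MIXED model: TEXT nodes before, between and after the two elements
        System.out.println(whitespaceTextNodes(personXml("(#PCDATA|lastname|firstname)*"))); // 3
        // Element-only model: the whitespace is ignorable and is not kept in the tree
        System.out.println(whitespaceTextNodes(personXml("(lastname|firstname)*")));          // 0
    }
}
```

The counts mirror the two trees: three whitespace TEXT nodes in the mixed case, none in the element-only case.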

What does an XML declaration do?

If present, the XML declaration must appear at the very top of the XML document, for example:

<?xml version="1.0" encoding="UTF-8"?>

It specifies the following items:

That this is an XML document. When the MIME type is missing or has not been specified, a MIME detector can use the declaration to recognize the file as text/xml.
That the document conforms to the XML 1.0 specification. This becomes important once other versions of XML exist.
The document's character encoding. The encoding attribute is optional and defaults to UTF-8.
Note: The XML declaration must be the very first thing in the XML document. If it appears anywhere else, for example on line 2, parsing fails with an error such as:

Invalid xml declaration.
Line 0000002:
Position 0000007: ------^
Note: The XML declaration is optional. If you need to put comments or processing instructions at the top of the document, omit the XML declaration; the encoding then defaults to UTF-8.
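These rules can be checked with any conforming parser. Below is a minimal sketch using the JDK's JAXP DOM API rather than MSXML, so the error text differs from the IE5 message above. The class name and sample strings are illustrative. A declaration preceded by even a newline is rejected, a document with no declaration at all is accepted, and getXmlEncoding returns the declared encoding, or null when none is declared.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlDeclarationDemo {
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler()); // fatal errors are thrown, not printed
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // True if the text is a well-formed XML document
    public static boolean parses(String xml) {
        try {
            parse(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Value of the declaration's encoding attribute, or null when it is not declared
    public static String declaredEncoding(String xml) throws Exception {
        return parse(xml).getXmlEncoding();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parses("<?xml version=\"1.0\"?><doc/>"));   // true
        System.out.println(parses("\n<?xml version=\"1.0\"?><doc/>")); // false: declaration is not first
        System.out.println(parses("<!-- note --><doc/>"));             // true: no declaration at all
        System.out.println(declaredEncoding("<?xml version=\"1.0\" encoding=\"UTF-8\"?><doc/>")); // UTF-8
    }
}
```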

How do I print my XML document in a readable format?

When you use the DOM to build a document from scratch and save it as an XML file, everything is written on one line, with no whitespace between the elements. This is the default behavior.

Internet Explorer 5 includes a built-in default XSL stylesheet that displays and prints XML documents in a readable format. For example, if IE5 is installed, try viewing the file in the browser. The following tree should be displayed:

-
-
XYZ
12.56

The readable layout comes from the stylesheet; no whitespace characters are inserted into the XML itself.

Pretty-printing XML is an interesting problem, especially when a DTD defines different kinds of content models. For example, whitespace cannot be inserted into mixed content (#PCDATA) because doing so may change the meaning of the content. Consider the following XML:

Elephant
It is best not to output this as:

E
lephant

because the word boundary would no longer be correct.

All of this makes automatic pretty-printing problematic. If you need readable XML, you can use the DOM yourself to insert whitespace text nodes at the appropriate places.
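A minimal sketch of that DOM approach, using the JDK's JAXP DOM and an identity Transformer rather than MSXML. The element names order, item, name, price, p and b are hypothetical, since the original markup was lost. The sketch inserts newline-and-indent TEXT nodes only inside elements whose children are all elements, so mixed content such as the Elephant example is never touched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PrettyPrintDemo {
    // A newline followed by two spaces per nesting level
    static String newlineAndIndent(int depth) {
        StringBuilder sb = new StringBuilder("\n");
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    // True only if the element has children and every child is an element
    static boolean hasElementOnlyContent(Element e) {
        if (!e.hasChildNodes()) {
            return false;
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.ELEMENT_NODE) {
                return false;
            }
        }
        return true;
    }

    // Inserts whitespace TEXT nodes, but never inside content that already holds text
    static void indent(Element e, int depth) {
        if (!hasElementOnlyContent(e)) {
            return;
        }
        Document doc = e.getOwnerDocument();
        Node child = e.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember the sibling before inserting
            e.insertBefore(doc.createTextNode(newlineAndIndent(depth + 1)), child);
            indent((Element) child, depth + 1);
            child = next;
        }
        e.appendChild(doc.createTextNode(newlineAndIndent(depth)));
    }

    public static String prettyPrint(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        indent(doc.getDocumentElement(), 0);

        // Identity transform: serializes the DOM exactly as it now stands
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(prettyPrint("<order><item><name>XYZ</name><price>12.56</price></item></order>"));
        // Mixed content: left exactly as it was
        System.out.println(prettyPrint("<p><b>E</b>lephant</p>"));
    }
}
```

Skipping any element that already contains text is deliberately conservative. It cannot break word boundaries, but it also leaves mixed content unindented.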

How do I use namespaces in a DTD?

To use a namespace in a DTD, declare it in the ATTLIST declaration of the element that uses it, as follows:

The namespace attribute must be declared #FIXED. Namespace-prefixed attributes work the same way:
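The following sketch checks the #FIXED rule with the JDK's validating DOM parser (JAXP), not MSXML. The x:book element, the urn:example:books URI and the class name are hypothetical. Because this parser is not namespace-aware, xmlns:x is handled as an ordinary attribute whose default value comes from the DTD. A document that omits it still gets it, and a document that supplies a different value is reported as invalid.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdNamespaceDemo {
    // Hypothetical DTD: the x prefix is bound through a #FIXED attribute default
    public static final String DTD =
        "<!DOCTYPE x:book [\n"
        + "  <!ELEMENT x:book (#PCDATA)>\n"
        + "  <!ATTLIST x:book xmlns:x CDATA #FIXED \"urn:example:books\">\n"
        + "]>\n";

    // Parses with DTD validation and collects validity errors into the given list
    public static Document parse(String xml, final List<String> errors) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        List<String> errors = new ArrayList<String>();
        Document doc = parse(DTD + "<x:book>XML FAQ</x:book>", errors);
        // The declaration is supplied by the DTD even though the document omits it
        System.out.println(doc.getDocumentElement().getAttribute("xmlns:x")); // urn:example:books
        System.out.println(errors.isEmpty());                                 // true

        List<String> moreErrors = new ArrayList<String>();
        parse(DTD + "<x:book xmlns:x=\"urn:other\">XML FAQ</x:book>", moreErrors);
        // A different value violates #FIXED, which the validating parser reports
        System.out.println(moreErrors.isEmpty());                             // false
    }
}
```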

Namespaces and XML schemas
DTDs and XML schemas cannot be mixed. For example, the following attribute declaration in a DTD:

xmlns:x CDATA #FIXED "x-schema:"

does not cause the definitions in the referenced schema to be used. DTDs and XML schemas are mutually exclusive.

How do I use the XML DSO in Visual Basic?

Use the following XML as an example:


 Mark Hanson
206 765 4583

Jane Smith
425 808 1111 


You can bind this data to an ADO Recordset as follows:

Create a new VB 6.0 project.

Add references to Microsoft ActiveX Data Objects 2.1 or later, the Microsoft Data Adapter Library, and Microsoft XML, version 2.0.

Load the XML data into the XML DSO control using the following code:


Dim dso As New XMLDSOControl
Dim doc As IXMLDOMDocument
Set doc = dso.XMLDocument    ' the DOM document held by the DSO
doc.Load ("d:\")

Use the following code to map the DSO to a new Recordset object through a DataAdapter:


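Dim da As New DataAdapter
Set da.Object = dso          ' wrap the DSO in the adapter
Dim rs As New ADODB.Recordset
Set rs.DataSource = da       ' the recordset reads its rows through the adapter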


Access the data:


MsgBox rs.Fields("name").Value

The message box displays the string "Mark Hanson".
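For comparison, here is a minimal sketch of reading the same field with the JDK's standard DOM instead of the DSO and ADO. This swaps in a different technique and is not the DSO. Apart from name, the element names (contacts, person, telephone) are hypothetical, because the original markup was lost. Each person element plays the role of a recordset row.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FirstRecordDemo {
    // Hypothetical markup for the contact list; only <name> is implied by the article
    public static final String CONTACTS =
        "<contacts>"
        + "<person><name>Mark Hanson</name><telephone>206 765 4583</telephone></person>"
        + "<person><name>Jane Smith</name><telephone>425 808 1111</telephone></person>"
        + "</contacts>";

    // Returns the text of one field in one record (record index starts at 0)
    public static String field(String xml, int record, String fieldName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Element row = (Element) doc.getDocumentElement().getElementsByTagName("person").item(record);
        return row.getElementsByTagName(fieldName).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(field(CONTACTS, 0, "name")); // Mark Hanson
    }
}
```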

How do I use the XML DOM in Java?

Internet Explorer 5 must be installed. In Visual J++ 6.0, choose Add COM Wrapper from the Project menu, and then select Microsoft XML 1.0 from the list of COM objects. This builds the required Java wrappers into a new package named "msxml". These prebuilt Java wrappers are also available for download. The classes can be used as follows:


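import com.ms.com.*;
import msxml.*;

public class Class1
{
    public static void main (String[] args)
    {
        DOMDocument doc = new DOMDocument();
        // Variant wraps the file URL because the COM load method takes a VARIANT
        doc.load(new Variant("file://d:/samples/"));
        System.out.println("Loaded " + doc.getDocumentElement().getNodeName());
    }
}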

 

This example loads the 3.8 MB test file from Sun's religion samples. The Variant class wraps the Win32 VARIANT type.
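With today's standard Java (JAXP in javax.xml.parsers), no COM wrappers or Variant are needed. Here is a minimal sketch of the same load-and-report step. The class name is illustrative, and the file path comes from the command line because the original file name is not given.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class LoadDemo {
    // Parses an XML file and reports its document element, like the J++ sample
    public static String load(File file) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
        return "Loaded " + doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to any XML file
        System.out.println(load(new File(args[0])));
    }
}
```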

Because a new wrapper object is created every time a node is retrieved, you cannot compare nodes by reference. So instead of the following code:


IXMLDOMNode root1 = doc.getDocumentElement();
IXMLDOMNode root2 = doc.getDocumentElement();
if (root1 == root2) ...   // wrong: compares two distinct wrapper objects

 

use the following:


if (ComLib.isEqualUnknown(root1, root2)) ...   // compares the underlying COM objects
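The W3C DOM Level 3 interface in the JDK (org.w3c.dom.Node) provides isSameNode for this identity test. It does not depend on how an implementation hands out node objects. By contrast, isEqualNode compares structure and content. Here is a minimal sketch using JAXP rather than the msxml wrappers; the class name and sample XML are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeIdentityDemo {
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><a/></root>");
        Node root1 = doc.getDocumentElement();
        Node root2 = doc.getDocumentElement();
        // Identity check that does not rely on reference equality of node objects
        System.out.println(root1.isSameNode(root2));  // true

        Node other = parse("<root><a/></root>").getDocumentElement();
        System.out.println(root1.isSameNode(other));  // false: a different node
        System.out.println(root1.isEqualNode(other)); // true: same structure and content
    }
}
```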

 

The generated .class wrappers total approximately 160 KB. However, to stay fully compliant with the W3C DOM specification, you need only the IXMLDOM* wrappers. The following classes belong to the old IE 4.0 XML interfaces and can be deleted from the msxml folder:


IXMLAttribute*
IXMLDocument*, XMLDocument*
IXMLElement*
IXMLError*
IXMLElementCollection*
tagXMLEMEM_TYPE*
_xml_error*

 

This reduces the size to 147KB. You can also delete the following items:


DOMFreeThreadedDocument: accesses XML documents from multiple threads in a Java application.
XMLHttpRequest: communicates with the server using the XML DAV HTTP extensions.
IXTLRuntime: defines the XSL stylesheet script object.
XMLDSOControl: binds to XML data in an HTML page.
XMLDOMDocumentEvents: returns callbacks during parsing.

 

This reduces the size to 116 KB. To make it smaller still, note that the DOM itself has two layers. The core layer includes:


 DOMDocument, IXMLDOMDocument
IXMLDOMNode*
IXMLDOMNodeList*
IXMLDOMNamedNodeMap*
IXMLDOMDocumentFragment*
IXMLDOMImplementation
IXMLDOMParseError 


and the DTD information layer, which you may need to retain:


 IXMLDOMDocumentType
IXMLDOMEntity
IXMLDOMNotation 


Every node in an XML document is an IXMLDOMNode, and that interface provides all of the functionality; the per-node-type interfaces are higher-level wrappers on top of it. Therefore, if you modify the DOMDocument wrapper so that it uses IXMLDOMNode in place of these specific types, all of the following interfaces can be removed:


 IXMLDOMAttribute
IXMLDOMCDATASection
IXMLDOMCharacterData
IXMLDOMComment
IXMLDOMElement
IXMLDOMProcessingInstruction
IXMLDOMEntityReference
IXMLDOMText
 

Deleting these reduces the size to 61 KB. However, the getAttribute and setAttribute methods of IXMLDOMElement are convenient; without them, you need to use:


node.getAttributes().setNamedItem(...)
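The W3C DOM shipped with the JDK has the same two routes. Here is a minimal sketch with a hypothetical id attribute and class name. It contrasts Element.setAttribute/getAttribute with creating an Attr and passing it to the generic Node.getAttributes().setNamedItem.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AttributeDemo {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Convenient route: the element-specific interface
    public static String viaElement(String xml) throws Exception {
        Element e = parse(xml).getDocumentElement();
        e.setAttribute("id", "42");
        return e.getAttribute("id");
    }

    // Generic route: only the Node interface and its attribute map
    public static String viaNode(String xml) throws Exception {
        Document doc = parse(xml);
        Node node = doc.getDocumentElement();
        Attr attr = doc.createAttribute("id");
        attr.setValue("42");
        node.getAttributes().setNamedItem(attr);
        return node.getAttributes().getNamedItem("id").getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaElement("<item/>")); // 42
        System.out.println(viaNode("<item/>"));    // 42
    }
}
```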