Introduction to XML: structure and syntax

Now let's use Notepad to create our XML file. First, let's look at an example XML file:

Example 1

<?xml version="1.0" encoding="gb2312" ?>
<References>
<Book>
<Name>Introductory explanation of XML</Name>
<Author>Zhang San</Author>
<Price CurrencyUnit="RMB">20.00</Price>
</Book>
<Book>
<Name>XML syntax</Name>
<!--This book is about to be published-->
<Author>Li Si</Author>
<Price CurrencyUnit="RMB">18.00</Price>
</Book>
</References>

This is a typical XML file, saved with the .xml suffix. We can divide it into two major parts: the prolog and the body. The first line of this file is the prolog. It holds the XML declaration, which this tutorial treats as mandatory (strictly speaking, the XML 1.0 specification makes it optional but recommended); when present, it must be the very first thing in the file. The declaration mainly tells the XML parser how to process the file. Within it, version gives the version of the XML standard the file uses and is required; encoding names the character encoding the file uses and may be omitted. If the encoding declaration is omitted, the file must be encoded in Unicode (UTF-8 or UTF-16), so it is recommended not to omit it. Because this example uses the GB2312 character encoding, its encoding declaration cannot be omitted. The prolog can also contain other declarations, which we will introduce later.

The rest of the file is the body, where the content of the XML file is stored. The body begins with the <References> start tag and ends with the </References> end tag; this element is called the "root element" of the XML file. <Book> is a "child element" directly under the root element, and each <Book> in turn has the child elements <Name>, <Author>, and <Price>. CurrencyUnit is an "attribute" of the <Price> element, and "RMB" is its "attribute value".
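As a rough illustration of these terms, the following Python sketch loads a copy of Example 1 with the standard-library xml.etree.ElementTree module and reads out the root element, the child elements, and the attribute. The element and attribute names (References, Book, Name, Author, Price, CurrencyUnit) are assumed English renderings of the tags in Example 1, summarize is a hypothetical helper, and the encoding declaration is left out because the document is passed as an in-memory string.

```python
import xml.etree.ElementTree as ET

# Assumed English rendering of Example 1. The encoding declaration is
# omitted because the text is already a Python string.
DOCUMENT = """<?xml version="1.0"?>
<References>
<Book>
<Name>Introductory explanation of XML</Name>
<Author>Zhang San</Author>
<Price CurrencyUnit="RMB">20.00</Price>
</Book>
<Book>
<Name>XML syntax</Name>
<!--This book is about to be published-->
<Author>Li Si</Author>
<Price CurrencyUnit="RMB">18.00</Price>
</Book>
</References>"""


def summarize(document):
    """Return the root tag and one (name, author, price, unit) tuple per book."""
    root = ET.fromstring(document)
    books = []
    for book in root.findall("Book"):      # child elements of the root
        price = book.find("Price")
        books.append((
            book.findtext("Name"),
            book.findtext("Author"),
            price.text,
            price.get("CurrencyUnit"),     # attribute value
        ))
    return root.tag, books


root_tag, books = summarize(DOCUMENT)
print(root_tag)
for entry in books:
    print(entry)
```

The comment inside the second book does not appear in the result, because the default parser skips comments.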

The line <!--This book is about to be published--> is a comment, written the same way as in HTML. In an XML file, a comment is placed between the "<!--" and "-->" markers.

As you can see, XML files are quite simple. Like HTML files, XML files are made up of a series of tags. The tags in an XML file, however, are ones we define ourselves, and they have clear meanings: the tag itself tells us what the content inside it means.

Now that we have a first impression of XML files, let's look at their syntax in detail. Before we do, we must understand an important concept: the XML parser.

Parser

The main job of the parser is to check the XML file for structural errors, strip away the markup, read out the content, and hand it to the next application for processing. XML is a markup language for structuring information, and the XML specification lays down detailed rules for how a file's structure is marked up. A parser is software written according to these rules (many parsers are written in Java). HTML works in a similar way: a browser needs an HTML parser in order to "read" web pages made of HTML tags and display them to us. An HTML parser is forgiving, though: when it meets a tag it cannot understand, it typically ignores the tag rather than stopping with an error message.

Since the current HTML tags are actually quite confusing and there are a lot of irregular tags (some web pages can be displayed normally with IE, but not with Netscape Navigator), from the very beginning, XML designers strictly stipulated the syntax and structure of XML. The XML files we write must follow these regulations, otherwise the XML parser will display error messages to you without mercy.

There are two types of XML files: Well-Formed XML files and Validating XML files.

If an XML file follows the relevant rules of the XML specification, it is called Well-Formed; this does not require a DTD (Document Type Definition, described in detail later). If an XML file is Well-Formed, correctly uses a DTD, and the syntax of that DTD is correct, then the file is Validating (the XML specification's own term is "valid"). Corresponding to the two types of XML files, there are two kinds of XML parsers: Well-Formed (non-validating) parsers and Validating parsers. IE 5 contains a Validating parser, which can also be used to parse Well-Formed XML files.

To check whether a file meets the Well-Formed conditions, we can open the first XML file we just edited in IE 5 or above.

You may ask: why does the browser display exactly what is in my source file? That is right, because an XML file is concerned only with its content; how it is displayed is left to CSS or XSL. We have not defined a CSS or XSL file for this XML file, so it is displayed in its original form. In fact, for electronic data exchange an XML file alone is enough. If we want it displayed in a particular form, we have to write a CSS or XSL file (this will be discussed later).
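A browser is not the only way to run this check. As a minimal sketch, assuming Python is available, the standard-library xml.etree.ElementTree module (a non-validating parser) can report whether a piece of text is Well-Formed. The helper name is_well_formed and the two sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return (True, None) if the parser accepts the text, else (False, reason)."""
    try:
        ET.fromstring(document)
    except ET.ParseError as error:
        return False, str(error)
    return True, None


good = "<References><Book><Name>XML syntax</Name></Book></References>"
bad = "<References><Book><Name>XML syntax</Name></Book>"  # root is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

The second call returns False together with the parser's own description of the problem, which is the same kind of message a browser would show.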

Well-Formed XML files

We know that an XML file must be Well-Formed in order to be parsed correctly by the parser and displayed in the browser. So what is a Well-Formed XML file? The main rules are as follows, and every XML file we create must meet them.

First, the XML file begins with the XML declaration, which states that the file is an XML file and which version of the XML specification it uses. No element, comment, or even blank line may come before it.

Second, an XML file must have exactly one root element. In our first example, <References> ... </References> is the root element of the XML file.

Third, the tags in an XML file must be closed correctly: every start tag must have a corresponding end tag. For example, the <name> tag must have the corresponding </name> end tag. This is unlike HTML, where the end tags of some elements are optional. An element that stands on its own with no content, like <img src=.....> in HTML, which has no end tag, is called an "empty element" in XML and must be written like this: <empty element name/>. If the element has attributes, it is written: <empty element name attribute name = "attribute value"/>.

Fourth, tags must not cross. In HTML files, you could get away with writing this:

<B><H>XXXXXXX</B></H>

Here the <B> and <H> tags overlap. In XML, such interleaving of tags is strictly prohibited: elements must be properly nested, so a tag opened inside another must be closed before the outer one is closed.

Fifth, attribute values must be enclosed in quotation marks, as "1.0", "gb2312", and "RMB" are in the first example. The quotation marks cannot be left out. (XML accepts either double or single quotes, as long as the opening and closing marks match.)

Sixth, tags, instructions, attribute names, and other names are case sensitive. This is unlike HTML, where <B> and <b> mean the same thing; in XML, <name>, <NAME>, and <Name> are three different tags.

Seventh, we know that in an HTML file, if we want the browser to display our input exactly as typed, we can put it between <pre> </pre> or <xmp> </xmp> tags. This is essential for web pages that teach HTML, because they have to display HTML source code on the page. In XML, this is done with a CDATA section. The parser passes the content of a CDATA section to the application intact and does not interpret any markup inside it. A CDATA section starts with "<![CDATA[" and ends with "]]>". For example, in the source code of Example 2, the parser hands everything except the "<![CDATA[" and "]]>" markers to the downstream application unchanged. Even leading and trailing blanks and line breaks inside the CDATA section are passed on (note that CDATA must be written in capital letters).

Example 2

<![CDATA[Flying xml>>>>>>, :-)
oooo<<<<<<<
]]>
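To see what the downstream application receives, here is a small Python sketch using the standard-library xml.etree.ElementTree module. A CDATA section has to sit inside an element, so the wrapper element Note is a hypothetical name added for the illustration.

```python
import xml.etree.ElementTree as ET

# Example 2 wrapped in a hypothetical <Note> element.
with_cdata = "<Note><![CDATA[Flying xml>>>>>>, :-)\noooo<<<<<<<\n]]></Note>"
# The same kind of text without the CDATA markers.
without_cdata = "<Note>oooo<<<<<<<</Note>"

note = ET.fromstring(with_cdata)
# The markers are gone; the text, including line breaks, is intact.
print(repr(note.text))

try:
    ET.fromstring(without_cdata)
    rejected = False
except ET.ParseError:
    # Outside a CDATA section, a bare "<" is read as the start of a tag.
    rejected = True
print(rejected)
```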

Eighth, XML handles whitespace differently from HTML. The HTML standard stipulates that any run of blanks is treated as a single blank, whereas XML stipulates that the parser must faithfully pass all whitespace outside of tags to the downstream application. As a result, we sometimes have to give up the indentation habits we formed when writing HTML, because the indentation is also passed on by the parser. For example:

<Author>Zhang San</Author>
and
<Author>
    Zhang San
</Author>

To the parser, these two are different: besides the characters "Zhang San", the latter also contains two line breaks and the indentation before "Zhang San". So after the parser strips the tags and passes the content to the application, the results of processing can differ.

If we want to tell XML programs explicitly that the whitespace inside an element is meaningful and must not be removed casually (in some poems, for instance, spaces carry meaning), we can add the built-in XML attribute xml:space to the tag. For example (note the case of the attribute name and value):

<Poetry xml:space="preserve">
Motherland! motherland!
My motherland!
</Poetry>
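The whitespace behaviour described in this rule can be observed with a short Python sketch, again assuming the standard-library xml.etree.ElementTree module. The four-space indentation in the second Author element is an assumption made for the illustration. Note that this parser does not act on xml:space itself: it simply hands the attribute to the application under the reserved XML namespace, and the application decides what to do with it.

```python
import xml.etree.ElementTree as ET

compact = ET.fromstring("<Author>Zhang San</Author>")
indented = ET.fromstring("<Author>\n    Zhang San\n</Author>")

# The line breaks and indentation are part of the second element's text.
print(repr(compact.text))
print(repr(indented.text))

# The xml: prefix is built in and maps to this reserved namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
poem = ET.fromstring(
    '<Poetry xml:space="preserve">\n'
    "Motherland! motherland!\n"
    "My motherland!\n"
    "</Poetry>"
)
print(poem.get(XML_SPACE))
```

An application that wants the two Author forms to compare equal has to trim the text itself, for example with indented.text.strip().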

In addition, to use any of the special characters in Table 1 in an XML file, you must write the corresponding replacement instead.

Table 1

Special character    Replacement
&                    &amp;
<                    &lt;
>                    &gt;
"                    &quot;
'                    &apos;
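When XML is generated by a program, these replacements are usually applied by a library rather than by hand. The sketch below assumes Python and uses escape from the standard-library xml.sax.saxutils module, which replaces &, <, and > by default; the two quote characters are passed in as extra entries. The sample sentence and the Note element are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > on its own; quotes are added explicitly.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = "Tom & Jerry say \"1 < 2\" and '3 > 2'"
escaped = escape(raw, QUOTES)
print(escaped)

# Parsing the escaped text gives back the original characters.
element = ET.fromstring("<Note>" + escaped + "</Note>")
print(element.text == raw)
```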

To summarize: XML files that comply with the rules above are Well-Formed XML files. This is the most basic requirement for writing XML files. As you can see, the syntax rules for XML files are much stricter than those for HTML. Because the rules are so strict, it is much easier for software engineers to write XML parsers; writers of HTML parsers, by contrast, must work hard to cope with many different ways of writing web pages in order to improve their browsers' compatibility. This is also a good thing for us beginners: we simply follow the rules, and no longer have to puzzle over the many different ways of writing HTML as before.
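The rules above can be tried out against a real parser. The following Python sketch, assuming the standard-library xml.etree.ElementTree module, feeds it a handful of hypothetical snippets that each break one rule, plus two that follow the rules, and reports which ones are accepted. The helper name accepted and all sample snippets are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets, each breaking one of the rules described above.
MALFORMED = {
    "comment before declaration": '<!--x--><?xml version="1.0"?><Book/>',
    "two root elements": "<Book/><Book/>",
    "missing end tag": "<Book><Name>XML syntax</Book>",
    "crossed tags": "<B><H>XXXXXXX</B></H>",
    "unquoted attribute": "<Price CurrencyUnit=RMB>20.00</Price>",
    "case mismatch": "<Name>XML syntax</name>",
}

# Snippets that follow the rules.
WELL_FORMED = {
    "empty element": '<Image Source="cover.png"/>',
    "properly nested": "<B><H>XXXXXXX</H></B>",
}


def accepted(document):
    """Return True if the parser accepts the text as Well-Formed."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


for label, document in {**MALFORMED, **WELL_FORMED}.items():
    print(f"{label}: {'accepted' if accepted(document) else 'rejected'}")
```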

We have seen that most of the tags in an XML file are custom tags. But consider this: suppose two companies A and B in the same industry want to exchange data with each other as XML files. Company A uses the <Price> tag for the price of its products, while Company B uses <SellPrice>. If an XML application that reads their files knows only that price information is in the <Price> tag, it will fail to read Company B's prices, and an error results. Obviously, parties that want to exchange information in XML files need a convention between them: which tags may be used in an XML file, which child elements a parent element may contain, the order in which elements appear, how the attributes of an element are defined, and so on. With such a convention, they can exchange XML data without obstacles. This convention is called a DTD (Document Type Definition). You can think of a DTD as a template for writing XML files. For exchanging XML data within an industry, a fixed DTD makes things much more convenient. For example, if the XML web pages of the major online stores all followed the same DTD, we could easily write an application based on that DTD to automatically fetch the things we are interested in online. In fact, several well-defined DTDs already exist, such as those for MathML and SMIL mentioned above.
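The Company A and Company B problem can be reproduced in a few lines. This Python sketch assumes the standard-library xml.etree.ElementTree module; the Product and Name elements, the sample values, and the helper read_price are hypothetical. It shows only why a shared convention is needed, not DTD validation itself, which this non-validating parser does not perform.

```python
import xml.etree.ElementTree as ET

# Hypothetical files from the two companies.
company_a = "<Product><Name>Widget</Name><Price>20.00</Price></Product>"
company_b = "<Product><Name>Widget</Name><SellPrice>18.00</SellPrice></Product>"


def read_price(document):
    """Return the text of <Price>, or None when the element is absent."""
    return ET.fromstring(document).findtext("Price")


print(read_price(company_a))  # the price is found
print(read_price(company_b))  # None: the application only knows <Price>
```

Both files are Well-Formed, so the parser accepts them without complaint; the mismatch only shows up in the application, which is exactly the gap an agreed DTD is meant to close.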

If an XML file is Well-Formed and is correctly written according to a DTD, it is called a Validating XML file. The corresponding parser is called a Validating parser.