XML (Introduction)
XML (Extensible Markup Language) has been the subject of a great deal of activity and hype since its debut in the late 1990s. XML is nothing more than plain text, yet it provides a way to share data between almost any two applications.
Although XML is conceptually simple, processing it is often tedious (it takes a lot of repetitive code) and complex (many easily overlooked details can lead to errors).
When to use XML?
When do you use XML in a web application?
- When you need to process data that is already stored as XML.
- When you want to store data as XML to prepare for possible future integrations. (XML makes the most sense in application integration scenarios.)
- When you want to use a technology that depends on XML. (Web services, for example, use various XML-based standards.)
Note: an important concept to understand is that two separate decisions must be made when storing data:
- How the data is structured (the logical format)
- Where the data is kept (the physical storage)
XML is a choice of format, not a choice of storage. That is, even if you decide to represent data as XML, you still have to decide whether to store it in a database field, write it to a file, or simply keep it in memory as a string or other object.
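The distinction between format and storage can be illustrated with a minimal Python sketch. The <note> document, the file name note.xml, and the notes table are all hypothetical names chosen for illustration; the point is only that the same XML text can be kept in memory, in a file, or in a database field and still be parsed the same way.

```python
import os
import sqlite3
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document; the format is XML in all three cases.
xml_text = "<note><to>Ann</to><body>Hello</body></note>"

# 1. Storage in memory: the XML is just a string object.
in_memory = xml_text

# 2. Storage in a file: write the text out and read it back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "note.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(xml_text)
    with open(path, encoding="utf-8") as handle:
        from_file = handle.read()

# 3. Storage in a database field: an in-memory SQLite table with a TEXT column.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
connection.execute("INSERT INTO notes (body) VALUES (?)", (xml_text,))
from_database = connection.execute("SELECT body FROM notes").fetchone()[0]
connection.close()

# Whatever the storage, the same parser reads the same logical format.
bodies = [ET.fromstring(text).findtext("body")
          for text in (in_memory, from_file, from_database)]
print(bodies)
```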
Introduction to XML
The XML specification is a set of guidelines defined by the W3C (World Wide Web Consortium) for describing structured data in plain text. XML is a markup language based on angle-bracket tags.
XML does not have a fixed set of tags. Instead, XML is a meta-language that can be used to create other markup languages.
The following document shows a custom XML format that stores a product catalog:
<?xml version="1.0" encoding="utf-8" ?>
<productCatalog>
  <catalogName>Acme Fall 2015 Catalog</catalogName>
  <expiryDate>2015-01-01</expiryDate>
  <products>
    <product>
      <productName>Magic Ring</productName>
      <productPrice>342.10</productPrice>
      <inStock>true</inStock>
    </product>
    <product>
      <productName>Flying Carpet</productName>
      <productPrice>982.99</productPrice>
      <inStock>true</inStock>
    </product>
  </products>
</productCatalog>
You are free to use whatever tag names best describe your data, and it is this flexibility that has made XML so successful. Of course, flexibility also has a downside: different companies can use completely different tag names to describe similar data. Although any application can parse XML, the writers and readers of the data need to agree on the tags and structure so that readers can interpret the data and extract meaningful information.
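To make the point about agreed tag names concrete, here is a minimal sketch of a reader for the catalog format above, written in Python with the standard xml.etree.ElementTree module. The function name read_products and the dictionary keys are assumptions made for this illustration. The reader only works because it knows the tag names and nesting the writer chose.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

CATALOG_XML = """<?xml version="1.0" encoding="utf-8" ?>
<productCatalog>
  <catalogName>Acme Fall 2015 Catalog</catalogName>
  <expiryDate>2015-01-01</expiryDate>
  <products>
    <product>
      <productName>Magic Ring</productName>
      <productPrice>342.10</productPrice>
      <inStock>true</inStock>
    </product>
    <product>
      <productName>Flying Carpet</productName>
      <productPrice>982.99</productPrice>
      <inStock>true</inStock>
    </product>
  </products>
</productCatalog>"""


def read_products(xml_text):
    """Return one dictionary per <product> element (hypothetical helper)."""
    # Parse from bytes so the encoding declaration in the document is honored.
    root = ET.fromstring(xml_text.encode("utf-8"))
    products = []
    # The path below encodes the agreed structure: products/product.
    for product in root.findall("./products/product"):
        products.append({
            "name": product.findtext("productName"),
            "price": Decimal(product.findtext("productPrice")),
            "in_stock": product.findtext("inStock") == "true",
        })
    return products


products = read_products(CATALOG_XML)
for item in products:
    print(item["name"], item["price"], item["in_stock"])
```

If the writer renamed <productName> to something else, the parser would still accept the document, but this reader would no longer find the data.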
Advantages of XML
XML is more useful today than it has ever been. The benefits of using XML in modern applications are as follows:
- Adoption. XML is everywhere, and whenever data needs to be shared, XML is a natural choice.
- Extensibility and flexibility. XML imposes no rules on data semantics, can represent arbitrary types of data, and is cheap to implement.
- Related standards and tools. Another reason for XML's success is the tools (parsers) and related standards (XML Schema, XPath, XSLT) for creating and processing XML. As a result, developers in almost every language have ready-made components for reading XML, validating it against a set of rules (called a schema), converting it into other formats, and so on.
Well-formed XML
XML is a very strict standard, and this strictness is what preserves broad compatibility. (HTML is the notorious example of what happens without such strict rules.)
All XML parsers perform some basic quality checks. If an XML document fails any of them, it is rejected completely. Otherwise, it is considered well-formed. A well-formed XML document is not necessarily a correct one (it may contain wrong data, for example), but an XML parser can parse it.
XML documents must meet the following conditions before they are considered to be well-formed:
- Each start tag must have a corresponding end tag
- The empty element must end with "/>"
- Elements can be nested but cannot be interleaved
- XML is strictly case sensitive, so <FirstName> and </firstName> cannot be paired
- An element cannot have two or more attributes with the same name, but it can contain multiple nested elements with the same name
- A document can have only one root element
- All attribute values must be enclosed in quotes
- Comments cannot be placed inside tags (comments are delimited by <!-- and -->)
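The rules above can be tried out with any XML parser. The following minimal Python sketch uses the standard xml.etree.ElementTree module, which raises ParseError for input that is not well-formed. The helper name is_well_formed and the sample snippets are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One acceptable document followed by violations of individual rules.
SAMPLES = {
    "well-formed": "<person><firstName>Ann</firstName></person>",
    "missing end tag": "<person><firstName>Ann</person>",
    "interleaved elements": "<a><b></a></b>",
    "case mismatch": "<FirstName>Ann</firstName>",
    "duplicate attribute": '<person id="1" id="2"/>',
    "two root elements": "<a/><b/>",
    "unquoted attribute value": "<person id=1/>",
    "comment inside a tag": "<person <!-- note --> id='1'/>",
}

for label, text in SAMPLES.items():
    print(f"{label}: {is_well_formed(text)}")
```

Note that the check says nothing about whether the data itself is correct; it only reports whether the parser can read the document.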
XML namespace
As XML has grown in popularity, dozens of XML markup languages (usually called XML grammars) have been created. Many of them are specific to particular industries, processes, or types of information. What happens if you need to combine two XML grammars that contain elements with the same name? And, more generally, how do you tell which grammar an element belongs to?
The solution lies in the XML Namespaces standard. The core idea of this standard is that every XML markup language has a namespace that uniquely identifies its elements. Simply put, namespaces remove the ambiguity between elements of the same name when grammars are combined.
All XML namespaces are identified by URIs (Uniform Resource Identifiers), which typically look like the URL of a web page. For example, /mystandard is a typical namespace. However, a namespace does not have to be a URL (and you should not assume that it is); it can be an arbitrary sequence of text, as long as it is unique.
To specify that an element belongs to a specific namespace, add the xmlns (XML namespace) attribute to its start tag and set it to the namespace to be used. For example, the following element is part of the http://mycompany/OrderML namespace:
<order xmlns="http://mycompany/OrderML"></order>
Adding this attribute to every element would quickly become tedious. Fortunately, a namespace declared this way becomes the default namespace for all child elements:
<product xmlns="http://mycompany/OrderML">
  <productName>Flying Carpet</productName>
  <productPrice>982.99</productPrice>
  <inStock>true</inStock>
</product>
You can also define a namespace prefix. In the xmlns attribute, add a colon followed by the characters you want to use as the prefix:
<ord:order xmlns:ord="http://mycompany/OrderML" xmlns:cli="http://mycompany/ClientML">
  <cli:client>
    <cli:firstName>...</cli:firstName>
    <cli:lastName>...</cli:lastName>
  </cli:client>
  <ord:orderItem>...</ord:orderItem>
  <ord:orderItem>...</ord:orderItem>
</ord:order>
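As a rough illustration of how a parser sees these namespaces, the sketch below reads the prefixed order document with Python's standard xml.etree.ElementTree module. The element values (Ann, Example, and the two item names) are placeholders filled in for this example, and the short prefixes o and c in the lookup table are deliberately different from the document's ord and cli prefixes to show that only the namespace URI matters, not the prefix.

```python
import xml.etree.ElementTree as ET

# Placeholder values stand in for the "..." content of the original sample.
ORDER_XML = """<ord:order xmlns:ord="http://mycompany/OrderML"
           xmlns:cli="http://mycompany/ClientML">
  <cli:client>
    <cli:firstName>Ann</cli:firstName>
    <cli:lastName>Example</cli:lastName>
  </cli:client>
  <ord:orderItem>Flying Carpet</ord:orderItem>
  <ord:orderItem>Magic Ring</ord:orderItem>
</ord:order>"""

# Prefixes used for searching; they map to the same URIs as ord and cli.
NS = {
    "o": "http://mycompany/OrderML",
    "c": "http://mycompany/ClientML",
}

root = ET.fromstring(ORDER_XML)

# ElementTree stores the expanded name: {namespace URI}local-name.
print(root.tag)

items = [item.text for item in root.findall("o:orderItem", NS)]
first_name = root.findtext("c:client/c:firstName", namespaces=NS)
print(items, first_name)

# A default namespace applies to child elements without any prefix.
product = ET.fromstring(
    '<product xmlns="http://mycompany/OrderML">'
    "<productName>Flying Carpet</productName>"
    "</product>"
)
child_tag = product[0].tag
print(child_tag)
```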
XML Schema
The flexibility of XML also brings some problems. If developers around the world use your XML format, how can you ensure that everyone follows the rules?
The solution is to create a formal document, called a schema, that defines the rules of your custom markup language. These rules do not cover syntax details (those are already imposed by the XML standard). Instead, the schema defines the logical rules that apply to your type of data, including the following:
- Document vocabulary. It defines which elements or attribute names can appear in your XML document.
- Document structure. It defines where tags may be placed, and can specify the order of tags and the number of times a given element can appear.
- Supported data types. It can specify whether data is plain text or must be parseable as a number, a date, and so on.
- Allowed data range. Values can be limited to a numeric range, text can be limited to a specific length or required to match a regular expression pattern, or a value can be restricted to a list of specific values.
The following XML schema defines the rules for the product catalog shown above:
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="productCatalog">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="catalogName" type="xsd:string" />
        <xsd:element name="expiryDate" type="xsd:date" />
        <xsd:element name="products">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="product" type="productType" maxOccurs="unbounded" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="productType">
    <xsd:sequence>
      <xsd:element name="productName" type="xsd:string" />
      <xsd:element name="productPrice" type="xsd:decimal" />
      <xsd:element name="inStock" type="xsd:boolean" />
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:integer" use="required" />
  </xsd:complexType>
</xsd:schema>
- Every schema document is an XML document whose root element is <schema>
- All elements that can be used in a schema are defined in the XML Schema namespace (http://www.w3.org/2001/XMLSchema)
- Your schema document must use the correct namespace name (the prefix is usually xsd or xs, but you can choose your own)
- Inside the <schema> element, there are two types of definitions:
  - <element> defines the structure that the target document must follow
  - <complexType> defines a smaller, reusable data structure that can be referenced when defining the document structure
- The <element> tag is the core of the schema, and it is also the starting point for all validation
In this example, the first <element> tag specifies that the product catalog must start with a root element named <productCatalog>. The <productCatalog> element contains a sequence of three elements. The first is <catalogName>, which contains ordinary text; the second is <expiryDate>, which contains text that conforms to the XML Schema date format; the third is <products>, which contains a list of <product> elements.
Each <product> element is a complex type, which is defined later in the document using <complexType>. This complex type (named productType) consists of a sequence of three elements containing product information. These elements hold text (<productName>), a decimal number (<productPrice>), and a boolean value (<inStock>). The complex type also includes a required attribute, id.
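To show what these rules mean in practice, the sketch below re-implements them by hand in Python. This is not XML Schema validation: the Python standard library has no XSD validator, so the checks are written out manually and only approximate the schema (for example, Python's date and Decimal parsing are slightly more lenient than xsd:date and xsd:decimal). The function names check_catalog and check_product and the sample id value are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal, InvalidOperation


def check_product(product, position):
    """Check one <product> element against the productType rules."""
    errors = []
    label = f"product #{position}"
    # Required attribute holding an integer (use="required", xsd:integer).
    try:
        int(product.get("id", ""))
    except ValueError:
        errors.append(f"{label}: id attribute is missing or not an integer")
    # Fixed sequence of child elements (xsd:sequence).
    if [child.tag for child in product] != ["productName", "productPrice", "inStock"]:
        errors.append(f"{label}: expected productName, productPrice, inStock in order")
        return errors
    # Data types (approximations of xsd:decimal and xsd:boolean).
    try:
        Decimal(product.findtext("productPrice"))
    except InvalidOperation:
        errors.append(f"{label}: productPrice is not a decimal number")
    if product.findtext("inStock") not in ("true", "false", "1", "0"):
        errors.append(f"{label}: inStock is not a boolean")
    return errors


def check_catalog(xml_text):
    """Return a list of rule violations; an empty list means all checks passed."""
    root = ET.fromstring(xml_text)
    if root.tag != "productCatalog":
        return ["root element must be productCatalog"]
    if [child.tag for child in root] != ["catalogName", "expiryDate", "products"]:
        return ["productCatalog must contain catalogName, expiryDate, products in order"]
    errors = []
    try:
        date.fromisoformat(root.findtext("expiryDate"))
    except ValueError:
        errors.append("expiryDate is not a valid date")
    products = list(root.find("products"))
    # maxOccurs="unbounded" with the default minOccurs means one or more.
    if not products or any(item.tag != "product" for item in products):
        errors.append("products must contain one or more product elements")
        return errors
    for position, product in enumerate(products, start=1):
        errors.extend(check_product(product, position))
    return errors


VALID = (
    "<productCatalog><catalogName>Acme Fall 2015 Catalog</catalogName>"
    "<expiryDate>2015-01-01</expiryDate><products>"
    '<product id="1"><productName>Magic Ring</productName>'
    "<productPrice>342.10</productPrice><inStock>true</inStock></product>"
    "</products></productCatalog>"
)
# Break two rules: drop the required id and use a non-numeric price.
INVALID = VALID.replace(' id="1"', "").replace("342.10", "cheap")

valid_errors = check_catalog(VALID)
invalid_errors = check_catalog(INVALID)
print(valid_errors)
print(invalid_errors)
```

Both sample documents are well-formed, so a parser accepts them; only the rule checks tell them apart. In a real project, a schema-aware validator would apply the XSD directly instead of hand-written checks like these.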