What Is XML?

Like HTML, XML (for eXtensible Markup Language) consists of data wrapped in tags. But, as the W3Schools web site explains, whereas HTML is designed to display data using a predefined set of tags, XML doesn’t do anything; it merely wraps data that is meant to be shared in tags that are not predefined.

XML documents have these characteristics (courtesy of W3Schools):

  • XML documents must have a root element
  • XML elements must have a closing tag
  • XML tags are case sensitive
  • XML elements must be properly nested
  • XML attribute values must be quoted
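
To see these rules enforced, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. The helper name is_well_formed and the sample strings are made up for illustration; each broken sample violates one of the rules listed above, and the parser rejects it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: one good document, then one violation per rule.
samples = {
    "well formed": '<menu><item name="waffles">syrup</item></menu>',
    "two root elements": '<menu/><menu/>',
    "missing closing tag": '<menu><item>syrup</menu>',
    "case mismatch": '<menu><Item>syrup</item></menu>',
    "bad nesting": '<menu><a><b>syrup</a></b></menu>',
    "unquoted attribute": '<menu><item name=waffles>syrup</item></menu>',
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample parses; the parser raises ParseError for each of the others.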

As an example, here’s a well-formed XML file that describes a breakfast menu.

<breakfast>
	<item name="waffles">
		<ingredient>ice cream</ingredient>
		<ingredient quantity="3">waffles</ingredient>
		<ingredient>syrup</ingredient>
	</item>
	<item name="bacon and eggs">
		<ingredient quantity="5">bacon</ingredient>
		<ingredient quantity="2"/>
	</item>
</breakfast>

Notice how the tags breakfast, item, and ingredient have names we’ve made up; in this way, XML is self-descriptive. Notice, too, how each tag either has a matching end tag or, like the last ingredient, is self-closing; either way, no tag is left unclosed.

The entire file is enclosed within one tag, <breakfast>. Within that are a number of <item> tags, and within those are <ingredient> tags. These are referred to as elements. Each element, in turn, can have named values within its start tag, like name and quantity; these are called attributes.
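
As a rough illustration of elements and attributes, the sketch below loads the breakfast menu with Python's standard xml.etree.ElementTree module and walks it. The variable names are made up for this example; it is only a quick look at the structure, not one of the processing techniques this series goes on to cover.

```python
import xml.etree.ElementTree as ET

# The breakfast menu from above, embedded as a string for this sketch.
BREAKFAST_XML = """<breakfast>
    <item name="waffles">
        <ingredient>ice cream</ingredient>
        <ingredient quantity="3">waffles</ingredient>
        <ingredient>syrup</ingredient>
    </item>
    <item name="bacon and eggs">
        <ingredient quantity="5">bacon</ingredient>
        <ingredient quantity="2"/>
    </item>
</breakfast>"""

root = ET.fromstring(BREAKFAST_XML)
print("root element:", root.tag)

for item in root.findall("item"):
    # Attributes are read by name from the element's start tag.
    print("item:", item.get("name"))
    for ingredient in item.findall("ingredient"):
        quantity = ingredient.get("quantity")  # None when the attribute is absent
        text = ingredient.text                 # None for the self-closing element
        print("  ingredient:", text, "quantity:", quantity)
```

Note that the self-closing <ingredient quantity="2"/> element still shows up as an element with an attribute; it simply has no text.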

XML Schemas

XML files can also be based on schemas, which spell out the elements and attributes a file may contain. Without getting too deeply into the subject, here’s the schema, in file breakfast.xsd, for our XML file:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="breakfast">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="item" minOccurs="0"
          maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="item">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ingredient" minOccurs="1" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="quantity" type="xs:integer"
                  use="optional" />
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="name" type="xs:string"
        use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>

This schema file is itself based on a schema. Our schema’s outer tag refers to the schema that restricts the names, formats, and sequence of the tags within breakfast.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

The xmlns:xs attribute on the schema tag declares a namespace, pairing an arbitrary prefix (xs) with a URI that identifies the XML Schema vocabulary. The URI serves as a unique name; a parser does not need to download anything from it. This establishes http://www.w3.org/2001/XMLSchema as the namespace of any tag in this schema (which is, after all, itself an XML file) whose name begins with “xs:”. That’s why all the tags in breakfast.xsd start with that prefix.

It’s possible to refer to multiple namespaces within an XML file, and two namespaces may define tags with the same name. Prefixes let you identify the namespace, and hence the schema, that applies to any given tag.
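
The sketch below shows how prefixes keep two same-named tags apart. It uses Python's standard xml.etree.ElementTree module, which reports each tag as {namespace URI}local-name. The document, the prefixes food and furn, and the example.com URIs are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two vocabularies that both define a "table" tag.
DOC = """<order xmlns:food="http://example.com/food"
       xmlns:furn="http://example.com/furniture">
  <food:table>breakfast buffet</food:table>
  <furn:table>oak, seats six</furn:table>
</order>"""

root = ET.fromstring(DOC)

# ElementTree replaces each prefix with the namespace URI it stands for.
for child in root:
    print(child.tag, "->", child.text)

# When searching, we map a prefix of our own choosing to the URI.
# The prefix "f" need not match the one used in the document.
ns = {"f": "http://example.com/furniture"}
furniture = root.find("f:table", ns)
print(furniture.text)
```

The two table elements end up with different full names, so code that processes one cannot mistake it for the other. The lookup also works with a different prefix than the document uses, which shows that the URI, not the prefix, is what identifies the namespace.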

Our schema, breakfast.xsd, is invoked by changing the start of the XML file thus:

<?xml version="1.0" encoding="UTF-8"?>
<breakfast
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="breakfast.xsd">

Notice that a namespace is declared here too, this time http://www.w3.org/2001/XMLSchema-instance with the prefix “xsi”, and that “xsi” is then used as the prefix for an attribute, noNamespaceSchemaLocation. That attribute belongs to the XMLSchema-instance namespace, and its value names our schema file, breakfast.xsd. As the name suggests, it gives the location of the schema for tags that are in no namespace. The tags in breakfast.xml have no prefixes and so belong to no namespace, and breakfast.xsd declares no target namespace, so the two match. This is why the tags in breakfast.xml conform to breakfast.xsd even though they have no prefixes.

When an XML file is controlled by a schema, it cannot deviate from it and still be valid. For example, breakfast.xsd allows for the optional quantity attribute on an <ingredient> tag and ensures that its value is an integer.
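
Python's standard library has no XSD validator, so the sketch below is not schema validation. It is a hand-written stand-in that checks just three of the rules breakfast.xsd expresses: each <item> needs a name attribute, each <item> needs at least one <ingredient>, and quantity, when present, must be an integer. The function name check_breakfast and the sample documents are made up for illustration; a real validator would read the rules from the .xsd file itself.

```python
import re
import xml.etree.ElementTree as ET

# Lexical form of xs:integer: an optional sign followed by digits.
XS_INTEGER = re.compile(r"[+-]?[0-9]+")


def check_breakfast(text: str) -> list[str]:
    """Hand-coded checks mirroring a few rules from breakfast.xsd."""
    problems = []
    root = ET.fromstring(text)
    for item in root.iter("item"):
        if item.get("name") is None:
            problems.append("item is missing the required name attribute")
        ingredients = item.findall("ingredient")
        if not ingredients:
            problems.append("item has no ingredient elements")
        for ingredient in ingredients:
            quantity = ingredient.get("quantity")
            if quantity is not None and not XS_INTEGER.fullmatch(quantity.strip()):
                problems.append(f"quantity {quantity!r} is not an integer")
    return problems


GOOD = '<breakfast><item name="bacon"><ingredient quantity="5">bacon</ingredient></item></breakfast>'
BAD = '<breakfast><item><ingredient quantity="lots">bacon</ingredient></item></breakfast>'

print(check_breakfast(GOOD))
print(check_breakfast(BAD))
```

Both sample documents are well formed, but only the first would be valid against the schema; the second omits the name attribute and gives a non-integer quantity.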

We’re going to explore two techniques for reading and writing XML files.

What You Need to Know

  • XML is a format for transfer of data.
  • XML files use tags, similar to HTML. But the tag names have no intrinsic meaning and are chosen totally at the discretion of designers and developers.
  • XML files must be well formed; i.e., their structure conforms to a set of syntax rules.
  • XML schemas are one means of constraining the structure of the elements in an XML file. An XML file that conforms to its schema is said to be valid.
  • There are several frameworks for processing XML files.

Next up: Processing XML with the Document Object Model