The XML FAQ — Frequently-Asked Questions about the Extensible Markup Language

Section 2: Existing users

Q 2.2: What does XML look like (inside)?

Pointy brackets like HTML

The basic structure of XML is similar to that of SGML applications such as HTML (XML is itself a simplified subset of SGML). The basic components can be seen in the following examples. An XML document starts with an optional Prolog, which can have two (optional) parts:

  1. The XML Declaration:

    <?xml version="1.0" encoding="utf-8"?>

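    This specifies that this is an XML document and that it uses the UTF-8 character encoding (UTF-8 is the default when no encoding is declared and there is no byte-order mark; other encodings can be declared, but XML processors are only required to support UTF-8 and UTF-16);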

  2. A Document Type Declaration if you are using a DTD:

    <!DOCTYPE report SYSTEM "http://sales.acme.corp/dtds/salesrep.dtd">

    which identifies the type of document (here, ‘report’) and says where the Document Type Definition (DTD) is stored.
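To see how a parser reports these two prolog parts, here is a minimal sketch using Python's standard-library expat binding (`xml.parsers.expat`). The function name `describe_prolog` and the one-element `<report/>` body are illustrative assumptions. Nothing is fetched from the DTD URL: expat does not read an external DTD unless it is explicitly configured to.

```python
import xml.parsers.expat


def describe_prolog(data):
    """Return the XML Declaration, Document Type Declaration and root
    element name found in `data` (bytes), as a dict.

    Only parts that are present are reported. The DTD is neither
    fetched nor validated: expat skips external DTDs by default.
    """
    info = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_xml_decl(version, encoding, standalone):
        # Called only if the document has an XML Declaration.
        info["version"] = version
        info["encoding"] = encoding  # None if no encoding was declared

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called for <!DOCTYPE ...>; name is the declared document type.
        info["doctype"] = name
        info["system_id"] = system_id

    def on_start(name, attributes):
        # The first start-tag encountered belongs to the root element.
        info.setdefault("root", name)

    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(data, True)
    return info


doc = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE report SYSTEM "http://sales.acme.corp/dtds/salesrep.dtd">
<report/>"""
print(describe_prolog(doc))
print(describe_prolog(b"<report/>"))  # no prolog: only the root is reported
```

Because both parts of the Prolog are optional, the second call reports nothing but the root element.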

The Prolog is followed by the Document Instance:

  1. A root element, which is the outermost (top level) element (start-tag plus end-tag) which encloses everything else: in the examples below the root elements are conversation and titlepage;

  2. A structured mix of descriptive or prescriptive elements enclosing the character data content (text), and optionally any attributes (‘name="value"’ pairs) inside some start-tags.
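The split between the root element, the elements it encloses, their text, and their attributes is exactly what an XML parser hands back to a program. The sketch below uses Python's standard `xml.etree.ElementTree` module; the sample instance (a cut-down title page) and the helper name `outline` are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample instance: one root element enclosing child
# elements, some of which carry name="value" attributes.
SAMPLE = """<titlepage>
  <title font="Baskerville" alignment="centered">Hello, world!</title>
  <author style="italic">Vitam capias</author>
</titlepage>"""


def outline(xml_text):
    """List (element name, attributes, text content) in document order."""
    root = ET.fromstring(xml_text)  # returns the root element
    rows = []
    for elem in root.iter():  # the root first, then its descendants
        text = (elem.text or "").strip()  # ignore indentation-only text
        rows.append((elem.tag, dict(elem.attrib), text))
    return rows


for row in outline(SAMPLE):
    print(row)
```

The root element's own text is only indentation, so it comes back empty once stripped.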

XML documents can be very simple, with straightforward nested markup of your own design:

<?xml version="1.0" standalone="yes"?>
<conversation>
  <greeting>Hello, world!</greeting>
  <response>Stop the planet, I want to get off!</response>
</conversation>
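Even a simple document must have exactly one root element enclosing all the others. A minimal sketch of checking this with Python's standard `xml.etree.ElementTree` module follows. The helper name `is_well_formed` is hypothetical, and it checks well-formedness only, not validity against any DTD or Schema. The greeting and response wrapped in a `conversation` root element parse, while the same two elements on their own are rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as one well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


WRAPPED = """<?xml version="1.0" standalone="yes"?>
<conversation>
  <greeting>Hello, world!</greeting>
  <response>Stop the planet, I want to get off!</response>
</conversation>"""

# The same two elements with no enclosing root element.
UNWRAPPED = """<?xml version="1.0" standalone="yes"?>
<greeting>Hello, world!</greeting>
<response>Stop the planet, I want to get off!</response>"""

print(is_well_formed(WRAPPED))
print(is_well_formed(UNWRAPPED))  # expat reports "junk after document element"
```

The second document fails because the parser meets another top-level element after the first one has already closed.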

Or they can be more complicated, with a Schema or DTD, perhaps an internal subset (local DTD additions or changes in [square brackets] within the Document Type Declaration, like the ENTITY declaration below), and an arbitrarily complex nested structure:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE titlepage 
  SYSTEM "" 
[<!ENTITY % active.links "INCLUDE">]>
<titlepage xml:id="BG12273624">
  <white-space type="vertical" amount="36"/>
  <title font="Baskerville" alignment="centered" 
   size="24/30">Hello, world!</title>
  <white-space type="vertical" amount="12"/>
  <!-- In some copies the following
       decoration is hand-colored, presumably
       by the author -->
  <image location="" 
   type="URI" alignment="centered"/>
  <white-space type="vertical" amount="24"/>
  <author font="Baskerville" size="18/22" 
   style="italic">Vitam capias</author>
  <white-space type="vertical" role="filler"/>
</titlepage>
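Parsing the titlepage example shows some of these features from a program's point of view. The sketch below runs Python's `xml.etree.ElementTree` on a trimmed copy of the document. For the sketch, the DOCTYPE keeps only its internal subset and the `image` element is left out. It shows the `iso-8859-1` declaration being honoured, the comment being dropped from the default tree, and `xml:id` appearing under the XML namespace URI. No validation against any DTD takes place.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the titlepage example (assumption: DOCTYPE reduced to
# its internal subset, image element omitted). Encoded as ISO-8859-1
# bytes so the actual bytes match the declared encoding.
DOC = """<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE titlepage [<!ENTITY % active.links "INCLUDE">]>
<titlepage xml:id="BG12273624">
  <white-space type="vertical" amount="36"/>
  <title font="Baskerville" alignment="centered"
   size="24/30">Hello, world!</title>
  <white-space type="vertical" amount="12"/>
  <!-- In some copies the following decoration is hand-colored -->
  <white-space type="vertical" amount="24"/>
  <author font="Baskerville" size="18/22"
   style="italic">Vitam capias</author>
  <white-space type="vertical" role="filler"/>
</titlepage>""".encode("iso-8859-1")

# ElementTree reports the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

root = ET.fromstring(DOC)
print(root.tag, root.get(XML_ID))
print(len(root.findall("white-space")))  # empty elements are still elements
print(root.find("title").text, root.find("title").get("size"))
print(len(root))  # the comment is not kept as a child by default
```

Attributes split across lines inside a start-tag, as in `title` and `author`, are read exactly as if they were written on one line.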

Or they can be anywhere in between: a lot will depend on how you want to define your document type (or whose you use) and what it will be used for. Database-generated or program-generated XML documents used in e-commerce are usually unformatted (no line breaks or indentation) because they are meant for machine consumption rather than human reading. They may use very long names or values, with a good deal of redundancy and sometimes no character data content at all, just values in attributes:

<?xml version="1.0"?>
<ORDER-UPDATE AUTHMD5="4baf7d7cff5faa3ce67acf66ccda8248"