<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE qandaset PUBLIC "+//Silmaril//DTD FAQ based on DocBook 4.4//EN//XML"
  "faq.dtd" [
<!ENTITY xmllogo PUBLIC "+//Silmaril//NONSGML XML Logo//EN" 
  "xmllogo.gif" NDATA GIF>
]>
<?PSGML nofill programlisting literal?>
<qandaset revisionflag="changed" revision="2007-08-01">
  <blockinfo>
    <titleabbrev>The XML FAQ</titleabbrev>
    <title>Frequently-Asked Questions about the Extensible Markup
      Language</title>
    <graphic entityref="xmllogo" format="GIF"/>
    <editor>
      <firstname>Peter</firstname>
      <surname>Flynn</surname>
      <affiliation>
	<orgname>Silmaril Consultants</orgname>
	<orgdiv>Textual Therapy Division</orgdiv>
      </affiliation>
      <email>http://silmaril.ie/cgi-bin/blog</email>
    </editor>
    <edition use="current.date" for="date"><conref
    use="current.version"/></edition>
    <authorgroup id="contributors" role="Contributors">
      <collab>
	<collabname>The following people helped with the original
	contributions, plus many other members of the W3C XML SIG as
	well as FAQ readers around the world.</collabname>
      </collab>
      <othercredit>
	<firstname>Terry</firstname>
	<surname>Allen</surname>
      </othercredit>
      <othercredit>
	<firstname>Tom</firstname>
	<surname>Borgman</surname>
      </othercredit>
      <othercredit>
	<firstname>Tim</firstname>
	<surname>Bray</surname>
      </othercredit>
      <othercredit>
	<firstname>Robin</firstname>
	<surname>Cover</surname>
      </othercredit>
      <othercredit>
	<firstname>Bob</firstname>
	<surname>DuCharme</surname>
      </othercredit>
      <othercredit>
	<firstname>Christopher</firstname>
	<surname>Maden</surname>
      </othercredit>
      <othercredit>
	<firstname>Eve</firstname>
	<surname>Maler</surname>
      </othercredit>
      <othercredit>
	<firstname>Makoto</firstname>
	<surname>Murata</surname>
      </othercredit>
      <othercredit>
	<firstname>Peter</firstname>
	<surname>Murray-Rust</surname>
      </othercredit>
      <othercredit>
	<firstname>Liam</firstname>
	<surname>Quin</surname>
      </othercredit>
      <othercredit>
	<firstname>Michael</firstname>
	<surname>Sperberg-McQueen</surname>
      </othercredit>
      <othercredit>
	<firstname>Joel</firstname>
	<surname>Weber</surname>
      </othercredit>
    </authorgroup>
    <keywordset>
      <keyword>xml</keyword>
      <keyword>sgml</keyword>
      <keyword>html</keyword>
      <keyword>markup</keyword>
      <keyword>structure</keyword>
      <keyword>xslt</keyword>
      <keyword>latex</keyword>
    </keywordset>
    <abstract id="abstract">
      <title>Summary</title>
      <para>This is the list of Frequently-Asked Questions about the
	Extensible Markup Language (XML). It has answers to most of
	the common questions people ask about XML. If you are seeking
	answers to questions about related areas such as HTML, SGML,
	CGI scripts, PHP, JSP, Java, databases, or penguins, you may
	find some pointers, but you should probably look elsewhere as
	well.</para>
      <para>The FAQ is intended as a first resource for users,
	authors, developers, and the interested reader. Details of its
	organisation, contributors, availability, translations, and
	revisions are in the Admin section. Updates to the FAQ are
	notified to the mailing lists and newsgroups listed in <link
	  linkend="discussions"></link>.</para>
      <para>The full document is available for download in many
	different formats: see <link
	  linkend="availability"></link> for a list.</para>
      <para>
	<note id="wtf">
	  <title>WTF</title>
	  <para><ulink url="http://seanmcgrath.blogspot.com">Seán
	      McGrath</ulink>&nbsp;<ulink
	      url="http://seanmcgrath.blogspot.com/#112988775713608464">suggested</ulink>: 
	    <quote>It would be great if FAQs had a WTF section to direct
	      the eyes of the exasperated to Q's with a high desperation
	      index <literal>:-)</literal></quote>, so here are the top
	    dozen most-wanted:</para>
	  <simplelist>
	    <member><link linkend="whatisxml"></link></member>
	    <member><link linkend="style"></link></member>
	    <member><link linkend="dtds"></link></member>
	    <member><link linkend="browsers"></link></member>
	    <member><link linkend="whatissgml"></link></member>
	    <member><link linkend="specials"></link></member>
	    <member><link linkend="markup"></link></member>
	    <member><link linkend="whatfor"></link></member>
	    <member><link linkend="software"></link></member>
	    <member><link linkend="schemas"></link></member>
	    <member><link linkend="namespaces"></link></member>
	    <member><link linkend="glossary"></link></member>
	  </simplelist>
	</note>
      </para>
      <section id="organisation">
	<title>Organisation</title>
	<para>This FAQ was originally maintained on behalf of the
	  World Wide Web Consortium's XML Special Interest Group. It
	  is divided into four sections: <link linkend="basics"
	    xreflabel="simple">Basics</link>, <link xreflabel="simple"
	    linkend="users">Users</link>, <link xreflabel="simple"
	    linkend="authors">Authors</link>, and <link
	    xreflabel="simple" linkend="developers">Developers</link>.
	  The questions are numbered independently within each
	  section. As the numbering may change with each version,
	  comments and suggestions should refer to the version number
	  (see <link linkend="revhist"></link>) as well as the section
	  and question number. See <link linkend="cite"></link> for
	  details of citation and reference.</para>
	<para>Please submit bug reports, suggestions for improvement,
	  and other comments about <emphasis>this FAQ only</emphasis>
	  to <ulink url="xmlfaq@silmaril.ie">the editor</ulink>.
	  Questions and comments about XML should go to the relevant
	  <link linkend="discussions" xreflabel="simple">mailing list or
	    newsgroup</link>. Comments about the <link xreflabel="simple"
	    linkend="spec">XML Specification</link> itself and related
	  specifications should be directed to the <ulink
	    url="http://www.w3.org/">W3C</ulink>.</para>
	<note id="updates">
	  <title>Updates</title>
	  <itemizedlist>
	    <listitem>
	      <para xreflabel="added"><emphasis>Additions</emphasis>
		since the last version are indicated with a plus
		sign.</para>
	    </listitem>
	    <listitem>
	      <para xreflabel="changed"><emphasis>Changes</emphasis>
		since the last version are indicated with a plus/minus
		sign.</para>
	    </listitem>
	    <listitem>
	      <para xreflabel="deleted"><emphasis>Deletions</emphasis>
		retained temporarily for information are indicated
		with a minus sign.</para>
	    </listitem>
	  </itemizedlist>
	</note>
      </section>
      <section id="availability">
	<title>Availability</title>
	<para>This XML document is at <ulink
	    url="http://xml.silmaril.ie/"></ulink>. It is written in
	  XML but served as HTML (converted by Saxon), so what you
	  read online is HTML in your browser.</para>
	<itemizedlist>
	  <listitem>
	    <para>You can <ulink
		url="http://xml.silmaril.ie/faq.sgml">download the
		unconverted file</ulink> (avoiding the
	      <filename>.xml</filename> filetype which
	      over-enthusiastic browsers want to usurp);</para>
	  </listitem>
	  <listitem>
	    <para>The <ulink
		url="http://xml.silmaril.ie/faq.dtd">DTD</ulink> is a
	      lightly modified version of <ulink
		url="http://www.docbook.org/">DocBook</ulink>;</para>
	  </listitem>
	  <listitem>
	    <para>There is a MindMap version available by clicking on
	      the MindMap link in the banner at the top of the page.
	      This is an XML format used by <ulink
		url="http://freemind.sourceforge.net/">FreeMind</ulink>.</para>
	  </listitem>
	  <listitem>
	    <para>There are XSL stylesheets for
	      <ulink url="http://xml.silmaril.ie/webfaq.xsl.tar.gz">the
		conversion to HTML</ulink> and for <ulink
		url="http://xml.silmaril.ie/printfaq.xsl">converting to
		<LaTeX/></ulink> to make the PDF and PostScript
	      versions;</para>
	  </listitem>
	  <listitem>
	    <para>The previous single-file HTML version
	      is at <ulink
		url="http://xml.silmaril.ie/faq.html"></ulink> and this
	      was used to save a copy in <ulink
		url="http://xml.silmaril.ie/faq.sxw">OpenOffice</ulink>, 
	      <ulink url="http://xml.silmaril.ie/faq.doc">Microsoft
		Word</ulink>, and <ulink
		url="http://xml.silmaril.ie/faq.txt">plaintext</ulink>
	      formats.</para>
	  </listitem>
	  <listitem>
	    <para>Notification of new versions is posted
	      periodically to the <ulink url="comp.text.xml"
		type="news"></ulink> Usenet newsgroup and to the
	      <ulink
		url="http://listserv.heanet.ie/xml-l.html">XML-L</ulink> 
	      mailing list for the archives.</para>
	  </listitem>
	  <listitem>
	    <para>For printed copies there are versions for <ulink
		url="http://xml.silmaril.ie/faq_a4.ps">A4
		PostScript</ulink>, <ulink
		url="http://xml.silmaril.ie/faq_a4.pdf">A4 PDF</ulink>,
	      <ulink url="http://xml.silmaril.ie/faq_letter.ps">Letter
		PostScript</ulink> and <ulink
		url="http://xml.silmaril.ie/faq_letter.pdf">Letter
		PDF</ulink> available. Viewers can be downloaded for
	      <ulink
		url="http://www.cs.wisc.edu/~ghost/gsview/">PostScript</ulink> 
	      and PDF file formats (<ulink
		url="http://www.cs.wisc.edu/~ghost/gsview/">GhostView</ulink>, 
	      <ulink url="http://www.foolabs.com/xpdf/">Xpdf</ulink>,
	      <ulink
		url="http://www.adobe.com/products/acrobat/readstep.html">Adobe 
		Acrobat Reader</ulink>);</para>
	  </listitem>
	  <listitem>
	    <para>WAP (if anyone's still using it), OEB (eBook), and
	      cHTML versions have been proposed for your handheld
	      devices, and I'm open to offers if anyone wants to write
	      the code.</para>
	  </listitem>
	</itemizedlist>
	<para>The FAQ is also available in carbon-based toner on
	  flattened dead trees by sending &euro;10 (&dollar;15 or
	  equivalent in any convertible currency) to the <ulink
	    url="xmlfaq@silmaril.ie">editor</ulink> (email first to
	  check rates and postal address).</para>   
	<para>You can download the XML logo as a <ulink
	    url="http://xml.silmaril.ie/images/xmllogo.gif">GIF</ulink>, <ulink
	    url="http://xml.silmaril.ie/images/xmllogo.jpg">JPG</ulink>, or <ulink
	    url="http://xml.silmaril.ie/images/xmllogo.eps">EPS</ulink> file;
	  and an icon for your file system in <ulink
	    url="http://xml.silmaril.ie/images/xml.ico">ICO</ulink> (Microsoft
	  Windows), <ulink
	    url="http://xml.silmaril.ie/images/xml_folder_icon.sit.hqx">Mac</ulink>, 
	  or <ulink url="http://xml.silmaril.ie/images/xml.xpm">XPM</ulink> (X
	  Window system) format.</para>
      </section>
      <section id="translations" remap="langs languages">
	<title>Translations</title>
	<para>Those I know about are in:</para>
	<itemizedlist>
	  <listitem>
	    <para><ulink
		url="http://www.oreilly.de/xml/xml_faq_fragen.html">German</ulink> 
	      (partial translation of some questions) [<personname>
		<firstname>Karin</firstname>
		<surname>Driesen</surname>
	      </personname>];</para>
	  </listitem>
	  <listitem>
	    <para><ulink
		url="http://www.senamirmir.com/xml/faq/xml_faq_amh.html">Amharic</ulink> 
	      [<personname>
		<firstname>Abass</firstname>
		<surname>Alamnehe</surname>
	      </personname>];</para>
	  </listitem>
	  <listitem>
	    <para><ulink
		url="http://www.fxis.co.jp/DMS/sgml/cafe/library/etc/xmlfaq.html">Japanese</ulink> 
	      [<personname lang="jp">
		<firstname>Makoto</firstname>
		<surname>Murata</surname>
	      </personname>];</para>
	  </listitem>
	  <listitem>
	    <para><ulink
		url="http://slug.ctv.es/~olea/sgml-esp/xfaq15.html">Spanish</ulink> 
	      (currently inaccessible) [<personname>
		<firstname>Jaime</firstname>
		<surname>Sagarduy</surname>
	      </personname>];</para>
	  </listitem>
	  <listitem>
	    <para><ulink
		url="http://xml.t2000.co.kr/faq/index.html">Korean</ulink> 
	      (currently inaccessible) [<personname>
		<firstname>Kangchan</firstname>
		<surname>Lee</surname>
	      </personname>];</para>
	  </listitem>
	  <listitem>
	    <para><ulink
		url="http://zxd.webjump.com/xml.html">Chinese</ulink>
	      (currently inaccessible) [Neko]. Also in <ulink
		url="http://weblab.crema.unimi.it/xmlzh/XML_FAQ.htm">Chinese</ulink> 
	      (also inaccessible) [<personname>
		<firstname>Jiang</firstname>
		<surname>Luqin</surname>
	      </personname>];</para>
	  </listitem>
	  <listitem>
	    <para><ulink
		url="http://www.gutenberg.eu.org/pub/GUTenberg/publications/HTML/FAQXML/faqxml-fr.html">French</ulink> 
	      [<personname>
		<firstname>Jacques</firstname>
		<surname>André</surname>
	      </personname>];</para>
	  </listitem>
	  <listitem>
	    <para><ulink
		url="http://zvon.vscht.cz/ZvonHTML/Translations/xmlFAQ/front_all.html">Czech</ulink> 
	      [<personname>
		<firstname>Miloslav</firstname>
		<surname>Nic</surname>
	      </personname>].</para>
	  </listitem>
	</itemizedlist>
	<para>I would be grateful if the translators of those copies
	  which have become inaccessible would contact me with the new
	  URI.</para>
      </section>
    </abstract>
    <legalnotice id="legal" role="Legal Notice">
      <para>This document is joint copyright &copy; 1996&ndash;2005 by
	Silmaril Consultants and the editor, and is released under the
	terms of the GNU Free Documentation License (see below).
	Quotations of the contributions of others remain copyright of
	the individual contributors. You may copy and distribute this
	document in any form provided you acknowledge this source and
	the individual (in the case of a contribution) [see <link
	  linkend="cite"></link> for how] and don't try to
	pretend you or someone other than the author wrote it. If you
	want to republish or reprint the FAQ in bulk, or copy all or
	part of it onto another web site, please ask the editor first
	to make sure you get the right edition, to make provision for
	periodic updating, and to ensure you use the correct legal
	wording.</para>
      <para><quote>Permission is granted to copy, distribute and/or
	  modify this document under the terms of the GNU Free
	  Documentation License, Version 1.2 or any later version
	  published by the Free Software Foundation; with no Invariant
	  Sections, no Front-Cover Texts, and no Back-Cover Texts. A
	  copy of the license is available <ulink
	    url="http://www.ctan.org/tex-archive/info/beginlatex/html/appendixD.html#gfdl">here</ulink>. You are allowed to
	  distribute, reproduce, and modify it without fee or further
	  requirement for consent subject to the conditions in <ulink
	    url="http://www.ctan.org/tex-archive/info/beginlatex/html/appendixD.html#gfdl-4">the section on Modifications</ulink>.</quote></para>
      <para>The editor and contributing authors assert their right to
	be identified as the editor and contributing authors of this
	document.</para>
      <para id="cite">For overall citations of this FAQ, use:</para>
      <blockquote>
	<para>Flynn, P (Ed.), <citetitle>The XML FAQ</citetitle>
	  v.<userinput><conref use="current.version"/></userinput>, Cork,
	  <userinput><conref use="current.date"/></userinput>,
	  <literal>http://xml.silmaril.ie/</literal></para>
      </blockquote>
      <para>In bibliographic referencing systems this would be
	something like this (using BIB<TeX/> as an example):</para>
      <blockquote>
	<programlisting>
@Booklet{xmlfaq,
  title =        {The XML FAQ},
  editor =       {Peter Flynn},
  howpublished = {Webpage},
  address =      {Cork},
  month =        {<userinput><conref use="current.date" format="month" start="6" length="2"/></userinput>},
  year =         <userinput><conref use="current.date" start="1" length="4"/></userinput>,
  edition =      {v<userinput><conref use="current.version"/></userinput>},
  url =          {http://xml.silmaril.ie/}
}</programlisting>
      </blockquote>
      <para id="citefrag">A suitable format for citing
	individually-authored fragments would be:</para>
      <blockquote>
	<para><userinput>AN Other</userinput>, <quote><userinput>Title
	    of question</userinput></quote>. In Flynn, P (Ed.),
	  <citetitle>The XML FAQ</citetitle>
	  v.<userinput><conref use="current.version"/></userinput>, Silmaril Consultants, Cork,
	  <userinput><conref use="current.date" format="month" start="6" length="2"/>&#x00A0;<conref use="current.date" start="1" length="4"/></userinput>,
	  <userinput>S#.q#</userinput>.
	  <literal>http://xml.silmaril.ie/<userinput>section</userinput>/<userinput>question</userinput>/</literal></para>
      </blockquote>
      <para>In bibliographic referencing systems this would be
	something like this (again using BIB<TeX/> as an example):</para>
      <blockquote>
	<programlisting>
@InCollection{xmlfaq,
  author =       {<userinput>AN Other</userinput>},
  title =        {<userinput>Title of question</userinput>},
  booktitle =    {The XML FAQ},
  publisher =    {Silmaril Consultants},
  month =        {<userinput><conref use="current.date" format="month" start="6" length="2"/></userinput>},
  year =         <userinput><conref use="current.date" start="1" length="4"/></userinput>,
  editor =       {Peter Flynn},
  volume =       {<userinput>section number</userinput>},
  number =       {<userinput>question number</userinput>},
  address =      {Cork},
  url =          {http://xml.silmaril.ie/<userinput>section</userinput>/<userinput>question</userinput>/},
  edition =      {v.<userinput><conref use="current.version"/></userinput>}
}</programlisting>
      </blockquote>
    </legalnotice>
    <revhistory id="revhist" role="Revision History">
      <revision>
	<revnumber>0.0</revnumber>
	<date>1996-12-27</date>
	<revremark>First test. Unpublished.</revremark>
      </revision>
      <revision>
	<revnumber>0.1</revnumber>
	<date>1997-01-31</date>
	<revremark>First draft. Sample questions devised by
	  participants.</revremark>
      </revision>
      <revision>
	<revnumber>0.2</revnumber>
	<date>1997-02-03</date>
	<revremark>Revised draft. Additional questions and
	  answers.</revremark>
      </revision>
      <revision>
	<revnumber>0.3</revnumber>
	<date>1997-02-17</date>
	<revremark>Extensive revision following comments from the
	  group. Changes to markup and organization.</revremark>
      </revision>
      <revision>
	<revnumber>0.4</revnumber>
	<date>1997-02-23</date>
	<revremark>Minor editorial changes</revremark>
      </revision>
      <revision>
	<revnumber>0.5</revnumber>
	<date>1997-04-01</date>
	<revremark>Added Multidoc Pro as SGML browser; question on XML
	  math; fixed ambiguity in explanation of NETs; added JUMBO;
	  ERB changes of March 26; more details of linking and tools;
	  added element declaration minimisation to the forbidden
	  list.</revremark>
      </revision>
      <revision>
	<revnumber>1.0</revnumber>
	<date>1997-05-01</date>
	<revremark>Added reference to ToC and printed URIs; added
	  disclaimer at A6; combined old A11 with A5 to explain
	  SGML/XML/HTML; clarified explanation of XML not replacing
	  HTML at C1; added new course and conference at (new) A11;
	  clarified B1, C4, C8; added FPI server at C12; removed
	  examples in C13.</revremark>
      </revision>
      <revision>
	<revnumber>1.1</revnumber>
	<date>1997-10-01</date>
	<revremark>No more minimisation parameters in element
	  declarations; parsers must now pass all white-space to the
	  application; everything is now case-sensitive, including all
	  markup; a new proposal for stylesheets: XSL, which combines
	  DSSSL and CSS in an XML format; Java[Script] and
	  metadata and their use in XML; updated list of software;
	  first XML book is published; new public mailing list
	  XML-L</revremark>
      </revision>
      <revision>
	<revnumber>1.2</revnumber>
	<date>1998-02-01</date>
	<revremark>Added a Mac icon (thanks to Martin Winter and
	  others); removed Draft from references to the spec; changed
	  revision colours; the RMD is gone: replaced references to it
	  with standalone; updated some broken URIs; [1.21] minor
	  edits to URIs and updates on translation; added XUA to
	  details of MIME types.</revremark>
      </revision>
      <revision>
	<revnumber>1.3</revnumber>
	<date>1998-06-01</date>
	<revremark>Removed the math plugin (Linux Netscape is broken
	  and refused to elide it); updated list of events (need
	  more); fixed some broken URIs; added Spanish and Korean
	  translations and the Annotated Spec; updated details of
	  MS/NS browser development; clarified the use of FPI vs
	  SysID; updated link to Feb 10 Rec Spec; added pointers to
	  the SGML Decl for XML; updated references to XLink and
	  XPointer; corrected a reference to ancient Sumerian writing;
	  clarified the need for conversion of HTML DTDs to
	  XML.</revremark>
      </revision>
      <revision>
	<revnumber>1.4</revnumber>
	<date>1998-10-01</date>
	<revremark>Added maintainer's email address under
	  Availability; added note about ISO representation and voting
	  on standards; added Greek translation; updated details of
	  conferences; changed the URI for the new SGML/XML Web Pages;
	  updated details of browsers; corrected reference to the SGML
	  omitted features from XML; updated details of converting
	  HTML to XML; added mention of comp.text.xml; extended the
	  questions on graphics and how to use XML with current
	  browsers; added questions on DOM, conformance testing, DTD
	  includes, SGML DTDs into XML, EDI; (1.41) corrected errors
	  in MIME types, URIs, SDD, and images.</revremark>
      </revision>
      <revision>
	<revnumber>1.5</revnumber>
	<date>1999-06-01</date>
	<revremark>Added new XML mailing lists in Italian and in
	  French; added details of developer resources in Chinese; two
	  more translations under way (Chinese and Czech); updated
	  links to the question on DTDs; added question on the use of
	  Java to generate and manage XML; added question on when to
	  use attributes and when to use element markup; added
	  question on the use of XML syntax to describe DTD data
	  (schemas); expanded on the explanation of the use of formal
	  language in the spec; added question on the difference
	  between XML and C++; separated information on XML versions
	  of HTML into a separate question.</revremark>
      </revision>
      <revision>
	<revnumber>1.6</revnumber>
	<date>2000-07-01</date>
	<revremark>Added French and Czech translations and a Finnish
	  mailing list, and reorganised the list of translations;
	  updated URIs for newsgroups; clarified reference to Unicode;
	  reworded question on terminology; added more links to the
	  question on conformance testing; corrected error in content
	  model example for mixed content; updates to the question on
	  stylesheets; minor edits to the question on software; major
	  changes to the question on servers and media types; updated
	  question on XML Schemas; added new question on `executing'
	  XML `programs'; replaced the math example with one less
	  likely to distress the gentle susceptibilities of some
	  readers; added a new question on knowing SGML/HTML before
	  XML.</revremark>
      </revision>
      <revision>
	<revnumber>2.0</revnumber>
	<date>2001-06-01</date>
	<revremark>DTD changed from DocBook SGML to QAML XML; removed
	  query form due to abuse; most questions revised and in some
	  cases rewritten; updated references to new versions of
	  associated standards, recommendations, and working drafts;
	  added pointer to Jon Noring's Unicode test page and NIST's
	  XSLT/XPath test suite; updated Eve Maler's links to the DTD
	  for the spec; added warnings on speling and punk chew asian;
	  added question on namespaces; fixed bug in question on
	  stylesheets; inserted explanation of `document' vs `data'
	  software; added new mailing list on XSL:FO; updated Robin
	  Cover's URI throughout; updated the question on media types
	  for RFC 3023; extended the question on graphics to cover SVG.
	  For 2.01 there were minor typos, some updated links (to
	  recent versions of the standards, and in the section on More
	  Information), and a few wording changes. Thanks to James
	  Cummings for a very thorough proofread. Editing was done
	  using GNU Emacs and psgml-mode.</revremark>
      </revision>
      <revision>
	<revnumber>2.1</revnumber>
	<date>2002-01-01</date>
	<revremark>Added <link linkend="x-hum">Humanities mailing
	    list</link>; added more references for <link
	    linkend="dbarts">XML and databases</link>; added the
	  Namespaces FAQ; corrected some misunderstandings in <link
	    linkend="utf-16">character encodings</link>; changed the
	  editor's email address; added a new question on <link
	    linkend="rootelement">root elements</link>; updated the
	  <link linkend="linkspecs">XLink</link> to W3C
	  Recommendation; updated the <link linkend="whatissgml">SGML
	    FAQ address</link>; fixed some broken links; added
	  translations into <link linkend="translations">German</link>
	  and <link linkend="translations">Amharic</link>; minor
	  revisions to some wording. Editing this time was done in
	  <ulink url="http://www.epcedit.com">epcEdit 1.02</ulink>.
	  V2.11 includes new material on <link
	    linkend="browsers">expectations and XML browsers</link>,
	  the removal of a mailing list, and a few corrections to
	  typos and links. Thanks to Seán Cannon and Dave &amp; Nikki
	  for debugging the CSS style-sheet.</revremark>
      </revision>
      <revision>
	<revnumber>3.0</revnumber>
	<date>2003-01-01</date>
	<revremark>Added information on <link linkend="officeapps">Office
	    Applications</link> including Corel, Microsoft, and Sun
	  (to keep alphabetical order :-); updated details of <link
	    linkend="moreinfo">conferences and training</link>; updated
	  <link  linkend="browsers">browser</link> details; reworded a
	  few ungainly sentences; removed some obsolete URIs (mostly
	  for <emphasis>nice idea</emphasis> sites which died);
	  changed the phrasing of the <link linkend="databases">question on
	    databases</link>; added details on how to do standalone
	  validation to <link linkend="parsers">the question on
	    parsing</link> (thanks to Bill Rayer); added question on
	  <link linkend="management">how to present XML to
	    management</link> (thanks to Tad McClellan); the questions
	  on APIs and the DOM have been subsumed into <link
	    linkend="software">the question on software</link>, which
	  has been extensively rewritten; added yet more explanation
	  to the <link linkend="characters">section on Unicode</link>;
	  3.01 fixes minor typos; 3.02 adds updated dates for 2004
	  events.</revremark>
      </revision>
      <revision>
	<revnumber>3.01</revnumber>
	<date>2004-01-01</date>
	<revremark>Minor typographic changes</revremark>
      </revision>
      <revision>
	<revnumber>3.02</revnumber>
	<date>2004-01-12</date>
	<revremark>Added updates for 2004 events</revremark>
      </revision>
      <revision>
	<revnumber>4.0</revnumber>
	<date>2005-01-01</date>
	<revremark>Went back to <ulink
	    url="http://www.docbook.org/">DocBook</ulink> markup using
	  <ulink
	    url="http://www.docbook.org/tdg/en/html/qandaset.html"><sgmltag>qandaset</sgmltag></ulink> 
	  instead of the QAML that has been used for the last two
	  major releases. Revised text in most sections for clarity in
	  wording, and recast some now-established explanatory
	  material into the past tense. Added new dates for 2005.
	  Added explicit references to the GNU FDL in the legal
	  section. Took the tip on types of XML out into <link
	    linkend="docdata">a new question</link>, and added new
	  questions on <link linkend="includes">file
	    inclusions</link> and <link linkend="cdata">the use of
	    CDATA Marked Sections</link>.</revremark>
      </revision>
      <revision>
	<revnumber>4.1</revnumber>
	<date>2005-05-15</date>
	<revremark>Revised structure and new stylesheet for new
	  location at <ulink url="http://xml.silmaril.ie/"></ulink>.
	  The four main sections remain, but the text is served in
	  separate questions and sections rather than one huge file
	  (the PDF remains as a single document, of course). Removed
	  references to the now-defunct Balise language, added a Tip
	  on editor selection and some notes on WYSIWYG XSL[T]
	  editing.</revremark>
      </revision>
      <revision>
	<revnumber>4.2</revnumber>
	<date>2005-07-01</date>
	<revremark>Added new <link linkend="rng-list">RNG mailing
	    list</link>, updated section on <link
	    linkend="schemas">Schemas</link>, added links to the <link
	    linkend="sgmldec">SGML Declaration for XML</link>.
	  Retagged personal names for recognition, and ID'd related
	  FAQs. Expanded question on Why XML. Added link to email a
	  page to someone. Added and expanded the tips on ways of
	  getting typeset output, eg <LaTeX/>. Added new section on
	  special characters.</revremark>
      </revision>
      <revision>
	<revnumber>4.3</revnumber>
	<date>2005-09-05</date>
	<revremark>Added the notes culled from failed searches as a
	  <link linkend="glossary">Glossary</link>; updated some URLs,
	  and added one for XQuery to <link linkend="databases">the question
	    on databases</link> (thanks, Liam); updated <link
	    linkend="whatfor"></link>, <link
	    linkend="internals"></link>, <link
	    linkend="parsers"></link>, and <link linkend="cdata">the
	    question on CDATA Sections</link>. Added a new <link
	    linkend="conditionals">question on Conditionals</link>.
	  Tightened up on the indexing for searches, including the
	  removal of enclosing quotes, and added a bunch more
	  metadata.</revremark>
      </revision>
      <revision>
	<revnumber>4.31</revnumber>
	<date>2005-09-09</date>
	<revremark>Added notes on <link
	linkend="pipelines">Pipelining</link> and <link
	linkend="attribs">Attributes</link>.</revremark>
      </revision>
      <revision>
	<revnumber>4.32</revnumber>
	<date>2005-09-10</date>
	<revremark>Added details of <sgmltag
	class="attribute">xml:id</sgmltag> to the <link
	linkend="attribs">note on Attributes</link>.</revremark>
      </revision>
      <revision>
	<revnumber>4.33</revnumber>
	<date>2005-09-12</date>
	<revremark>Added more keywords, and a tip to the <link
	    linkend="asp">note on asp.net</link>.</revremark>
      </revision>
      <revision>
	<revnumber>4.34</revnumber>
	<date>2005-10-01</date>
	<revremark>Split the question on CDATA into two: one for CDATA
	  per se, and one for other ways of handling embedded HTML.
	  Added some more keywords, and revised the questions <link
	    linkend="discussions"></link> and <link
	    linkend="programming"></link>. Fixed a minor date bug in
	  the search script.</revremark>
      </revision>
      <revision>
	<revnumber>4.35</revnumber>
	<date>2005-10-08</date>
	<revremark>Fixed some broken links and removed a couple of
	obsolete ones. Added a note about the BOM.</revremark>
      </revision>
      <revision>
	<revnumber>4.36</revnumber>
	<date>2005-10-16</date>
	<revremark>Updated dates of events in <link
	    linkend="moreinfo"></link>.</revremark>
      </revision>
      <revision>
	<revnumber>4.37</revnumber>
	<date>2005-10-31</date>
	<revremark>Removed ambiguities in <link
	    linkend="includes"></link>.</revremark>
      </revision>
      <revision>
	<revnumber>4.38</revnumber>
	<date>2005-11-01</date>
	<revremark>Added personal views on patent, copyright, and
	  intellectual property.</revremark>
      </revision>
      <revision>
	<revnumber>4.39</revnumber>
	<date>2005-12-01</date>
	<revremark>Refined some keywords, changed presentations of
	some examples, reworded a paragraph on treatment of space, and
	added details of assigning a Schema to an instance.</revremark>
      </revision>
      <revision>
	<revnumber>4.4</revnumber>
	<date>2006-01-01</date>
	<revremark>Minor grammatical edits, major changes to the
	  indexing and DC metadata. Added glossary entry on data
	  export to CSV and expanded the description of nodes and the
	  grove. Fixed elusive bug in RSS feed. Added contributor
	  names to search index.</revremark>
      </revision>
      <revision>
	<revnumber>4.41</revnumber>
	<date>2006-01-07</date>
	<revremark>Fixed a cross-referencing bug in generated content.</revremark>
      </revision>
    </revhistory>
    <revhistory>
      <revision>
	<revnumber>4.5</revnumber>
	<date>2006-02-27</date>
	<revremark>Added more keywords taken from failed
	searches. Expanded on file URIs, the use of compiled DTDs,
	self-describing data, the boolean nature of parameter entity
	switches, and how to get HTML features (forms, etc).</revremark>
      </revision>
    </revhistory>
    <revhistory>
      <revision>
	<revnumber>4.51</revnumber>
	<date>2006-02-28</date>
	<revremark>Added explanation of xml:id, xml:space, and
	xml:lang. Added new question on how to read (open) an XML file
	you have been sent.</revremark>
      </revision>
    </revhistory>
    <revhistory>
      <revision>
	<revnumber>4.52</revnumber>
	<date>2006-03-26</date>
	<revremark>Added more keywords and fixed a broken link to the
	  XSL FAQ.</revremark>
      </revision>
    </revhistory>
    <revhistory>
      <revision>
	<revnumber>4.53</revnumber>
	<date>2006-04-12</date>
	<revremark>Updated details of XML for Safari, and added a
	curious new enquiry.</revremark>
      </revision>
    </revhistory>
    <revhistory>
      <revision>
	<revnumber>4.54</revnumber>
	<date>2006-06-01</date>
	<revremark>Corrected an error in the description of
	xml-stylesheet. Added link targets to Quick Answers.</revremark> 
      </revision>
    </revhistory>
    <revhistory>
      <revision>
	<revnumber>4.55</revnumber>
	<date>2007-08-01</date>
	<revremark>Updated events for 2007&ndash;2008. Updated details
	of ODF and OOXML. Added section on broken software. Revised
	handling of failed searches. </revremark> 
      </revision>
    </revhistory>
    <revhistory>
      <revision>
	<revnumber>4.56</revnumber>
	<date>2007-08-08</date>
	<revremark>Added details and links for HTML5</revremark> 
      </revision>
    </revhistory>
    <revhistory>
      <revision>
	<revnumber id="current.version">4.57</revnumber>
	<date id="current.date">2009-07-27</date>
	<revremark>Updated internal code, no changes to text</revremark> 
      </revision>
    </revhistory>
  </blockinfo>
  <qandadiv id="basics" remap="FAQ-GENERAL, General">
    <title>Basics: general information about XML</title>
    <qandaentry id="whatisxml" remap="FAQ-ACRO, acro">
      <question>
	<formalpara>
	  <title>What is XML?</title>
	  <para>The Extensible Markup Language.</para>
	</formalpara>
      </question>
      <answer>
	<para>XML is the Extensible Markup Language. It improves the
	  functionality of the Web by letting you identify your
	  information in a more accurate, flexible, and adaptable
	  way.</para>
	<para>It is extensible because it is not a fixed format like
	  HTML (which is a single, predefined <firstterm
	    linkend="markup">markup language</firstterm>). Instead,
	  XML is actually a
	  <firstterm>metalanguage</firstterm>&mdash;a language for
	  describing other languages&mdash;which lets you design your
	  own markup languages for limitless different types of
	  documents. XML can do this because it's written in <link
	    linkend="whatissgml" xreflabel="simple">SGML</link>, the international
	  standard metalanguage for text document markup (ISO
	  8879).</para>
      </answer>
    </qandaentry>
    <qandaentry id="markup">
      <question>
	<formalpara>
	  <title>What is a markup language?</title>
	  <para>A way of describing what's what in a document.</para>
	</formalpara>
      </question>
      <answer>
	<para>A markup language is a set of words and symbols for
	  describing the identity of pieces of a document (for example
	  <quote>this is a paragraph</quote>, <quote>this is a
	    heading</quote>, <quote>this is a list</quote>,
	  <quote>this is the caption of this figure</quote>, etc).
	  Programs can use this with a stylesheet to create output for
	  screen, print, audio, video, Braille, etc.</para>
	<para>Some markup languages (eg those used in wordprocessors)
	  only describe appearances (<quote>this is italics</quote>,
	  <quote>this is bold</quote>), but this method can only be
	  used for display, and is not normally re-usable for anything
	  else.</para>
	<para>XML is sometimes referred to as
	  <quote>self-describing data</quote> because the names of the
	  markup elements should represent the type of content they
	  hold.</para>
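	<para>As a minimal sketch of what this means in practice (the
	  element type names below are invented for illustration and
	  do not come from any real document type), descriptive markup
	  for a short document might look like this:</para>
	<programlisting><![CDATA[<report>
  <heading>Quarterly results</heading>
  <paragraph>Sales rose in all three regions.</paragraph>
  <list>
    <item>North</item>
    <item>South</item>
    <item>West</item>
  </list>
</report>]]></programlisting>
	<para>Nothing here says how a heading or a list item should
	  look: that decision is left to whatever stylesheet or
	  program processes the file.</para>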
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-DEF, def" id="whatfor" >
      <question>
	<formalpara>
	  <title>What is XML for (aka <quote>Where should I use
	      XML</quote>)?</title>
	  <para>XML is for identification, transmission, and storage.</para>
	</formalpara>
      </question>
      <answer remap="examples roles web real world used for uses usage
      technology technologies single sourceing e-publishing publishing
      ecommerce"> 
	<blockquote xreflabel="spec">
	  <para>Its goal is to enable generic SGML to be served,
	    received, and processed on the Web in the way that is now
	    possible with HTML. XML has been designed for ease of
	    implementation and for interoperability with both SGML and
	    HTML.</para>
	</blockquote>
	<para>Despite <ulink
	    url="http://www.oasis-open.org/cover/sgmlwww.html">early
	    attempts</ulink>, browsers never allowed other SGML, only
	  HTML (although there were <link linkend="panorama">plugins</link>), 
	  and they allowed it (even encouraged it) to be corrupted or
	  broken, which held development back for over a decade by
	  making it impossible to program for it reliably. XML fixes
	  that by making it compulsory to stick to the rules, and by
	  making the rules much simpler than SGML.</para>
	<para>But XML is not just for Web pages: in fact it's very
	    rarely used for Web pages on its own because browsers still don't
	    provide reliable support for formatting and transforming
	    it. Common uses for XML include:</para>
	<variablelist>
	  <varlistentry>
	    <term>Information identification</term>
	    <listitem>
	      <para>because you can define
	      your own markup, you can define meaningful names for all
	      your information items.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term>Information storage</term>
	    <listitem>
	      <para>because XML is portable and
	      non-proprietary, it can be used to store textual
	      information across any platform. Because it is backed by
	      an international standard, it will remain accessible and
	      processable as a data format.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term>Information structure</term>
	    <listitem>
	      <para>XML can therefore be used to
	      store and identify any kind of (hierarchical)
	      information structure, especially for long, deep, or
	      complex document sets or data sources, making it ideal
	      for an information-management back-end to serving the
	      Web. <emphasis>This</emphasis> is its most common Web
	      application, with a transformation system to serve it as
	      HTML until such time as browsers are able to handle XML
	      consistently.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term>Publishing</term>
	    <listitem>
	      <para>The original goal of XML as defined in the
	      quotation at the start of this section. Combining the
	      three previous topics (identity, storage, structure)
	      means it is possible to get all the benefits of robust
	      document management and control (with XML) and publish
	      to the Web (as HTML) as well as to paper (as PDF) and to
	      other formats (eg Braille, Audio, etc) from a single
	      source document by using the appropriate stylesheets.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term>Messaging and data transfer</term>
	    <listitem>
	      <para>XML is also very heavily used for
	      enclosing or encapsulating information in order to pass
	      it between different computing systems which would
	      otherwise be unable to communicate. By providing a
	      <foreignphrase>lingua franca</foreignphrase> for data
	      identity and structure, it provides a common
	      <wordasword>envelope</wordasword> for inter-process
	      communication (messaging).</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term>Web services</term>
	    <listitem>
	      <para>Building on all of these, as well as its use in
		browsers, machine-processable data can be exchanged
		between consenting systems, where before it was only
		comprehensible by humans (HTML). Weather services,
		e-commerce sites, blog newsfeeds, <link
		  linkend="ajax" xreflabel="simple">AJaX</link> sites, and thousands of
		other data-exchange services use XML for data
		management and transmission, and the web browser for
		display and interaction.</para>
	    </listitem>
	  </varlistentry>
	</variablelist>
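	<para>To make the <wordasword>envelope</wordasword> idea a
	  little more concrete, here is a sketch of the kind of
	  message one system might send to another. The element type
	  names, attribute names, host names, and values are all
	  hypothetical: real systems would agree on their own
	  vocabulary, usually defined in a DTD or Schema.</para>
	<programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<message>
  <header>
    <sender>orders.example.com</sender>
    <recipient>warehouse.example.net</recipient>
    <sent>2007-08-01T09:30:00Z</sent>
  </header>
  <body>
    <order number="A1234">
      <item code="X-17" quantity="3"/>
    </order>
  </body>
</message>]]></programlisting>
	<para>The receiving system does not need to know anything
	  about the sender's internal file formats, only how to parse
	  XML and what the agreed element types mean.</para>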
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-SGML, sgml" id="whatissgml">
      <question>
	<formalpara>
	  <title>What is SGML?</title>
	  <para>Standard Generalized Markup Language, ISO
	    8879:1986</para>
	</formalpara>
      </question>
      <answer>
	<para id="faq:SGML">SGML is the Standard Generalized Markup
	  Language (<ulink url="http://www.iso.ch/">ISO
	    8879:1986</ulink>), the international standard for
	  defining descriptions of the structure of different types of
	  electronic document. There is an SGML FAQ from <personname>
	    <firstname>David</firstname>
	    <surname>Megginson</surname>
	  </personname> at <ulink id="FAQ:sgml"
	    url="http://math.albany.edu:8800/hm/sgml/cts-faq.html"></ulink>; 
	  and <personname>
	    <firstname>Robin</firstname>
	    <surname>Cover</surname>
	  </personname>'s SGML Web pages are at <ulink
	    url="http://www.oasis-open.org/cover/general.html"></ulink>. 
	  For a little light relief, try <personname>
	    <firstname>Joe</firstname>
	    <surname>English</surname>
	  </personname>'s <quote>Not the SGML FAQ</quote> at <ulink
	  id="FAQ:not-sgml" 
	    url="http://www.flightlab.com/~joe/sgml/faq-not.txt"></ulink>.</para>
	<para>SGML is very large, powerful, and complex. It has been
	  in heavy industrial and commercial use for nearly two decades,
	  and there is a significant body of expertise and software to
	  go with it.</para>
	<para>XML is a lightweight cut-down version of SGML
	  which keeps enough of its functionality to make it useful
	  but removes all the optional features which made SGML too
	  complex to program for in a Web environment.</para>
	<note>
	  <para>ISO standards like SGML are governed by the
	    International Organization for Standardization in Geneva,
	    Switzerland, and voted into or out of existence by
	    representatives from every country's national standards
	    body.</para>
	  <para>If you have a query about an international standard,
	    you should contact your national standards body for the
	    name of your country's representative on the relevant ISO
	    committee or working group.</para>
	  <para>If you have a query about your country's
	    representation in Geneva or about the conduct of your
	    national standards body, you should contact the relevant
	    government department in your country, or speak to your
	    public representative.</para>
	  <para>The representation of countries at the ISO is not a
	    matter for this FAQ. Please do not submit queries to the
	    editor about how or why your country's ISO representatives
	    have or have not voted on a specific standard.</para>
	</note>
      </answer>
    </qandaentry>
    <qandaentry id="whatishtml" remap="FAQ-HTML, html"
    revisionflag="changed">
      <question>
	<formalpara>
	  <title>What is HTML?</title>
	  <para>HyperText Markup Language, RFC 1866, the language of
	    Web pages.</para>
	</formalpara>
      </question>
      <answer>
	<para>HTML is the <ulink
	    url="http://www.w3.org/MarkUp">HyperText Markup
	    Language</ulink> (<ulink
	    url="ftp://ftp.rfc-editor.org/in-notes/rfc1866.txt">RFC
	    1866</ulink>), which started as a small application of
	  <link linkend="whatissgml">SGML</link> for the Web,
	  originating with <ulink
	    url="http://public.web.cern.ch/Public/Content/Chapters/AboutCERN/Achievements/WorldWideWeb/WWW-en.html"><personname>
	      <firstname>Tim</firstname>
	      <surname>Berners-Lee</surname>
	    </personname> at CERN</ulink> in 1989&ndash;90.</para>
	<para id="faq:HTML">It defines a very simple class of
	  report-style documents, with section headings, paragraphs,
	  lists, tables, and illustrations, with a few informational
	  elements, but very few presentational elements<xref
	    linkend="nopres" role="footnote"/>, plus some hypertext
	  and multimedia. See the question on <link
	    linkend="extendhtml" xreflabel="simple">extending HTML</link>. The current
	  recommendation is to use the XML version, <link
	    linkend="xhtml" xreflabel="simple">XHTML</link>. There is an HTML and XHTML
	  FAQ maintained by <personname>
	    <firstname>Steven</firstname>
	    <surname>Pemberton</surname>
	  </personname> at <ulink id="FAQ:html"
	    url="http://www.w3.org/MarkUp/2004/xhtml-faq"></ulink>.</para>
	<para id="faq:HTML5">Recent independent moves by some members
	  of the W3C have led to the development of a revision of HTML
	  currently referred to as HTML5. There is an <ulink
	    url="http://www.ibm.com/developerworks/library/x-html5/?ca=dgr-lnxw01NewHTML">explanation</ulink> 
	  from Elliotte Rusty Harold, and a <ulink id="FAQ:html5"
	    url="http://blog.whatwg.org/faq/">FAQ</ulink> from the
	  WhatWG.</para>
      </answer>
    </qandaentry>
    <qandaentry id="differences" remap="FAQ-SAME, same">
      <question>
	<formalpara>
	  <title>Aren't XML, SGML, and HTML all the same
	    thing?</title>
	  <para>No, SGML and XML are
	    metalanguages. HTML is an application of them.</para>
	</formalpara>
      </question>
      <answer remap="differences similarity similarities different
	    between xml sgml html compiled">
	<para>Not quite; <link linkend="whatissgml"
	    xreflabel="simple">SGML</link> is the mother tongue, and
	  has been used for describing thousands of different document
	  types in many fields of human activity, from transcriptions
	  of <ulink url="http://celt.ucc.ie/">ancient Irish
	    manuscripts</ulink> to the <ulink
	    url="http://web.deskbook.osd.mil/">technical documentation
	    for stealth bombers</ulink>, and from <ulink
	    url="http://www.hl7.org">patients' medical and clinical
	    records</ulink> to <ulink
	    url="http://www.tecno.com/smdl.htm">musical
	    notation</ulink>. SGML is very large and complex, however,
	  and probably overkill for most common office desktop
	  applications.</para>
	<para>XML is an abbreviated version of SGML, to make it easier
	  to use over the Web, easier for you to define your own
	  document types, and easier for programmers to write programs
	  to handle them. It omits all the complex and less-used
	  options of SGML in return for the benefits of being easier
	  to write applications for, easier to understand, and more
	  suited to delivery and interoperability over the Web. But it
	  is still SGML, and XML files may still be processed in the
	  same way as any other SGML file (see the question on <link
	    linkend="software" xreflabel="simple">XML
	    software</link>).</para>
	<para><link linkend="whatishtml"
	    xreflabel="simple">HTML</link> is just one of many SGML or
	  XML applications&mdash;the one most frequently used on the
	  Web.</para>
	<para>Technical readers will find it more useful to think of
	  XML as being SGML&minus;&minus; rather than HTML++.</para>
	<tip xreflabel="William Hammond">
	  <para>(in article
	    <literal><![CDATA[<i7ll1362ib.fsf@hilbert.math.albany.edu>]]></literal>)</para>
	  <para>SGML is a category of <wordasword>document
	      types</wordasword>, with a configurable shared syntax,
	    most of which (like classic HTML) cannot be compiled to
	    produce executable programs.  XML is a subcategory of SGML
	    with syntactic restrictions.  For example, with XML the
	    vocabulary of a document type is always case sensitive,
	    while with SGML it may be either case sensitive or case
	    insensitive.  So, for example, classic HTML is an SGML
	    document type, and XHTML+MathML is an XML document
	    type.</para>
	  <para>While some document types correspond to document
	    markup languages, other document types (like a CTAN catalog
	    entry) are just for structured data[...]</para>
	  <para>I doubt seriously, however, that a computer language
	    like C is in any reasonable sense equivalent to an SGML
	    document type.
	  </para>
	</tip>
	<para>(Ed: In respect of this last paragraph, see <link
	    linkend="execute"></link> and <link linkend="programming"></link>.)</para>
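	<para>As a small illustration of the case-sensitivity point in
	  the tip above: in classic HTML the names
	  <literal>P</literal> and <literal>p</literal> identify the
	  same element type, but in XML the names in a start-tag and
	  its end-tag must match exactly. The fragment below is only a
	  sketch, using a made-up <literal>para</literal> element
	  type:</para>
	<programlisting><![CDATA[<!-- Well-formed XML: start-tag and end-tag match exactly -->
<para>Some text.</para>

<!-- Not well-formed XML: the names differ in case, so an
     XML parser must report an error -->
<para>Some text.</PARA>]]></programlisting>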
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-OWNS, owns" id="responsible">
      <question>
	<formalpara>
	  <title>Who is responsible for XML?</title>
	  <para>The W3C</para>
	</formalpara>
      </question>
      <answer remap="function W3C windows microsoft office owns market">
	<para>XML is a project of the <ulink
	    url="http://www.w3.org/">World Wide Web Consortium
	    (W3C)</ulink>, and the development of the specification is
	  supervised by an XML Working Group. A Special Interest Group
	  of co-opted contributors and experts from various fields
	  contributed comments and reviews by email.</para>
	<para>XML is a public format: it is not a proprietary
	  development of any company, although the membership of the
	  WG and the SIG represented companies as well as research and
	  academic institutions. <link linkend="spec" xreflabel="simple">The v1.0
	    specification</link> was accepted by the W3C as a
	  Recommendation on Feb 10, 1998.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-IMPORT, import" id="important">
      <question>
	<formalpara>
	  <title>Why is XML such an important development?</title>
	  <para>It overcomes the inflexibility of HTML and the
	    complexity of SGML</para>
	</formalpara>
      </question>
      <answer remap="advantages using xml">
	<para>It removes two constraints which were holding back Web
	  developments:</para>
	<orderedlist>
	  <listitem>
	    <para>dependence on a single, inflexible document type
	    (<link linkend="whatishtml" xreflabel="simple">HTML</link>) which was being much
	    abused for tasks it was never designed for;</para>
	  </listitem>
	  <listitem>
	    <para>the complexity of full <link
	      linkend="whatissgml">SGML</link>, whose syntax allows many
	    powerful but hard-to-program options.</para>
	  </listitem>
	</orderedlist>
	    <para>XML allows the flexible development of user-defined
	  document types. It provides a robust, non-proprietary,
	  persistent, and verifiable file format for the storage and
	  transmission of text and data both on and off the Web; and
	  it removes the more complex options of SGML, making it
	  easier to program for.</para>
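	<para>As a rough sketch of what a user-defined document type
	  can look like, the example below declares a tiny
	  <literal>memo</literal> type in the internal subset of a
	  Document Type Declaration and then uses it. The element type
	  names are invented for this illustration, and a real
	  document type would normally keep its declarations in a
	  separate DTD or Schema file.</para>
	<programlisting><![CDATA[<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, subject, para+)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
]>
<memo>
  <to>All staff</to>
  <from>The editor</from>
  <subject>Document types</subject>
  <para>This memo conforms to the declarations above.</para>
</memo>]]></programlisting>
	<para>This is the sense in which the format is verifiable: a
	  validating parser can check that the document contains
	  exactly the elements the declarations allow, in the order
	  they allow them.</para>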
      </answer>
    </qandaentry>
    <qandaentry id="extendhtml" remap="FAQ-EXTEND, extend">
      <question>
	<formalpara>
	  <title>Why not just carry on extending HTML?</title>
	  <para>HTML is already too overburdened with proprietary
	    add-ons.</para>
	</formalpara>
      </question>
      <answer>
	<para><link linkend="whatishtml" xreflabel="simple">HTML</link>
	  was already overburdened with dozens of interesting but
	  incompatible inventions from different manufacturers,
	  because it provides only one way of describing your
	  information.</para>
	<para>XML allows groups of people or organizations to <link
	    linkend="owndoctype">create their own customized markup
	    applications</link> for exchanging information in their
	  domain (music, chemistry, electronics, hill-walking,
	  finance, surfing, petroleum geology, linguistics, cooking,
	  knitting, stellar cartography, history, engineering,
	  rabbit-keeping, <link
	    linkend="mathematics">mathematics</link>, <ulink
	    url="http://users.iclway.co.uk/mhkay/gedml/index.html">genealogy</ulink>, 
	  etc).</para>
	<para>HTML is now well beyond the limit
	  of its usefulness as a way of describing information, and
	  while it will continue to play an important role for the
	  content it currently represents, many new applications
	  require a more robust and flexible infrastructure.</para>
      </answer>
    </qandaentry>
    <qandaentry id="whyxml" remap="FAQ-WORD, word">
      <question>
	<formalpara>
	  <title>Why should I use XML? (aka <quote>What is XML
	      for?</quote>)</title>
	  <para>It's a robust, durable, manipulable, and free format
	    for information identification, storage and
	    transfer.</para>
	</formalpara>
      </question>
      <answer>
	<para>Here are a few reasons for using XML (in no particular
	  order). Not all of these will apply to your own
	  requirements, and you may have additional reasons not
	  mentioned here (if so, please let the editor of the FAQ
	  know!).</para>
	<itemizedlist>
	  <listitem>
	    <para>XML can be used to describe and identify information
	      accurately and unambiguously, in a way that computers
	      can be programmed to <quote>understand</quote> your
	      information (well, at least manipulate it as if they could
	      understand it).</para>
	  </listitem>
	  <listitem>
	    <para>XML allows documents which are all the same type to
	      be created and handled consistently and without
	      structural errors, because it provides a standardised
	      way of describing, controlling, or allowing/disallowing
	      particular types of document structure. [Note that this
	      has absolutely nothing whatever to do with formatting,
	      appearance, or the actual text or data content of your
	      documents, only the structure of them. If you want
	      styling or formatting, see <link
		linkend="style"></link>.]</para>
	  </listitem>
	  <listitem>
	    <para>XML provides a robust and durable format for
	      information storage and transmission. Robust because it
	      is based on a proven standard, and can thus be tested
	      and verified; durable (persistent) because it uses
	      plain-text file formats which will outlast proprietary
	      binary ones.</para>
	  </listitem>
	  <listitem>
	    <para>XML provides a common syntax for messaging systems
	      for the exchange of information between applications.
	      Previously, each messaging system had its own format and
	      all were different, which made inter-system messaging
	      unnecessarily messy, complex, and expensive. If everyone
	      uses the same syntax it makes writing these systems much
	      faster and more reliable.</para>
	  </listitem>
	  <listitem>
	    <para>XML is free. Not just free of charge (free as in
	      beer) but free of legal encumbrances (free as in
	      speech). It doesn't belong to anyone, so it can't be
	      hijacked or pirated. And you don't have to pay a fee to
	      use it (you can of course choose to use commercial
	      software to deal with it, for lots of good reasons, but
	      you don't pay for XML itself).</para>
	  </listitem>
	  <listitem>
	    <para>XML information can be manipulated programmatically
	      (under machine control), so XML documents can be pieced
	      together from disparate sources, or taken apart and
	      re-used in different ways. They can be converted into
	      any other format with no loss of information.</para>
	  </listitem>
	  <listitem>
	    <para>XML lets you separate form (appearance) from
	      content. Your XML file contains your document
	      information (text, data) and identifies its structure:
	      your formatting and other processing needs are
	      identified separately in a <link linkend="style"
	      xreflabel="simple">stylesheet</link> or processing
	      system. The two are combined at output time to apply the
	      required formatting to the text or data identified by
	      its structure (location, position, rank, order, or
	      whatever).</para>
	  </listitem>
	  <listitem>
	    <para>Any of the Design Goals listed in the <ulink
		url="http://www.w3.org/TR/2004/REC-xml-20040204/#sec-origin-goals">XML 
		Specification</ulink>.</para>
	  </listitem>
	</itemizedlist>
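	<para>The separation of form from content can be sketched
	  with a very small XSLT 1.0 stylesheet. It assumes a
	  hypothetical document whose root element is
	  <literal>memo</literal>, containing a
	  <literal>subject</literal> and one or more
	  <literal>para</literal> elements; those names are
	  illustrative only, and this is one possible approach rather
	  than the only way to do it.</para>
	<programlisting><![CDATA[<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The whole memo becomes an HTML page -->
  <xsl:template match="/memo">
    <html>
      <head>
        <title><xsl:value-of select="subject"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="subject"/></h1>
        <xsl:apply-templates select="para"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each para element becomes an HTML paragraph -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>
	<para>The XML file itself never mentions headings or
	  paragraphs in the HTML sense, so the same source could be
	  given a different stylesheet to produce a different
	  output format.</para>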
	<tip xreflabel="Peter Flynn">
	  <title>Why not just use Word or Notes?</title>
	  <para>Restricted proprietary data formats are unsuitable
	    for durable public information.</para>
	  <para>Information on a network which connects many different
	    types of computer has to be usable on all of them. Public
	    information in particular cannot afford to be restricted
	    to one make or model or manufacturer, or to cede control
	    of its data format to private hands. It is also helpful
	    for such information to be in a form that can be reused in
	    many different ways, as this will minimize wasted time and
	    effort. <ulink
	      url="http://epu.ucc.ie/doc/markup">Proprietary 
	      data formats</ulink>, no matter how well documented or
	    publicized, are simply not an option: their control still
	    resides in private hands and they can be changed or
	    withdrawn arbitrarily without notice.</para>
	  <para><link linkend="whatissgml" xreflabel="simple">SGML</link> is the
	    international standard for defining this kind of
	    application, and was therefore the natural choice for XML,
	    but those who need an alternative based on different
	    software for other purposes are entirely free to implement
	    similar services using such a system, especially if they
	    are for private use.</para>
	</tip>
      </answer>
    </qandaentry>
    <qandaentry id="moreinfo" remap="FAQ-HOWTO, FAQ-MORE, more"
    revisionflag="changed">
      <question>
	<formalpara>
	  <title>Where do I find more information about XML?</title>
	  <para>At http://xml.coverpages.org/</para>
	</formalpara>
      </question>
      <answer remap="documentation help">
	<para id="faq:XML-Condensed">Online, there's the <link linkend="spec" xreflabel="simple">XML
	    Specification</link> and the ancillary documentation
	  available from the <ulink
	    url="http://www.w3.org/">W3C</ulink>; Robin Cover's <ulink
	    url="http://xml.coverpages.org/">SGML/XML Web
	    pages</ulink> with an extensive list of online reference
	  material and links to software; and a <ulink
	    url="http://www.textuality.com/xml/">summary</ulink> and
	  <ulink id="FAQ:xml-condensed"
	    url="http://www.textuality.com/xml/faq.html">condensed
	    FAQ</ulink> from <personname>
	    <firstname>Tim</firstname>
	    <surname>Bray</surname>
	  </personname>; and thousands of reference
	  resources available by typing <quote>xml</quote> into Google
	  or another search engine.</para>
	<para>For offline resources, see the
	  lists of books, articles, and software for XML in
	    <personname>
	    <firstname>Robin</firstname>
	    <surname>Cover</surname>
	  </personname>'s <ulink
	    url="http://xml.coverpages.org/sgml-xml.html">SGML and XML
	    Web pages</ulink>. That site should always be your first
	  port of call.</para>
	<para>The events listed below are the
	  ones I have been told about. Please <ulink
	    url="xmlfaq@silmaril.ie">mail me</ulink> if you come
	  across others: there are many other XML events around the
	  world, and most of them are announced on the <link xreflabel="simple"
	    linkend="discussions">mailing lists and
	    newsgroups</link>.</para>
	<note id="events">
	  <title>Events</title>
	  <itemizedlist>
	    <listitem>
	      <para>The main XML Conferences are run in North America
		and Europe by <ulink
		  url="http://www.idealliance.org">IDEAlliance</ulink>
		(formerly the GCA). The 2007 <ulink
		  url="http://www.xmlconference.org/xmlusa/">US XML
		  Conference</ulink> will be in Boston on 3&ndash;5
		December (2008 to be announced). The 2007 <ulink
		  url="http://www.xtech-conference.org/">XTech
		  Conference</ulink> (formerly known as XML Europe)
		was on 15&ndash;18 May in Paris.</para>
	    </listitem>
	    <listitem>
	      <para>The <ulink
		  url="http://www.extrememarkup.com/extreme/">Extreme
		  Markup Languages</ulink> conference (the principal
		  technical meeting) will be in Montréal on 7&ndash;10
		August 2007.</para>
	    </listitem>
	    <listitem>
	      <para id="summer">The 2008 annual
		<ulink url="http://www.xmlsummerschool.com/">XML
		  Summer School</ulink>, organised by <ulink
		  url="http://www.csw.co.uk">CSW</ulink>, takes place
		in Wadham College, Oxford on 27 July&ndash;1 August 
		(<ulink
		  url="http://www.xmlsummerschool.com/competition.htm">win 
		  a place!</ulink>).</para>
	    </listitem>
<!--
	    <listitem>
	      <para>Another European conference this year is <ulink
		  url="http://www.xmlprague.cz/">XML Prague
		  2006</ulink> on June 17&ndash;18 in the Czech
		Republic (this year including the <ulink
		  url="http://exist.sourceforge.net">eXist Open Source
		  XML database</ulink> community workshop on June
		18th).</para>
	    </listitem>
-->
	  </itemizedlist>
	</note>
      </answer>
    </qandaentry>
    <qandaentry id="discussions" remap="FAQ-MAILINGLIST, mailinglist"
    >
      <question>
	<formalpara>
	  <title>Where can I discuss implementation and development of
	    XML?</title>
	  <para>On mailing lists, Usenet newsgroups, web-based
	    bulletin-boards, and IRC channels</para> 
	</formalpara>
      </question>
      <answer remap="forums">
	<para>The two principal online support media are Usenet
	  newsgroups and mailing lists. The IRC network is also used
	  to some extent, and most individual projects and programs
	  have their own topic-specific bulletin-boards on their web sites.</para>
	<para>For off-line support, see <link
	    linkend="moreinfo"></link> for details of
	  conferences and summer schools.</para>
	<itemizedlist>
	  <listitem>
	    <para>The Usenet newsgroups are <ulink type="news"
		url="comp.text.xml">comp.text.xml</ulink> and to a
	      certain extent <ulink type="news"
		url="comp.text.sgml">comp.text.sgml</ulink>. Ask your
	      Internet Provider for access to these, or use a Web
	      interface like <ulink
		url="http://groups.google.com/">Google Groups</ulink>.
	      If your browser or mailer doesn't provide newsreading
	      facilities, install one that does, or (better) use a
	      standalone newsreader.</para>
	  </listitem>
	  <listitem>
	    <para>The general-purpose mailing list for public
	      discussion is <ulink
		url="http://listserv.heanet.ie/xml-l.html">XML-L</ulink>: 
	      to subscribe, visit <ulink
		url="https://listserv.heanet.ie/cgi-bin/wa?SUBED1=xml-l&amp;A=1">the 
		Web site</ulink> and click on the link to join.</para>
	  </listitem>
	  <listitem>
	    <para>For those developing software components for XML
	      there is the <ulink
		url="http://lists.xml.org/archives/xml-dev/">xml-dev
		mailing list</ulink>. You can subscribe by sending a
	      one-line mail message to <ulink
		url="xml-dev-request@lists.xml.org"></ulink> saying
	      just <literal>SUBSCRIBE</literal>. Note that this list is for
	      those people actively involved in developing resources
	      for XML. It is not for general information about XML
	      (use the XML-L list above for that).</para>
	  </listitem>
	  <listitem>
	    <para>The XSL-List is for discussing XSL (both XSLT
	      and XSL-FO). For details of how to subscribe, see <ulink
		url="http://www.mulberrytech.com/xsl/xsl-list"></ulink>.</para>
	    <tip xreflabel="Andrew Watt">
	      <para>There is a mailing list specifically for <ulink
		  url="http://www.egroups.com/group/XSL-FO">XSL-FO
		  only, on eGroups.com</ulink>. You can subscribe by
		sending a message to <ulink
		  url="XSL-FO-subscribe@egroups.com"></ulink>.</para>
	    </tip>
	    <warning>
	      <para>Be aware that the Yahoo E-Groups XSL-FO list sends out
		regular automated spam to non-members falsely claiming
		that they have asked to join.</para>
	    </warning>
	  </listitem>
	</itemizedlist>
	<tip xreflabel="Gianni Rubagotti">
	  <para>A new Italian mailing list about XML is born: to
	    subscribe, send a mail message without a subject line but
	    with text saying
	    <literal>subscribe
	    XML-IT</literal> to <ulink
	      url="majordomo@ananas.usr.dsi.unimi.it"></ulink>.
	    Everyone, Italian or not, who wants to debate about XML in
	    our tongue is welcome.</para>
	  <para id="x-hum">Gianni also runs the <ulink
	      url="http://groups.yahoo.com/group/x-humanities/">Humanities 
	      XML List</ulink>.</para>
	</tip>
	<tip xreflabel="J-P Theberge">
	  <para>A French mailing list about XML has been created. To
	    subscribe, send
	    <literal>subscribe</literal> to <ulink
	      url="xml-request@trisome.com"></ulink>.</para>
	</tip>
	<tip id="rng-list" xreflabel="Murata Makoto" lang="jp">
	  <para>Please mention this mailing list to your colleagues
	    who use RELAX NG. Go to: <ulink
	      url="http://groups.yahoo.com/group/rng-users/"></ulink>.</para>
	</tip>
	<note>
	  <title>Mailing lists</title>
	  <para>When you join a mailing list you will be sent details
	    of how to use it.  Please Read The Fine Documentation
	    because it  contains important information, particularly
	    about what to do if your company or ISP changes your email
	    address.</para>
	  <para>Please note that there is a lot of inaccurate and
	    misleading information published in print and on the Web
	    about subscribing to and unsubscribing from mailing lists.
	    Don't guess: Read The Fine Documentation.</para>
	</note>
      </answer>
    </qandaentry>
    <qandaentry id="programming" remap="langs javascript cobol pl/1 pl/i
	  pascal perl python ruby tcl/tk ppl differences" >
      <question>
	<formalpara>
	  <title>What is the difference between XML and C or
	    C++ or Java?</title>
	  <para>C and Java are for writing programs; XML is for
	    storing text.</para>
	</formalpara>
      </question>
      <answer>
	<para>C and C++ (and other languages like FORTRAN, or Pascal,
	  or Visual Basic, or Java or hundreds more) are
	  <emphasis>programming languages</emphasis> with which you
	  specify calculations, actions, and decisions to be carried
	  out in order:</para>
	<programlisting><![CDATA[
mod curconfig[if left(date,6) = "01-Apr", 
    t.put "April Fool!", 
    f.put days('31102005','DDMMYYYY') -
          days(sdate,'DDMMYYYY')
    " more shopping days to Samhain"];
	  ]]></programlisting>
	<para>XML is a markup specification language with which you
	  can design ways of describing information (text or data),
	  usually for storage, transmission, or processing by a
	  program. It says nothing about what you should do with the
	  data (although your choice of element names may hint at what
	  they are for):</para> 
	<programlisting><![CDATA[
<part num="DA42" models="LS AR DF HG KJ" 
 update="2001-11-22">
  <name>Camshaft end bearing retention circlip</name>
  <image drawing="RR98-dh37" type="SVG" x="476" 
   y="226"/>
  <maker id="RQ778">Ringtown Fasteners Ltd</maker>
  <notes>Angle-nosed insertion tool <tool 
         id="GH25"/> is required for the removal 
         and replacement of this part.</notes>
</part>
	  ]]></programlisting>
	<para>On its own, an SGML or XML file (including HTML) doesn't
	  do anything. It's a data format which just sits there until
	  you run a program which does something
	  <emphasis>with</emphasis> it. See also the question about
	  <link linkend="execute" xreflabel="simple">how to run or
	    execute XML files</link>.</para>
      </answer>
    </qandaentry>
  </qandadiv>
  <qandadiv remap="FAQ-USER, User" id="users">
    <title>Existing users (including everyone who uses a
      browser)</title>
    <qandaentry remap="FAQ-USEXML, usexml" id="usexml">
      <question>
	<formalpara>
	  <title>What do I have to do to use XML?</title>
	  <para>To read it: an XML browser (eg Firefox or IE). To
	    create: an XML editor (Emacs, Spy, etc).</para>
	</formalpara>
      </question>
      <answer>
	<para>For the average user of the Web, nothing, except to use a
	  browser which works with XML (see the <link
	    linkend="browsers">question about browsers</link>).
	  Remember some XML components are still being invented or
	  implemented (see the <ulink
	    url="http://www.w3.org/">W3C</ulink> web site), so some
	  features are still either undefined or have yet to be
	  written.</para>
	<para>You can use XML-conformant browsers to look at some of
	  the stable XML material, such as <ulink
	    url="ftp://sunsite.unc.edu/pub/sun-info/standards/xml/eg/">Jon 
	    Bosak's Shakespeare plays</ulink> and the molecular
	  experiments of the <ulink
	    url="http://www.xml-cml.org">Chemical Markup Language
	    (CML)</ulink>. There are some more example sources listed
	  at <ulink
	    url="http://xml.coverpages.org/xml.html#examples"></ulink>, 
	  and you will find XML (particularly in the guise of <link
	    linkend="xhtml">XHTML</link>) being introduced in places
	  where it won't break older browsers.</para>
	<para>If you want to start preparations for creating your own
	  XML files, see the questions in the <link
	    linkend="authors">Authors' Section</link> and the <link
	    linkend="developers">Developers' Section</link>.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-XMLOFFER, xmloffer" id="xmlhtml">
      <question>
	<formalpara>
	  <title>Should I use XML instead of HTML?</title>
	  <para>Yes if you need robustness, accuracy, and
	    persistence.</para>
	</formalpara>
      </question>
      <answer remap="cascading style sheets">
	<para>Yes, if you need robustness, accuracy, and persistence.
	  XML allows authors and providers to <link
	    linkend="owndoctype" xreflabel="simple">design their own
	    document markup</link> instead of being limited by HTML.
	  Document types can be explicitly tailored to an application,
	  so the cumbersome fudging and poodlefaking that has to take
	  place with <link linkend="whatishtml"
	    xreflabel="simple">HTML</link> becomes a thing of the
	  past: your markup can always say what it means. Trivial
	  example:</para>
	  <programlisting><![CDATA[
<date YYYY-MM-DD="2005-12-26">next Monday</date>
          ]]></programlisting>
	<itemizedlist>
	  <listitem>
	    <para>Information content can be richer and easier to use,
	      because the descriptive and <link
		linkend="links">hypertext linking abilities of
		XML</link> are much greater than those available in
	      HTML.</para>
	  </listitem>
	  <listitem>
	    <para>XML can provide more and better facilities for
	      browser presentation and performance, using XSLT and CSS
	      stylesheets.</para>
	  </listitem>
	  <listitem>
	    <para>It removes many of the underlying complexities of
	      SGML-format HTML (which led to them being ignored and
	      broken) in favor of a more flexible model, so writing
	      programs to handle XML is much easier than doing the
	      same for all the old broken HTML.</para>
	  </listitem>
	  <listitem>
	    <para>Information becomes more accessible and reusable,
	      because the more flexible markup of XML can be used by
	      any XML software instead of being restricted to specific
	      manufacturers as has become the case with HTML.</para>
	  </listitem>
	  <listitem>
	    <para>XML files can be used outside the Web as well, in
	      existing document-handling environments (eg
	      publishing).</para>
	  </listitem>
	</itemizedlist>
	<para>If your information is transient,
	  or completely static <emphasis>and</emphasis> unreferenced,
	  or very short and simple, and unlikely to need updating,
	  HTML may be all you need.</para>
      </answer>
    </qandaentry>
    <qandaentry id="readxml">
      <question>
	<formalpara>
	  <title>Someone sent me an XML file. How do I read it?</title>
	  <para>Open it in an XML browser or XML editor.</para>
	</formalpara>
      </question>
      <answer remap="reading opening">
	<para>If the file is well-formed or valid XML, you can just
	  open it with any XML-conformant browser (see <link
	    linkend="browsers"></link>). This will display the file in
	  an unformatted view, showing all the markup in a format that
	  lets you fold up or unfold the nested hierarchy (click on
	  the little plus and minus symbols), which will at least let
	  you read something.</para>
	<para>If the file contains a link to an XSLT or CSS stylesheet
	  (and the stylesheet was provided or is web-accessible) then
	  the browser should format the file in a readable manner (but
	  beware that in-browser formatting is not robust).</para>
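	<para>As a hedged illustration of what such a link looks
	  like: the stylesheet is normally referenced by an
	  <literal>xml-stylesheet</literal> processing instruction
	  near the top of the file, before the root element. The
	  filenames <filename>parts.css</filename> and
	  <filename>parts.xsl</filename> here are hypothetical, and
	  <literal>text/xsl</literal> is the type value that browsers
	  have conventionally accepted for XSLT, so check what your
	  own browser expects:</para>
	<programlisting><![CDATA[
<!-- To link a CSS stylesheet: -->
<?xml-stylesheet type="text/css" href="parts.css"?>

<!-- Or, to link an XSLT stylesheet: -->
<?xml-stylesheet type="text/xsl" href="parts.xsl"?>
	  ]]></programlisting>
	<para>Only the browser (or other application) acts on this
	  instruction: an XML parser simply passes it through.</para>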
	<para>If you want to edit the file, you need an XML editor
	  (see <link linkend="editors"></link>). Unless you are very
	  skilled with pointy-bracket markup, do
	  <emphasis>not</emphasis> try to edit XML files with non-XML
	  editors.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-BROWSER, browser" id="browsers">
      <question>
	<formalpara>
	  <title>Where can I get an XML browser?</title>
	  <para>MSIE 5.5 or 6.*; Mozilla Firefox 0.9.6 up</para>
	</formalpara>
      </question>
      <answer remap="brousers web browsers compatible difference
	  cascading style sheets ie">
	<para>Current state of existing browser support for XML (1 January
	  2006):</para>
	<itemizedlist>
	  <listitem>
	    <para id="faq:MSXML">Microsoft Internet Explorer
	      5.0 and 5.5 handled XML, processing it by default using a
	      built-in stylesheet written in a Microsoft-specific,
	      obsolete predecessor of XSLT called XSL (not to be
	      confused with the real XSLT). The output of the
	      stylesheet is DHTML, which, when rendered in the
	      browser, shows a coloured, syntax-highlighted version of
	      the XML document, with collapsible views. If the XML
	      document references a stylesheet, that stylesheet will
	      be used instead, within the limitations of MSIE's
	      incomplete implementation of CSS. MSIE 5.0 and 5.5 can
	      also use stylesheets in another obsolete syntax called
	      WD-xsl, which should be avoided. These versions can be
	      upgraded to support real XSLT: see the <ulink id="FAQ:msxml"
		url="http://www.netcrucible.com/xslt/msxml-faq.htm">MSXML 
		FAQ</ulink>.</para>
	    <para>MSIE 6.0 and later use real XSLT 1.0, but
	      can use both the obsolete syntaxes as well.</para>
	  </listitem>
	  <listitem>
	    <para>Mozilla <ulink
		url="http://www.mozilla.org/">Firefox</ulink> 0.9 up,
	      Netscape&nbsp;6 and 7 (there is no Netscape&nbsp;5), and Galeon
	      all have full XML support with XSLT and CSS. In
	      general, Firefox is more robust than MSIE, and provides
	      better standards adherence.</para>
	    <para>I have a user report that Netscape 4.6 and 4.8 support XML,
		but no independent verification.</para>
	  </listitem>
	  <listitem>
	    <para>The authors of the former
	      MultiDoc Pro SGML browser, <ulink
		url="http://www.citec.fi/">CITEC</ulink> (whose engine
	      was also used in Panorama and other browsers),
	      joined forces with Mozilla to produce a multi-everything
	      browser called DocZilla, which reads HTML, XML, and
	      SGML, with XSLT and CSS stylesheets. This runs under
	      Windows and Linux and is currently at release 1.0. See
	      <ulink url="http://www.doczilla.com"></ulink> for
	      details. This is by far the most ambitious browser
	      project, and is backed by very solid markup-handling
	      expertise.</para>
	  </listitem>
	  <listitem>
	    <para>Contrary to earlier reports, <ulink
		url="http://www.opera.com/opera5/specs.html">Opera</ulink> 
	      does <emphasis>not</emphasis> appear to support XML. The
	      browser size is tiny by comparison with the others, but HTML/CSS
	      features are good and the speed is excellent, although
	      the earlier slavish insistence on mimicking everything
	      old (pre-Mozilla) Netscape did, especially the bugs,
	      still shows through in places.</para>
	  </listitem>
	  <listitem>
	    <para>Don't use Netscape 4.* or Internet Explorer 4.* or
	      earlier, or early versions of Mozilla if you want XML
	      support: they don't have it (unless the report above on
	      Netscape 4.6/8 is correct). Upgrade to <ulink
		url="http://www.mozilla.org/">Firefox</ulink> as soon
	      as possible.</para>
	  </listitem>
	</itemizedlist>
	<para>I have less information on the
	  XML capabilities of the new (OS X) Mac browser (Safari),
	  which is based on the KHTML engine used in Konqueror.
	  Konqueror itself does not appear to support XML or XSLT (at
	  least in KDE under Fedora Core 4, for example). Safari
	  1.3.2 (v312.6) under OS X 10.3 does provide partial support
	  for XML, but does not honour an external DTD modified by an
	  internal subset (thanks to <personname>
	    <firstname>John</firstname>
	    <surname>Haynie</surname>
	  </personname> for testing this).</para>
	<tip xreflabel="Mike Brown">
	  <para>The concept of <quote>browsing</quote> is primarily
	    the result of HTML having the semantics that it does. In
	    an HTML document there are sections of text called anchors
	    that are <quote>hyperlinked</quote> to other documents
	    that might be at remote locations on a network or
	    filesystem. HTML documents provide cues to a web browser
	    regarding how the document should be displayed and what
	    kind of behaviors are expected of the browser when the
	    user interacts with it. The HTML specification provides
	    many suggestions and requirements for the browser, and
	    provides specific meanings for many different examples of
	    markup, such as the fact that an
	    <programlisting><![CDATA[<img>]]></programlisting> element
	    refers to an image that should be retrieved by the browser
	    and rendered inline with the adjacent text.</para>
	  <para>Unlike HTML, XML does not have such inherent semantics
	    at all. There is no prescribed method for rendering XML
	    documents. Therefore, what it means to
	    <quote>browse</quote> XML is open to interpretation. For
	    example, an XML document describing the characteristics of
	    a machine part does not carry any information about how
	    that information should be presented to a user. An
	    application is free to use the data to produce an image of
	    the part, generate a formatted text listing of the
	    information, display the XML document's markup with a
	    pretty color scheme, or restructure the data into a format
	    for storage in a database, transmission over a network, or
	    input to another program.</para>
	  <para>However, despite the fact that XML documents are
	    purely descriptive data files, it is possible to
	    <quote>browse</quote> them in a sense, by rendering them
	    with stylesheets. A stylesheet is a separate document that
	    provides hints and algorithms for rendering or
	    transforming the data in the XML document. HTML users may
	    be familiar with Cascading Style Sheets (CSS). The CSS
	    stylesheet language is general and powerful enough to be
	    applied to XML documents, although it is oriented toward
	    visual rendering of the document and does not allow for
	    complex processing of the document's data. By associating
	    an XML document with a CSS stylesheet, it may be possible
	    to load an XML document in a CSS-aware web browser, and
	    the browser may be able to provide some kind of rendering
	    of it, even if the browser does not otherwise know how to
	    read and process XML documents. However, not all web
	    browsers will load an XML document correctly, and they are
	    not required to recognize the XML markup that associates
	    the document with a stylesheet, so one cannot assume that
	    XML documents can be opened with just any web
	    browser.</para>
	  <para>A more complex and powerful <link
	      linkend="style">stylesheet language</link> is XSLT, the
	    Transformations part of the Extensible Stylesheet
	    Language, which can be used to transform XML to other
	    formats, including HTML, other forms of XML, and plain
	    text.  If the output of this transformation is HTML, it
	    can be viewed in a web browser as any other HTML document
	    would.</para>
	  <para>The degree of support for XML and stylesheets in web
	    browsers varies greatly. Although loading and rendering
	    XML in the browser is possible in some cases, it is not
	    universally supported. Therefore, much XML content on the
	    web is translated to HTML on the servers. It is this
	    generated HTML that is delivered to the browsers. Most of
	    <ulink url="http://www.microsoft.com">Microsoft</ulink>'s
	    web site, for example, exists as XML that is converted to
	    HTML on the fly. The web browser never knows the
	    difference.</para>
	</tip>
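	<para>To make the XSLT route described in the tip above a
	  little more concrete, here is a minimal sketch of an XSLT
	  1.0 stylesheet that would turn the
	  <sgmltag>part</sgmltag> example from the <link
	    linkend="programming">question on programming
	    languages</link> into a small HTML page. The choice of
	  HTML elements and the wording of the labels are assumptions
	  made purely for illustration; a real stylesheet would be
	  designed around your own document type:</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!-- Match the root element of the parts example -->
  <xsl:template match="/part">
    <html>
      <head>
        <title><xsl:value-of select="name"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="name"/></h1>
        <!-- @num selects the num attribute of part -->
        <p>Part number: <xsl:value-of select="@num"/></p>
        <p>Maker: <xsl:value-of select="maker"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
	  ]]></programlisting>
	<para>The same stylesheet can be run in a browser that
	  supports XSLT or on a server, which is why the HTML that
	  reaches the user looks the same either way.</para>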
	<para>See also the notes on <link linkend="software"
	    xreflabel="simple">software for authors</link> and <link
	    linkend="developers" xreflabel="simple">XML for
	    developers</link>, and the more detailed list on the XML
	  pages in the SGML Web site at  <ulink
	    url="http://xml.coverpages.org/"></ulink>.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-SWITCH, switch" id="switchover">
      <question>
	<formalpara>
	  <title>Do I have to switch from SGML or HTML to XML?</title>
	  <para>Not if you don't want to</para>
	</formalpara>
      </question>
      <answer>
	<para>No, existing SGML and HTML applications software will
	  continue to work with existing files. But as with any
	  enhanced facility, if you want to view or download and use
	  XML files, you will need to use XML-aware software. There is
	  much more being developed for XML than there ever was for
	  SGML, so a lot of users are moving.</para>
	<para>However, for some static SGML applications (eg large
	  document archives) with well-established and stable
	  software, a good case can be made for <quote>not fixing it
	    if it ain't bust</quote>, and deferring a move to XML
	  until an appropriate time comes for a revision of the
	  service or features.</para>
      </answer>
    </qandaentry>
    <qandaentry id="officeapps" revisionflag="changed">
      <question>
	<formalpara>
	  <title>Can I use XML for ordinary office
	    applications?</title>
	  <para>Yes, use Star Office, Open Office, WordPerfect, or
	    even MS-Office (11/XP only).</para>
	</formalpara>
      </question>
      <answer>
	<para>Yes, most office productivity suites already do this,
	  and there are more on the way:</para>
	<itemizedlist>
	  <listitem id="odf">
	    <para><ulink
		url="http://www.openoffice.org/">OpenOffice</ulink>
	      has been saving its files as XML by default for
	      several years.  The package comprises a wordprocessor,
	      spreadsheet, presentation software, and a vector drawing
	      package, and they share the same DTD/Schema. The Open
	      Document Format (ODF) is now the official International
	      Standard (ISO/IEC 26300) for office documents.</para>
	  </listitem>
	  <listitem>
	    <para>Corel's <ulink
		url="http://www.corel.com/servlet/Satellite?pagename=Corel2/Products/Home&amp;pid=1047022958453">WordPerfect</ulink> 
	      suite has shipped with a fully-fledged XML editor for
	      many years (which also does full SGML). It
	      can save the formatted output as a Microsoft Word
	      <filename>.doc</filename> file, but it uses its own
	      stylesheet technology to format documents, not XSLT or
	      CSS. It can also save its own (WordPerfect) document
		format to an XML representation.</para>
	  </listitem>
	  <listitem>
	    <para>The <ulink
		url="http://www.abisource.com/">AbiWord</ulink>
	      wordprocessor (all platforms) can open Word and
	      OpenOffice documents and save them in DocBook XML
	      format, although it does not provide native XML
	      editing.</para>
	  </listitem>
	  <listitem>
	    <para>Microsoft 
		Office 2003 provides a
	      <quote>Save As&hellip;XML</quote> option in all parts of the
	      suite except PowerPoint, using WordML to represent the
		visual appearance of the document, although it will
		preserve style names if they are in use.</para>
	    <para>Word 2007 saves natively as XML, using Office Open
		XML (similar to WordML but not identical). This is
	      Microsoft's equivalent to <link
		linkend="odf">ODF</link>, and they are attempting to
		have it recognised as a parallel international standard.</para>
	    <para>Word contains a real XML editor as well, supporting
	      other W3C Schemas as well as its own (but not DTDs), and
	      this also provides a method for binding element types to
	      Word's named styles (like Microsoft's earlier product
	      <ulink
		url="http://www.microsoft.com/catalog/display.asp?subid=38&amp;site=723">SGML 
		Author for Word</ulink> did).</para>
	  </listitem>
	  <listitem>
	    <para>Avoid Microsoft's <quote>Works</quote> package, as
	      it is incompatible both with Office and with XML.</para>
	  </listitem>
	  <listitem>
	    <para>I have no information on Lotus office products.</para>
	  </listitem>
	</itemizedlist>
	<para>There is more detail under <quote><ulink
	    url="http://xml.coverpages.org/xmlFileFormats.html">XML
	    File Formats for Office Documents</ulink></quote> in the XML Cover
	  Pages which briefly describes and points to further
	  information on: GNOME Office, KOffice, Microsoft XDocs,
	  OASIS TC for Open Office XML File Format, 1DOK.org Project,
	  and OpenOffice.org XML File Format.</para>
      </answer>
    </qandaentry>
  </qandadiv>
  <qandadiv id="authors" remap="FAQ-AUTHOR, Author">
    <title>Authors (including writers of HTML and Web page
      owners)</title>
    <qandaentry remap="FAQ-REPLACE, replace" id="replacehtml">
      <question>
	<formalpara>
	  <title>Does XML replace HTML?</title>
	  <para>No.</para>
	</formalpara>
      </question>
      <answer>
	<para>No. XML itself does not replace HTML. Instead, it
	  provides an alternative which allows you to define your own
	  set of markup elements. HTML is expected to remain in common
	  use for some time to come, and the current version of HTML
	  is in <link linkend="xhtml" xreflabel="simple">XML
	    syntax</link>. XML is designed to make the writing of DTDs
	  much simpler than with full SGML. (See <link linkend="dtds"
	    xreflabel="simple">the question on DTDs</link> for what
	  one is and why you might want one.)</para>
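	<para>To show how little changes on the surface, here is a
	  sketch of a minimal page in XHTML 1.0 Strict (the title and
	  text are made up for illustration). It is the HTML
	  vocabulary written to XML's rules: lower-case names, every
	  element closed, and every attribute value quoted:</para>
	<programlisting><![CDATA[
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
 xml:lang="en" lang="en">
  <head>
    <title>Hello</title>
  </head>
  <body>
    <p>Hello, world!<br/>
       This paragraph is closed explicitly.</p>
  </body>
</html>
	  ]]></programlisting>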
      </answer>
    </qandaentry>
    <qandaentry id="foreknowledge" remap="prelearn">
      <question>
	<formalpara>
	  <title>Do I have to know HTML or SGML before I learn
	    XML?</title>
	  <para>No, but it's useful.</para>
	</formalpara>
      </question>
      <answer>
	<para>No, although it's useful because a lot of XML
	  terminology and practice derives from two decades'
	  experience of SGML.</para>
	<para>Be aware that <quote>knowing HTML</quote> is not the
	  same as <quote>understanding SGML</quote>. Although HTML was
	  written as an SGML application, browsers ignore most of it
	  (which is why so many useful things don't work), so just
	  because something is done a certain way in HTML browsers
	  does not mean it's correct, least of all in XML.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-XMLDOC, xmldoc" id="internals" >
      <question>
	<formalpara>
	  <title>What does an XML document actually look like (inside)?</title>
	  <para>Pointy brackets like HTML</para>
	</formalpara>
      </question>
      <answer remap="top level internalsubset">
	<para>The basic structure of XML is similar to other
	  applications of SGML, including HTML. The basic components
	  can be seen in the following examples. An XML document
	  starts with a <firstterm>Prolog</firstterm>:</para>
	<orderedlist>
	  <listitem>
	    <para>The <firstterm>XML Declaration</firstterm></para>
	    <programlisting><![CDATA[
<?xml version="1.0" encoding="utf-8"?>
	 ]]></programlisting> 
	    <para>which specifies that this is an XML document;</para>
	  </listitem>
	  <listitem>
	    <para>Optionally a Document Type Declaration</para>
	    <programlisting><![CDATA[
<!DOCTYPE report SYSTEM "http://sales.acme.corp/dtds/salesrep.dtd">
	    ]]></programlisting>
	    <para>which identifies the type of document and says where
	    the Document Type Definition (DTD) is stored;</para>
	  </listitem>
	</orderedlist>
	<para>The Prolog is followed by the document instance:</para>
	<orderedlist>
	  <listitem>
	    <para>A <firstterm>root element</firstterm>, which is the
	    outermost (top level) element (start-tag plus end-tag) which encloses
	    everything else: in the examples below the root elements
	      are <sgmltag>conversation</sgmltag> and
	    <sgmltag>titlepage</sgmltag>;</para>
	  </listitem>
	  <listitem>
	    <para>A structured mix of descriptive or prescriptive
	      <firstterm>elements</firstterm> enclosing the
	      <firstterm>character data content</firstterm> (text),
	      and optionally any <firstterm>attributes</firstterm>
	      (<quote>name=value</quote> pairs) inside some
	      start-tags.</para>
	  </listitem>
	</orderedlist>
	<para>XML documents can be very simple, with straightforward
	  nested markup of your own design:</para>
	<programlisting><![CDATA[
<?xml version="1.0" standalone="yes"?>
<conversation>
  <greeting>Hello, world!</greeting>
  <response>Stop the planet, I want to get 
   off!</response>
</conversation>
	  ]]></programlisting>
	<para>Or they can be more complicated,
	  with a <link linkend="schemas" xreflabel="simple">Schema</link> or <link
	    linkend="dtds">Document Type Definition</link> (DTD) or
	  <firstterm>internal subset</firstterm> (local DTD changes in
	  [square brackets]), and an arbitrarily complex nested
	  structure:</para>
	<programlisting><![CDATA[
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE titlepage 
  SYSTEM "http://www.foo.bar/dtds/typo.dtd" 
[<!ENTITY % active.links "INCLUDE">]>
<titlepage id="BG12273624">
  <white-space type="vertical" amount="36"/>
  <title font="Baskerville" alignment="centered" 
   size="24/30">Hello, world!</title>
  <white-space type="vertical" amount="12"/>
  <!-- In some copies the following 
       decoration is hand-colored, presumably 
       by the author -->
  <image location="http://www.foo.bar/fleuron.eps" 
   type="URI" alignment="centered"/>
  <white-space type="vertical" amount="24"/>
  <author font="Baskerville" size="18/22" 
   style="italic">Vitam capias</author>
  <white-space type="vertical" role="filler"/>
</titlepage>
	  ]]></programlisting>
	<para>Or they can be anywhere between: a lot will depend on
	  how you want to define your document type (or whose you use)
	  and what it will be used for. Database-generated or
	  program-generated XML documents used in e-commerce are usually
	  unformatted (not for human reading) and may use very long
	  names or values, with multiple redundancy and sometimes no
	  character data content at all, just values in
	  attributes:</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<ORDER-UPDATE AUTHMD5="4baf7d7cff5faa3ce67acf66ccda8248"
 ORDER-UPDATE-ISSUE="193E22C2-EAF3-11D9-9736-CAFC705A30B3"
 ORDER-UPDATE-DATE="2005-07-01T15:34:22.46"
 ORDER-UPDATE-DESTINATION="6B197E02-EAF3-11D9-85D5-997710D9978F"
 ORDER-UPDATE-ORDERNO="8316ADEA-EAF3-11D9-9955-D289ECBC99F3">
  <ORDER-UPDATE-DELTA-MODIFICATION-DETAIL ORDER-UPDATE-ID="BAC352437484">
    <ORDER-UPDATE-DELTA-MODIFICATION-VALUE ORDER-UPDATE-ITEM="56"
     ORDER-UPDATE-QUANTITY="2000"/>
  </ORDER-UPDATE-DELTA-MODIFICATION-DETAIL>
</ORDER-UPDATE> 
	  ]]></programlisting>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-SPACE, space" id="whitespace">
      <question>
	<formalpara>
	  <title>How does XML handle white-space in my
	    documents?</title>
	  <para>Parsers keep it all. It's up to the application to
	    decide what to do with it.</para>
	</formalpara>
      </question>
      <answer remap="line break linefeed feed line-end line end
	    lineend eol white spaces blanks">
	<para>All white-space, including linebreaks (Mac CR, Win
	  CR/LF, Unix LF: the only alteration is that the parser
	  normalizes these to a single LF), TAB characters, and normal spaces,
	  <emphasis>even between <quote>structural</quote> elements
	    where no text can ever appear</emphasis>, is passed by the
	  parser <emphasis>unchanged</emphasis> to the application
	  (browser, formatter, viewer, converter, etc), identifying
	  the context in which the white-space was found (element
	  content, data content, or mixed content, if this information
	  is available to the parser, eg from a DTD or Schema). This
	  means <emphasis>it is the application's responsibility to
	    decide what to do with such space, not the
	    parser's</emphasis>:</para>
	<itemizedlist>
	  <listitem>
	    <para><emphasis>insignificant white-space</emphasis>
	      between structural elements (space which occurs where
	      only element content is allowed, ie between other
	      elements, where text data never occurs) will get passed
	      to the application (in SGML this white-space gets
	      suppressed, which is why you can put all that extra
	      space in HTML documents and not worry about it);</para>
	  </listitem>
	  <listitem>
	    <para><emphasis>significant white-space</emphasis> (space
	      which occurs within elements which
	      <emphasis>can</emphasis> contain text and markup mixed
	      together, usually mixed content or PCDATA) will still
	      get passed to the application exactly as under SGML. It
	      is the application's responsibility to handle it
	      correctly.</para>
	  </listitem>
	</itemizedlist>
	<para>The parser must inform the application that white-space
	  has occurred in element content, if it can detect it. (Users
	  of SGML will recognize that this information is not in the
	  <ulink
	    url="http://xml.coverpages.org/WG8-n931a.html">ESIS</ulink>, 
	  but it <emphasis>is</emphasis> in the <ulink
	    url="http://xml.coverpages.org/topics.html#groves">Grove</ulink>.)</para>
	<programlisting><![CDATA[ 
<chapter> 
  <title> 
   My title for
   Chapter 1. 
  </title> 
    <para> 
text 
    </para> 
</chapter>
	  ]]></programlisting>
	<para>In the example above, the application will receive all
	  the pretty-printing linebreaks, TABs, and spaces between the
	  elements <emphasis>as well as those</emphasis> embedded in
	  the chapter title. It is the function of the application,
	  not the parser, to decide which type of white-space to
	  discard and which to retain. Many XML applications have
	  configurable options to allow programmers or users to
	  control how such white-space is handled.</para>
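	<para>One mechanism which the XML Specification itself
	  provides for this is the reserved attribute <sgmltag
	    class="attribute">xml:space</sgmltag>, whose value can be
	  <literal>preserve</literal> or <literal>default</literal>.
	  The sketch below uses a hypothetical
	  <sgmltag>listing</sgmltag> element to ask for its line
	  breaks and indentation to be kept. Bear in mind that this
	  only signals the author's intent: you should check how far
	  your own application honours it, and in a valid document
	  the attribute must also be declared in the DTD or
	  Schema:</para>
	<programlisting><![CDATA[
<chapter>
  <title>My title for Chapter 1.</title>
  <listing xml:space="preserve">first line
    second line, indented
third line</listing>
</chapter>
	  ]]></programlisting>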
	<note>
	  <title>Why?</title>
	  <para>In SGML, a DTD is always compulsory. A parser
	    therefore always knows in advance whether white-space has
	    occurred in element content (and can therefore be
	    discarded) or in mixed content or PCDATA (where it must be
	    preserved). XML allows processing without a DTD or Schema,
	    in which case it may be impossible to tell whether space
	    should be discarded or not. The general rule was therefore
	    imposed that <emphasis>all</emphasis> white-space must be
	    reported to the application.</para>
	</note>
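	<para>As an illustration of the distinction, a DTD for the
	  chapter example in this answer might contain declarations
	  like the following. These are assumed for the sake of the
	  example (including the hypothetical
	  <sgmltag>emph</sgmltag> element) and are not taken from any
	  real DTD:</para>
	<programlisting><![CDATA[
<!-- Element content: only elements may occur here, so any
     white-space between title and para is insignificant -->
<!ELEMENT chapter (title, para+)>

<!-- PCDATA: text may occur, so white-space is significant -->
<!ELEMENT title (#PCDATA)>

<!-- Mixed content: text and elements together, so
     white-space is significant here too -->
<!ELEMENT para (#PCDATA | emph)*>
<!ELEMENT emph (#PCDATA)>
	  ]]></programlisting>
	<para>A parser which has read these declarations can tell the
	  application that the space between the end of the title and
	  the start of the paragraph is in element content. Without
	  them, it cannot.</para>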
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-CASE, case" id="case">
      <question>
	<formalpara>
	  <title>Which parts of an XML document are
	    case-sensitive?</title>
	  <para>All of it, both markup and text.</para>
	</formalpara>
      </question>
      <answer remap="case sensitive sensative sensitivity">
	<para>All of it, both markup and text. This is significantly
	  different from HTML and most other SGML applications. It was
	  done to allow markup in non-Latin-alphabet languages, and to
	  obviate problems with case-folding in writing systems which are
	  caseless.</para>
	<itemizedlist>
	  <listitem>
	    <para>Element type names are case-sensitive: you must
	      follow whatever combination of upper- or lower-case you
	      use to define them (either by first usage or in a <link
		linkend="dtds" xreflabel="simple">DTD or
		Schema</link>). So you can't say <sgmltag
		class="starttag">BODY</sgmltag>&hellip;<sgmltag
		class="endtag">body</sgmltag>: upper- and lower-case
	      must match; thus <sgmltag
		class="emptytag">Img</sgmltag>, <sgmltag
		class="emptytag">IMG</sgmltag>, and <sgmltag
		class="emptytag">img</sgmltag> are three different
	      element types;</para>
	  </listitem>
	  <listitem>
	    <para>For well-formed XML documents with no DTD, the first
	      occurrence of an element type name defines the
	      casing;</para>
	  </listitem>
	  <listitem>
	    <para>Attribute names are also
	      case-sensitive, for example the two width attributes in
	    <programlisting><![CDATA[<PIC
		width="7in"/>]]></programlisting> and
	    <programlisting><![CDATA[<PIC
		WIDTH="6in"/>]]></programlisting> (if they occurred in
	      the same file) are separate attributes, because of
	      the different case of <sgmltag
		class="attribute">width</sgmltag> and <sgmltag
		class="attribute">WIDTH</sgmltag>;</para>
	  </listitem>
	  <listitem>
	    <para>Attribute values are also
	      case-sensitive. CDATA values (eg
	    <programlisting>Url="MyFile.SGML"</programlisting>) always
	      have been, but NAME types (ID and IDREF attributes, and
	      token list attributes) are now case-sensitive as
	      well;</para>
	  </listitem>
	  <listitem>
	    <para>All general and parameter entity names (eg <sgmltag
		class="genentity">Aacute</sgmltag>), and your data
	      content (text), are case-sensitive as always.</para>
	  </listitem>
	</itemizedlist>
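	<para>As a short illustration (the element type names here
	  are made up), the first document below is well-formed: it
	  contains three different empty element types, and the
	  <sgmltag>pic</sgmltag> element carries two separate
	  attributes. The second is not well-formed, because the
	  end-tag does not match the case of the start-tag, and a
	  parser must report this as an error.</para>
	<programlisting><![CDATA[
<doc>
  <Img/>
  <IMG/>
  <img/>
  <pic width="7in" WIDTH="6in"/>
</doc>
	  ]]></programlisting>
	<programlisting><![CDATA[
<doc>
  <BODY>text</body>
</doc>
	  ]]></programlisting>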
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-EXIST, exist" id="conversion">
      <question>
	<formalpara>
	  <title>How can I make my existing HTML files work in
	    XML?</title>
	  <para>Either make them XHTML or use a different document
	    type.</para>
	</formalpara>
      </question>
      <answer>
	<para>Either convert them to conform to some new document type
	  (with or without a DTD or Schema) and write a stylesheet to
	  go with them; or edit them to conform to <link xreflabel="simple"
	    linkend="xhtml">XHTML</link>.</para>
	<para>It is necessary to convert existing HTML files because
	  XML does not permit end-tag minimisation (missing <sgmltag
	    class="endtag">p</sgmltag>, etc), unquoted attribute
	  values, and a number of other SGML shortcuts which have been
	  normal in most HTML DTDs. However, many HTML authoring tools
	  already produce almost (but not quite) <link xreflabel="simple"
	    linkend="wf">well-formed XML</link>.</para>
	<para>You may be able to convert HTML
	  to XHTML using
	  <personname>
	    <firstname>Dave</firstname>
	    <surname>Raggett</surname>
	  </personname>'s <ulink
	    url="http://tidy.sourceforge.net/">HTML Tidy</ulink>
	  program, which can clean up some of the horrible formatting
	  mess left behind by inadequate HTML editors, and even
	  separate out some of the formatting to a stylesheet, but
	  there is usually still some hand-editing to do.</para>
	<para>Most modern website design
	  programs, including DreamWeaver, still don't produce
	  anything like clean HTML, largely because they are for
	  making pages look pretty, rather than getting the
	  information right. If you get the information right in XML
	  first, and export it to a page design produced using a
	  website design program, it's probably less important that
	  the HTML is a mess. Using a website design program and its
	  HTML pages as the sole repository of your information can be
	  a dangerous mistake, though.</para>
      </answer>
      <answer remap="normalize normalization normalise normalisation
      escape characters sequences"> 
	<tip>
	  <title>Converting valid HTML to XHTML</title>
	  <para>If your HTML files are valid (full formal validation
	    with an SGML parser, not just a simple syntax check), then
	    try validating them as XHTML with an XML parser. If you
	    have been creating clean HTML without embedded formatting
	    then this process should throw up only mismatches in
	    upper/lowercase element and attribute names, and empty
	    elements (plus perhaps the odd non-standard element type
	    name if you use them). Simple hand-editing or a short
	    script should be enough to make these changes.</para>
	  <para>If your HTML validly uses end-tag omission, this can
	    be fixed automatically by a normalization program like
	    <productname>sgmlnorm</productname> (from
	    <productname>OpenSP</productname>, which is part of <ulink
	      url="http://sourceforge.net/projects/openjade/"><productname>OpenJade</productname></ulink>) 
	    or by the <function>sgml-normalize</function> function in
	    an editor like
	    <productname>Emacs</productname>/<productname>psgml</productname> 
	    (don't be put off by the names, they both do XML).</para>
	  <para>If you have a lot of valid HTML files, you could write
	    a script to do this in a programming language which
	    understands SGML markup (such as <ulink
	      url="http://www.omnimark.com"><productname>Omnimark</productname></ulink> 
	    or <ulink
	      url="http://sgml.dircon.co.uk/"><productname>SGMLC</productname></ulink>), 
	    or one of the popular scripting languages (eg
	    <productname>Perl</productname>,
	    <productname>Python</productname>,
	    <productname>Tcl</productname>, etc), using their SGML/XML
	    libraries; or you could even use editor macros if you know
	    what you're doing.</para>
	</tip>
	<tip>
	  <title>Converting to a new document type</title>
	  <para>If you want to move your files out of HTML into some
	    other DTD entirely, there are many native XML application
	    DTDs, and standard XML versions of popular DTDs like
	    <productname>TEI</productname>,
	    <productname>DocBook</productname>, or
	    <productname>DITA</productname> to choose from. There
	    is a pilot site run by CommerceNet (<ulink
	      url="http://www.xmlx.com/"></ulink>) for the exchange of
	    XML DTDs.</para>
	  <para>Alternatively you could just make up your own markup:
	    so long as it makes sense and you create a well-formed
	    file, you should be able to write a CSS or XSLT stylesheet
	    and have your document displayed in a browser.</para>
	</tip>
	<tip>
	  <title>Converting invalid HTML to well-formed XHTML</title>
	  <para>If your files are invalid HTML (95&pct; of the Web)
	    they can be converted to well-formed DTDless files as
	    follows:</para>
	  <orderedlist>
	    <listitem>
	      <para>Replace the DOCTYPE Declaration with the XML
		Declaration <programlisting><![CDATA[<?xml version="1.0" encoding="iso-8859-1"?>]]></programlisting> (using the
		appropriate character encoding);</para> 
	    </listitem>
	    <listitem>
	      <para>If there was no DOCTYPE Declaration, just prepend
		the XML Declaration;</para>
	    </listitem>
	    <listitem>
	      <para>Change any EMPTY elements (eg every
		<sgmltag>BASE</sgmltag>, <sgmltag>ISINDEX</sgmltag>,
		<sgmltag>LINK</sgmltag>, <sgmltag>META</sgmltag>,
		<sgmltag>NEXTID</sgmltag> and <sgmltag>RANGE</sgmltag>
		in the header, and every <sgmltag>AREA</sgmltag>,
		<sgmltag>ATOPARA</sgmltag>,
		<sgmltag>AUDIOSCOPE</sgmltag>,
		<sgmltag>BASEFONT</sgmltag>, <sgmltag>BR</sgmltag>,
		<sgmltag>CHOOSE</sgmltag>, <sgmltag>COL</sgmltag>,
		<sgmltag>FRAME</sgmltag>, <sgmltag>HR</sgmltag>,
		<sgmltag>IMG</sgmltag>, <sgmltag>KEYGEN</sgmltag>,
		<sgmltag>LEFT</sgmltag>, <sgmltag>LIMITTEXT</sgmltag>,
		<sgmltag>OF</sgmltag>, <sgmltag>OVER</sgmltag>,
		<sgmltag>PARAM</sgmltag>, <sgmltag>RIGHT</sgmltag>,
		<sgmltag>SPACER</sgmltag>, <sgmltag>SPOT</sgmltag>,
		<sgmltag>TAB</sgmltag>, and <sgmltag>WBR</sgmltag> in
		the body of the document) so that they end with
	      <programlisting>/></programlisting> instead, for example
	      <programlisting><![CDATA[<img src="mypic.gif"
		  alt="Picture"/>]]></programlisting>;</para>
	    </listitem>
	    <listitem>
	      <para>Make all element names and attribute names
		lowercase;</para>
	    </listitem>
	    <listitem>
	      <para>Ensure there are correctly-matched explicit
		end-tags for all non-EMPTY elements; eg every <sgmltag
		  class="starttag">p</sgmltag> must have a <sgmltag
		  class="endtag">p</sgmltag>, etc;</para>
	    </listitem>
	    <listitem>
	      <para>Escape all <literal><![CDATA[<]]></literal> and
		<literal><![CDATA[&]]></literal> non-markup (ie
		literal text) characters as <sgmltag
		  class="genentity">lt</sgmltag> and <sgmltag
		  class="genentity">amp</sgmltag> respectively (there
		shouldn't be any isolated
		<literal><![CDATA[<]]></literal> characters to start
		with, anyway!);</para>
	    </listitem>
	    <listitem>
	      <para>Ensure all attribute values are in matched quotes
		(values with embedded single quotes must be in double
		quotes, and <foreignphrase>vice
		  versa</foreignphrase>&mdash;if you need both, use
		the <sgmltag class="genentity">quot</sgmltag>
		character entity reference);</para>
	    </listitem>
	    <listitem>
	      <para id="semicolon">Ensure all script URIs which have
		<literal><![CDATA[&]]></literal> as a field separator
		are changed to use <sgmltag
		  class="genentity">amp</sgmltag> or a semicolon
		instead.</para>
	    </listitem>
	  </orderedlist>
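	  <para>As a rough sketch of the steps above, a hypothetical
	    fragment of invalid HTML such as this:</para>
	  <programlisting><![CDATA[
<HTML>
<BODY>
<P>Fish & chips<BR>
<IMG SRC=mypic.gif ALT="Picture">
<P><A HREF="/cgi-bin/find?a=1&b=2">Search</A>
</BODY>
</HTML>
	    ]]></programlisting>
	  <para>might end up looking something like this (the file
	    name and URI are invented for the example, and the
	    encoding is assumed to be ISO 8859-1):</para>
	  <programlisting><![CDATA[
<?xml version="1.0" encoding="iso-8859-1"?>
<html>
  <body>
    <p>Fish &amp; chips<br/>
      <img src="mypic.gif" alt="Picture"/></p>
    <p><a href="/cgi-bin/find?a=1&amp;b=2">Search</a></p>
  </body>
</html>
	    ]]></programlisting>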
	</tip>
	  <para>Be aware that some obsolete HTML browsers may not
	  accept XML-style EMPTY elements with the trailing slash, so
	  the above changes may not be backwards-compatible. An
	  alternative is to add a dummy end-tag to all EMPTY elements,
	  so <programlisting><![CDATA[<img
	      src="foo.gif"/>]]></programlisting> becomes
	    <programlisting><![CDATA[<img
	      src="foo.gif"></img>]]></programlisting>. This is valid
	  XML but you must be able to guarantee no-one will ever put
	  any text content in such elements. Adding a space before the
	  closing slash in EMPTY elements (eg
	  <programlisting><![CDATA[<img
	      src="foo.gif" />]]></programlisting>) may also fool
	  older browsers into accepting XHTML as HTML.</para>
	<para>If you answer Yes to any of the questions in the <link
	    linkend="checklist"></link>, you can save
	    yourself a lot of grief by fixing those problems first
	    before doing anything else. You will likely then be
	    getting close to having well-formed files.</para> 
	  <para>Markup which is syntactically correct but semantically
	    meaningless or void should be edited out before
	    conversion. Examples are spacing devices such as repeated
	    empty paragraphs or linebreaks, empty tables, invisible
	    spacing GIFs etc. XML uses stylesheets, so you won't need
	    any of these.</para>
	  <para>Unfortunately there is rather a lot of work to do if
	    your files are invalid: this is why many Webmasters now
	    insist that only valid or well-formed files are used (and
	    why you should instruct your designers to do the same), in
	    order to avoid unnecessary manual maintenance and
	    conversion costs later.</para>
	<tip id="checklist">
	  <title>Checklist for invalid HTML</title>
	  <para>If your HTML files fall into this category (HTML
	    created by most WYSIWYG editors is usually invalid)
	    then they will almost certainly have to be converted
	    manually, although if the deformities are regular and
	    carefully constructed, the files may actually be almost
	    well-formed, and you could write a program or script to do
	    as described above. The oddities you may need to check for
	    include:</para>
	  <itemizedlist role="checklist">
	    <listitem>
	      <para>Do the files contain markup syntax errors? For
		example, are there any missing angle-brackets,
		backslashes instead of forward slashes on end-tags, or
		elements which nest incorrectly (eg
	      <programlisting><![CDATA[<B>those
		  starting <I>inside another element</B> but ending
		  outside</I> it]]></programlisting>)?</para>
	    </listitem>
	    <listitem>
	      <para>Are there any URIs (eg in <sgmltag
		  class="attribute">href</sgmltag>s or <sgmltag
		  class="attribute">src</sgmltag>s) which use
		Microsoft Windows-style backslashes instead of normal
		forward slashes?</para>
	    </listitem>
	    <listitem>
	      <para>Do the files contain markup which conflicts with
		HTML DTDs, such as headings or lists inside
		paragraphs, list items outside list environments,
		header elements like <sgmltag>base</sgmltag> preceding
		the first <sgmltag>html</sgmltag>, etc? (another
		sloppy editor trick)</para>
	    </listitem>
	    <listitem>
	      <para>Do the files use imaginary elements which are not
		in any known HTML DTD? (large numbers of these are
		used in proprietary markup systems masquerading as
		HTML). Although this is easy to transform to a DTDless
		well-formed file (because you don't have to define
		elements in advance) most proprietary or
		browser-specific extensions have never been formally
		defined, so it is often impossible to work out
		meaningfully where the element types can be
		used.</para>
	    </listitem>
	    <listitem>
	      <para>Are there any invalid (non-XML) characters in your
		files? Look especially for native Apple MacRoman
		characters left by careless designers; any of the
		illegal characters (the 32 characters at decimal codes
		128&ndash;159 inclusive) inserted by MS-Windows
		editors; and any of the ASCII control characters
		0&ndash;31 (except those permitted like TAB, CR, and
		LF). These need to be converted to the correct
		characters in ISO 8859-1 (a common default in
		browsers), or the relevant plane of Unicode (in which
		case you will probably need to use UTF-8 as your
		document encoding).</para>
	    </listitem>
	    <listitem>
	      <para>Do your files contain invalid (old
		Mosaic/Netscape-style) comments? Comments must look
		<programlisting><![CDATA[<!-- like this
		-->]]></programlisting> with double-dashes each end
		and no double (especially not multiple) dashes in
		between.</para>
	    </listitem>
	  </itemizedlist>
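	  <para>To illustrate two of the points above with made-up
	    fragments, the first listing shows a comment containing
	    runs of dashes and a URI written with a backslash; the
	    second shows what the repaired forms might look
	    like:</para>
	  <programlisting><![CDATA[
<!---- navigation starts here ---->
<img src="images\logo.gif" alt="Logo">
	    ]]></programlisting>
	  <programlisting><![CDATA[
<!-- navigation starts here -->
<img src="images/logo.gif" alt="Logo"/>
	    ]]></programlisting>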
	</tip>
      </answer>
    </qandaentry>
    <qandaentry id="xhtml" remap="htmlxml">
      <question>
	<formalpara>
	  <title>Is there an XML version of HTML?</title>
	  <para>Yes, XHTML from W3C</para>
	</formalpara>
      </question>
      <answer>
	<para>Yes, the W3C recommends using <ulink
	    url="http://www.w3.org/TR/xhtml1/">XHTML</ulink> which is
	  <quote>a reformulation of HTML 4 in XML 1.0</quote>. This
	  specification defines HTML as an XML application, and
	  provides three DTDs corresponding to the ones defined by
	  HTML 4.* (Strict, Transitional, and Frameset).</para>
	<para>The semantics of the elements and their attributes are
	  as defined in the W3C Recommendation for HTML 4. These
	  semantics provide the foundation for future extensibility of
	  XHTML. Compatibility with existing HTML browsers is
	  possible by following a small set of guidelines (see the W3C
	  site).</para>
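	<para>For orientation, a minimal XHTML 1.0 Strict document
	  looks something like the sketch below. The DOCTYPE
	  Declaration and namespace are those given in the XHTML 1.0
	  Recommendation; the encoding, title, and text are
	  placeholders chosen for the example.</para>
	<programlisting><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Placeholder title</title>
  </head>
  <body>
    <p>Placeholder text.</p>
  </body>
</html>
	  ]]></programlisting>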
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-SUBSET, subset" id="tools">
      <question>
	<formalpara>
	  <title>If XML is just a subset of SGML, can I use XML files
	    directly with existing SGML tools?</title>
	  <para>Yes, if they are up to date</para>
	</formalpara>
      </question>
      <answer>
	<para>Yes, provided you use up-to-date SGML software which
	  knows about the <ulink
	    url="http://www.ornl.gov/sgml/sc34/document/0029.htm">WebSGML 
	    Adaptations TC to ISO 8879</ulink> (the features needed to
	  support XML, such as the variant form for EMPTY elements;
	  some aspects of the SGML Declaration such as NAMECASE
	  GENERAL NO; multiple attribute token list declarations,
	  etc).</para>
	<para>An alternative is to use an SGML DTD to let you create a
	  fully-normalised SGML file, but one which does not use empty
	  elements; and then remove the DocType Declaration so it
	  becomes a well-formed DTDless XML file. Most SGML tools now
	  handle XML files well, and provide an option to switch between
	  the two standards (see the pointers in <link
	    linkend="software"></link>).</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-LEARN, learn" id="learning">
      <question>
	<formalpara>
	  <title>I'm used to authoring and serving HTML. Can I learn
	    XML easily?</title>
	  <para>Yes</para>
	</formalpara>
      </question>
      <answer>
	<para>Yes, very easily, but at the moment there is still a
	  need for more tutorials, simpler tools, and more examples of
	  XML documents. <link linkend="wf"
	    xreflabel="simple"><quote>Well-formed</quote> XML
	    documents</link> may look similar to HTML except for some
	  small but very important points of syntax. </para>
	<para>The big practical difference is that XML has to stick to
	  the rules. HTML browsers let you serve them even fatally
	  broken or ridiculously corrupt HTML because they don't do a
	  formal parse but elide all the broken bits instead. With XML
	  your files have to be completely correct or they simply
	  won't work at all. One outstanding problem is that some
	  browsers claiming XML conformance are also broken, and some
	  browsers' support for CSS styling is dubious at best.
	  Try yours on the test files at <ulink
	    url="http://xml.silmaril.ie/test.xml"></ulink> and <ulink
	    url="http://xml.silmaril.ie/hotels.xml"></ulink>.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-CHARENTS, charents" id="characters">
      <question>
	<formalpara>
	  <title>Can XML use non-Latin characters?</title>
	  <para>Yes, this is the default</para>
	</formalpara>
      </question>
      <answer remap="charset">
	<para id="faq:Unicode">Yes, the <link linkend="spec" xreflabel="simple">XML
	    Specification</link> explicitly says XML uses <ulink
	    url="http://www.iso.ch/">ISO 10646</ulink>, the
	  international standard character repertoire which covers
	  most known languages. <ulink
	    url="http://www.unicode.org/">Unicode</ulink> is an
	  identical repertoire, and the two standards track each
	  other. The spec says (2.2): <quote>All XML processors must
	    accept the UTF-8 and UTF-16 encodings of ISO
	    10646&hellip;</quote>. There is a Unicode FAQ at <ulink
	    id="FAQ:unicode"
	    url="http://www.unicode.org/faq/"></ulink>.</para>
	<para>UTF-8 is an encoding of Unicode into 8-bit units (bytes):
	  the first 128 are the same as ASCII, and <ulink
	    url="http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt">higher-order 
	    bytes are used to encode anything else from Unicode
	    into sequences of between 2 and 4 bytes</ulink>. UTF-8 in
	  its single-octet form is therefore the same as ISO 646 IRV
	  (ASCII), so you can continue to use ASCII for English or
	  other languages using the Latin alphabet without diacritics.
	  Note that UTF-8 is incompatible with ISO 8859-1 (ISO
	  Latin-1) after code point 127 decimal (the end of
	  ASCII).</para>
	<para>UTF-16 is an encoding of Unicode into 16-bit units,
	  which lets it represent all 17 planes of Unicode (using
	  pairs of units for the 16 planes beyond the first). UTF-16
	  is incompatible with ASCII because it uses two 8-bit bytes
	  per character (four bytes above U+FFFF).</para>
	<tip xreflabel="Peter Flynn" id="iso-8859-1">
	  <para>The encoding specification can refer to any character
	  set your software supports, but the XML Specification only
	  requires that applications support UTF-8 and UTF-16. Some of
	  the common encodings supported by software include:</para>
	  <variablelist>
	    <varlistentry>
	      <term>US-ASCII</term>
	      <listitem>
		<para>Character codes TAB, LF, CR, space, and the
		  printable characters 33 to 126 (decimal) only (all other
		  control characters are forbidden by XML).</para>
	      </listitem>
	    </varlistentry>
	    <varlistentry>
	      <term>ISO-8859-1</term>
	      <listitem>
		<para>(Western European Latin-1) As ASCII plus codes
		128 to 255 (decimal). Covers most (but not all)
		western European accented letters.</para>
	      </listitem>
	    </varlistentry>
	    <varlistentry>
	      <term>ISO-8859-2 to 15</term>
	      <listitem>
		<para>The other parts of ISO-8859 (2 to 15) cover
		  different sets of Latin-based alphabetic and other
		  symbols.</para>
	      </listitem>
	    </varlistentry>
	    <varlistentry>
	      <term><quote>Codepages</quote> and other obsolescent sets</term>
	      <listitem>
		<para>Some software may also support various IBM and Microsoft
		  <quote>codepages</quote>, Apple Macintosh
		  <quote>MacRoman</quote>, DEC
		  <quote>Multinational</quote> and other non-standard
		  character encodings, but these are generally
		  non-portable and should be avoided where possible.</para>
	      </listitem>
	    </varlistentry>
	  </variablelist>
	  <para>One common practice in western Europe is to use
	    ISO-8859-1 so that the majority of common accented letters
	    can be used as single bytes, and to use character entity
	    references or numeric entities for all other characters.
	    This has the advantage that such files can be opened in
	    almost any single-byte editor. The drawback is that
	    numeric entities are not mnemonic, and character entities
	    have to be declared in a DTD or internal subset, but if they
	    are rare, this may not be a serious problem.</para>
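	  <para>A sketch of that practice, assuming a document with
	    no external DTD: the element type names and the entity
	    name <sgmltag class="genentity">euro</sgmltag> are
	    invented for the example, and the entity is declared in
	    the internal subset. The accented letters are shown here
	    as numeric references, but since they exist in ISO-8859-1
	    they could equally be typed directly as single
	    bytes.</para>
	  <programlisting><![CDATA[
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE menu [
  <!ENTITY euro "&#x20AC;">
]>
<menu>
  <item>Caf&#233; au lait: &euro;2</item>
  <item>Cr&#232;me br&#251;l&#233;e: &euro;5</item>
</menu>
	    ]]></programlisting>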
	</tip>
	<tip xreflabel="Bertilo Wennergren" id="utf-16">
	  <para>UTF-16 is an encoding that represents each Unicode
	    character of the first plane (the first 64K characters) of
	    Unicode with a 16-bit unit&mdash;in practice with two
	    bytes for each character. Thus it is backwards compatible
	    with neither ASCII nor Latin-1. UTF-16 can also access an
	    additional 1 million characters by a mechanism known as
	    surrogate pairs (two 16-bit units for each
	    character).</para>
	  <para><quote>&hellip;the mechanisms for signalling which of
	      the two are in use, and for bringing other encodings
	      into play, are [&hellip;] in the discussion of character
	      encodings.</quote> The <link linkend="spec" xreflabel="simple">XML
	      Specification</link> explains how to specify in your XML
	    file which coded character set you are using.</para>
	  <para><quote>Regardless of the specific encoding used, any
	      character in the ISO 10646 character set may be referred
	      to by the decimal or hexadecimal equivalent of its bit
	      string</quote>: so no matter which character set you
	    personally use, you can still refer to specific individual
	    characters from elsewhere in the encoded repertoire by
	    using <sgmltag class="numcharref">dddd</sgmltag> (decimal
	    character code) or <sgmltag
	      class="numcharref">xHHHH</sgmltag> (hexadecimal
	    character code: the <literal>x</literal> must be lowercase,
	    but the digits may be in either case). The terminology can get
	    confusing, as can the numbers: see the <ulink
	      url="http://cns-web.bu.edu/pub/djohnson/web_files/i18n/ISO-10646.html">ISO 
	      10646 Concept Dictionary</ulink>. <personname>
	      <firstname>Rick</firstname>
	      <surname>Jelliffe</surname>
	    </personname> has
	    <ulink
	      url="http://xml.coverpages.org/xml-ISOents.txt">XML-ized
	      the ISO character entity sets</ulink>. <personname>
	      <firstname>Mike</firstname>
	      <surname>Brown</surname>
	    </personname>'s
	    encoding information at <ulink
	      url="http://skew.org/xml/tutorial/">http://skew.org/xml/tutorial/</ulink> 
	    is a very useful explanation of the need for correct
	    encoding. There is an excellent online database of glyphs
	    and characters in many encodings from the Estonian
	    Language Institute server at <ulink
	      url="http://www.eki.ee/letter/">http://www.eki.ee/letter/</ulink>.</para>
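	  <para>For example, the two references in each line of the
	    sketch below denote the same character, whatever the
	    encoding of the file (the element type and attribute
	    names are arbitrary). The last line refers to a character
	    outside the first plane.</para>
	  <programlisting><![CDATA[
<chars>
  <char name="e-acute">&#233; &#xE9;</char>
  <char name="euro">&#8364; &#x20AC;</char>
  <char name="g-clef">&#119070; &#x1D11E;</char>
</chars>
	    ]]></programlisting>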
	</tip>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-DOCTYPE, doctype" id="dtds">
      <question>
	<formalpara>
	  <title>What's a Document Type Definition (DTD) and where do
	    I get one?</title>
	  <para>A specification of document structure. You can write
	    one or download them.</para>
	</formalpara>
      </question>
      <answer remap="compiled dtds xsds differences">
	<para>A DTD is a description in XML Declaration Syntax of a
	  particular type or class of document. It sets out what names
	  are to be used for the different types of element, where
	  they may occur, and how they all fit together. (A <link
	    linkend="schemas">Schema</link> does the same thing in XML
	  Document Syntax, and allows more extensive
	  data-checking.)</para>
	<para>For example, if you want a document type to be able to
	  describe Lists which contain Items, the relevant part of
	  your DTD might contain something like this:</para>
	<programlisting><![CDATA[ 
<!ELEMENT List (Item)+> 
<!ELEMENT Item (#PCDATA)> 
	  ]]></programlisting>
	<para>This defines a list as an element type containing one or
	  more items (that's the plus sign); and it defines items as
	  element types containing just plain text (Parsed Character
	  Data or PCDATA). Validators read the DTD before they
	  read your document so that they can identify where every
	  element type ought to come and how each relates to the
	  other, so that applications which need to know this in
	  advance (most editors, search engines, navigators, and
	  databases) can set themselves up correctly. The example
	  above lets you create lists like:</para>
	<programlisting><![CDATA[
<List>
  <Item>Chocolate</Item>
  <Item>Music</Item>
  <Item>Surfing</Item>
</List> 
	  ]]></programlisting>
	<para>(The
	  indentation in the example is just for legibility while
	  editing: it is not required by XML.)</para>
	<para>A DTD provides applications with advance notice of what
	  names and structures can be used in a particular document
	  type. Using a DTD and a validating editor means you can be
	  certain that all documents of that particular type
	  will be constructed and named in a consistent and conformant
	  manner.</para>
	<para>DTDs are not required for processing <link
	    linkend="wf">well-formed documents</link>, but they are
	  needed if you want to take advantage of XML's special
	  attribute types like the built-in ID/IDREF cross-reference
	  mechanism; or the use of default attribute values; or
	  references to external non-XML files
	  (<quote>Notations</quote>); or if you simply want a check on
	  document validity before processing.</para>
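	<para>As a sketch of the first two of these features (the
	  <sgmltag>Ref</sgmltag> element type and all the attribute
	  names are invented for the example), the declarations below
	  give every <sgmltag>Item</sgmltag> an optional unique
	  identifier and a <sgmltag
	    class="attribute">status</sgmltag> attribute with a
	  default value, and let a <sgmltag>Ref</sgmltag> element
	  point at an <sgmltag>Item</sgmltag>:</para>
	<programlisting><![CDATA[
<!ELEMENT List (Item | Ref)+>
<!ELEMENT Item (#PCDATA)>
<!ATTLIST Item
          id     ID            #IMPLIED
          status (open | done) "open">
<!ELEMENT Ref EMPTY>
<!ATTLIST Ref
          target IDREF         #REQUIRED>
	  ]]></programlisting>
	<para>A validating parser reading
	  <programlisting><![CDATA[<Item id="i1">Chocolate</Item>]]></programlisting>
	  against these declarations would report the <sgmltag
	    class="attribute">status</sgmltag> as
	  <literal>open</literal> even though it was not typed, and
	  would reject a <sgmltag>Ref</sgmltag> whose <sgmltag
	    class="attribute">target</sgmltag> matched no ID in the
	  document.</para>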
	<para>There are thousands of DTDs already in existence in all
	  kinds of areas (see the <ulink
	    url="http://xml.coverpages.org/">SGML/XML Web
	    pages</ulink> for pointers). Many of them can be
	  downloaded and used freely; or you can write your own (see
	  the question on <link linkend="owndtd"
	    xreflabel="simple">creating your own DTD</link>). Old SGML
	  DTDs need to be converted to XML for use with XML systems:
	  <link linkend="dtdconv" xreflabel="simple">read the question
	    on converting SGML DTDs to XML</link>, but most popular
	  SGML DTDs are already available in XML form.</para>
	<para>Some XML editors use a binary
	  compiled format of DTD produced by their own management
	  routines to allow a single person in an organisation to be
	  in charge of modifications, and to distribute only an
	  unmodifiable (binary compiled) version to users.</para>
	<para>The alternatives to a DTD are various forms of <link
	    linkend="schemas">Schema</link>. These provide more
	  extensive validation features than DTDs, including character
	  data content validation.</para>
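	<para>For comparison, here is roughly how the List example
	  above might be expressed in W3C XML Schema, which is only
	  one of several schema languages. This is a sketch, not a
	  recommendation: the <literal>xs</literal> prefix is
	  conventional rather than required, and typing
	  <sgmltag>Item</sgmltag> as a plain string is an assumption
	  made for the example.</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="List">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Item" type="xs:string"
                    maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
	  ]]></programlisting>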
      </answer>
    </qandaentry>
    <qandaentry id="makeup" remap="makeup">
      <question>
	<formalpara>
	  <title>Does XML let me make up my own tags?</title>
	  <para>Yes but they're not called tags. They're element
	    types.</para>
	</formalpara>
      </question>
      <answer>
	<para>No, it lets you make up names for your own element
	  types. If you think tags and elements are the same thing you
	  are already in considerable trouble: read the rest of this
	  question carefully.</para>
	<para>The same applies if you are thinking in terms of
	  <wordasword>fields</wordasword> (see <link
	    linkend="databases"></link>). Wrong paradigm, wrong
	  language.</para>
	<tip xreflabel="Bob DuCharme">
	  <para>Don't confuse the term <quote>tag</quote> with the
	    term <quote>element</quote>. They are not interchangeable.
	    An element usually contains two different kinds of tag: a
	    start-tag and an end-tag, with text or more markup between
	    them.</para>
	  <para>XML lets you decide which elements you want in your
	    document and then indicate your element boundaries using
	    the appropriate start- and end-tags for those elements.
	    Each
	    <programlisting><![CDATA[<!ELEMENT...]]></programlisting>
	    declaration defines a type of element that may be used in
	    a document conforming to that DTD. We call this type of
	    element an <quote>element type</quote>. Just as the HTML
	    DTD includes the <sgmltag>H1</sgmltag> and
	    <sgmltag>P</sgmltag> element types, your document can have
	    <sgmltag>color</sgmltag> and <sgmltag>price</sgmltag>
	    element types, or anything else you want.</para>
	  <para>Normal non-empty elements are made up of a start-tag,
	    the element's content, and an end-tag.
	  <programlisting><![CDATA[<color>red</color>]]></programlisting> 
	    is a complete instance of the <sgmltag>color</sgmltag>
	    element. <sgmltag class="starttag">color</sgmltag> is only
	    the start-tag of the element, showing where it begins; it
	    is not the element itself.</para>
	  <para>Empty elements are a special case that may be
	    represented either as a pair of start- and end-tags with
	    nothing between them (eg <programlisting><![CDATA[<price
	    retail="123"></price>]]></programlisting>) or as a single
	    empty element start-tag that has a closing slash to tell
	    the parser <quote>don't go looking for an end-tag to match
	      this</quote> (eg <programlisting><![CDATA[<price
	      retail="123"/>]]></programlisting>).</para>
	</tip>
      </answer>
    </qandaentry>
    <qandaentry id="owndoctype">
      <question>
	<formalpara>
	  <title>How do I create my own document type?</title>
	  <para>Analyse the class of documents, and write a DTD or
	    Schema</para>
	</formalpara>
      </question>
      <answer>
	<para>Document types usually need a formal description, either
	  a DTD or a Schema. Whilst it is possible to process
	  well-formed XML documents without any such description,
	  trying to create them without one is asking for trouble. A
	  DTD or Schema is used with an XML editor or API to
	  guide and control the construction of the document, making
	  sure the right elements go in the right places.</para>
	<para>Creating your own document type therefore begins with an
	  analysis of the class of documents you want to describe:
	  reports, invoices, letters, configuration files, credit-card
	  verification requests, or whatever. Once you have the
	  structure correct, you write code to express this formally,
	  using <link linkend="owndtd" xreflabel="simple">DTD</link>
	  or Schema syntax.</para>
      </answer>
    </qandaentry>
    <qandaentry id="owndtd" remap="owndtd">
      <question>
	<formalpara>
	  <title>How do I write my own DTD?</title>
	  <para>Learn XML Declaration Syntax</para>
	</formalpara>
      </question>
      <answer>
	<para>You need to use the XML Declaration Syntax (very simple:
	  declaration keywords begin with
	  <programlisting><![CDATA[<!]]></programlisting> rather than
	  just the open angle bracket, and the way the declarations
	  are formed also differs slightly). Here's an example of a
	  DTD for a shopping list, based on the fragment used <link
	  linkend="dtds" xreflabel="simple">earlier</link>:</para>
	<programlisting><![CDATA[
<!ELEMENT Shopping-List (Item)+>
<!ELEMENT Item (#PCDATA)>
	  ]]></programlisting>
	<para>It says that there shall be an element called
	  <sgmltag>Shopping-List</sgmltag> and that it shall contain
	  elements called <sgmltag>Item</sgmltag>: there must be at
	  least one Item (that's the plus sign) but there may be more
	  than one. It also says that the <sgmltag>Item</sgmltag>
	  element may contain only parsed character data (PCDATA, ie
	  text: no further markup).</para>
	<para>Because there is no other element which contains
	  <sgmltag>Shopping-List</sgmltag>, that element is assumed to
	  be the <quote>root</quote> element, which encloses
	  everything else in the document. You can now use it to
	  create an XML file: give your editor the
	  declarations:</para>
	<programlisting><![CDATA[ 
<?xml version="1.0"?> 
<!DOCTYPE Shopping-List SYSTEM "shoplist.dtd"> 
	  ]]></programlisting>
	<para>(assuming you put the DTD in that file). Now your editor
	  will let you create files according to the pattern:</para>
	<programlisting><![CDATA[
<Shopping-List>
  <Item>Chocolate</Item>
  <Item>Sugar</Item>
  <Item>Butter</Item>
</Shopping-List>
	  ]]></programlisting>
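	<para>As a minimal sketch of how the same DTD might grow, the
	  version below adds an optional attribute to
	  <sgmltag>Item</sgmltag>. The attribute name
	  <sgmltag class="attribute">quantity</sgmltag> is invented
	  here purely for illustration and is not part of the example
	  above:</para>
	<programlisting><![CDATA[
<!-- shoplist.dtd, extended with a hypothetical attribute -->
<!ELEMENT Shopping-List (Item)+>
<!ELEMENT Item (#PCDATA)>
<!-- quantity is optional (#IMPLIED); a DTD can only say that it is
     character data, not that it must be a number -->
<!ATTLIST Item quantity CDATA #IMPLIED>
	  ]]></programlisting>
	<para>A document using this version could then contain, for
	  example, <programlisting><![CDATA[<Item
	  quantity="2">Butter</Item>]]></programlisting>.</para>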
	<para>It is possible to develop complex and powerful DTDs of
	  great subtlety, but for any significant use you should learn
	  more about document systems analysis and document type
	  design. See for example <xref 
	    linkend="devdtd"/>: this was written for SGML but perhaps
	  95&pct; of it applies to XML as well, as XML is much simpler
	  than full SGML&mdash;see the <link linkend="restrict"
	    xreflabel="simple">list of restrictions</link> which shows
	  what has been cut out.</para>
	<warning>
	  <para>Incidentally, a DTD file <emphasis>never</emphasis>
	    has a DOCTYPE Declaration in it: that only occurs in an
	    XML document instance (it's what references the DTD). And
	    a DTD file also never has an XML Declaration at the top
	    either. Unfortunately there is still software around which
	    inserts one or both of these.</para>
	</warning>
      </answer>
    </qandaentry>
    <qandaentry id="rootelement" remap="documentElement .documentElement">
      <question>
	<formalpara>
	  <title>Can a root element type be explicitly declared in the
	    DTD?</title>
	  <para>No, use the Document Type Declaration.</para>
	</formalpara>
      </question>
      <answer>
	<para>No. This is done in the document's Document Type
	  Declaration, not in the DTD.</para>
	<tip xreflabel="Bob DuCharme">
	  <para>In a Document Type Declaration like this:
	  <programlisting><![CDATA[ 
<!DOCTYPE chapter SYSTEM "docbookx.dtd"> 
	    ]]></programlisting> the whole point of the
	    <sgmltag>chapter</sgmltag> part is to identify which of
	    the element types declared in the specified DTD should be
	    used as the root element. I believe the highest level
	    element in DocBook is <sgmltag>set</sgmltag>, but I find
	    it hard to imagine someone creating a document to
	    represent a set of books. We are free to use
	    <sgmltag>set</sgmltag>, <sgmltag>book</sgmltag>,
	    <sgmltag>chapter</sgmltag>, <sgmltag>article</sgmltag>, or
	    even <sgmltag>para</sgmltag> as the document element for a
	    valid DocBook document.</para>
	  <para>[One job some parsers do is determine which element
	    type[s] in a DTD are not contained in the content model of
	    any other element type: these are by deduction the prime
	    candidates for being default root elements. (PF)]</para>
	  <para>This is A Good Thing, because it adds flexibility to
	    how the DTD is used. It's the reason that XML (and SGML)
	    have lent themselves so well to electronic publishing
	    systems in which different elements were mixed and matched
	    to create different documents all conforming to the same
	    DTD.</para>
	  <para>I've seen schema proposals that let you specify which
	    of a schema's element types could be a document's root
	    element, but after a quick look at <ulink
	      url="http://www.w3.org/TR/xmlschema-1/#cElement_Declarations">section 
	      3.3 of Part 1 of the W3C Schema Recommendation</ulink>
	    and the RELAX NG schema for RELAX, I don't believe that
	    either of these let you do this. I could be wrong.</para>
	</tip>
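	<para>To illustrate, and assuming the
	  <literal>shoplist.dtd</literal> file from the question on
	  <link linkend="owndtd" xreflabel="simple">writing your own
	    DTD</link>, both of the following documents should be valid
	  against that one DTD, because each Document Type Declaration
	  names a different declared element type as the root:</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<!-- root element is Shopping-List -->
<!DOCTYPE Shopping-List SYSTEM "shoplist.dtd">
<Shopping-List>
  <Item>Chocolate</Item>
</Shopping-List>
	  ]]></programlisting>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<!-- same DTD, but the root element is now Item -->
<!DOCTYPE Item SYSTEM "shoplist.dtd">
<Item>Chocolate</Item>
	  ]]></programlisting>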
      </answer>
    </qandaentry>
    <qandaentry id="schemas" remap="schemata dtds xsds differences">
      <question>
	<formalpara>
	  <title>I keep hearing about alternatives to DTDs. What's a
	    Schema?</title>
	  <para>Like a DTD for validating content as well as
	    structure.</para>
	</formalpara>
      </question>
      <answer remap="modelling modeling">
	<para><ulink url="http://www.w3.org/TR/xmlschema-0/">The W3C
	    XML Schema recommendation</ulink> provides a means of
	  specifying formal data typing and validation of element
	  content in terms of data types, so that document type
	  designers can provide criteria for checking the data content
	  of elements as well as the markup itself. Schemas are
	  written in XML Document Syntax, like XML documents are,
	  avoiding the need for processing software to be able to read
	  XML Declaration Syntax (used for DTDs).</para>
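	<para>As a rough sketch of what this looks like, the W3C XML
	  Schema below describes the shopping list from the question on
	  <link linkend="owndtd" xreflabel="simple">writing your own
	    DTD</link>. The <sgmltag
	    class="attribute">quantity</sgmltag> attribute is a
	  hypothetical addition, included only to show a data type
	  constraint that a DTD cannot express:</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Shopping-List">
    <xs:complexType>
      <xs:sequence>
        <!-- one or more Item elements (minOccurs defaults to 1) -->
        <xs:element name="Item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <!-- text content, plus one optional attribute -->
              <xs:extension base="xs:string">
                <!-- hypothetical attribute: must be 1, 2, 3... -->
                <xs:attribute name="quantity"
                              type="xs:positiveInteger"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
	  ]]></programlisting>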
	<para id="faq:Schema">There is a separate Schema FAQ at <ulink
	    id="FAQ:schema" url="http://www.schemavalid.com"></ulink>.
	  The term <quote>vocabulary</quote> is sometimes used to
	  refer to DTDs and Schemas together. Schemas are aimed at
	  e-commerce, data control, and database-style applications
	  where character data content requires validation and where
	  stricter data control is needed than is possible with DTDs;
	  or where strong data typing is required. They are usually
	  unnecessary for traditional text document publishing
	  applications.</para>
	<para>Unlike DTDs, Schemas cannot be specified in an XML
	  Document Type Declaration. A Schema can instead be referenced
	  from the document's root element using attributes from the
	  XML Schema instance <link xreflabel="simple"
	    linkend="namespaces">Namespace</link>, which Schema-aware
	  software can treat as a hint to its location, but this is
	  optional:</para>
	<programlisting><![CDATA[
<invoice id="abc123"
         xmlns="http://example.org/ns/books/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://example.org/ns/books/
                             http://acme.wilycoyote.org/xsd/invoice.xsd">
...
</invoice>         
	  ]]></programlisting>
	<para>More commonly, you specify the Schema in your processing
         software, which should record separately which Schema is used
         by which XML document instance.</para>
	<para>In contrast to the complexity of the W3C Schema model,
	  RELAX NG is a lightweight, easy-to-use XML schema language
	  devised by <personname>
	    <firstname>James</firstname>
	    <surname>Clark</surname>
	  </personname> (see <ulink
	    url="http://relaxng.org/"></ulink>) with development
	  hosted by <ulink
	    url="http://www.oasis-open.org/committees/relax-ng/">OASIS</ulink>. 
	  It allows similar richness of expression and the use of XML
	  as its syntax, but it provides an additional, simplified,
	  syntax which is easier to use for those accustomed to
	  DTDs.</para>
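	<para>For comparison, here is a sketch of the same shopping
	  list pattern in the XML syntax of RELAX NG, with the
	  equivalent in the simplified (compact) syntax shown in the
	  comment. It assumes the element names from the question on
	  <link linkend="owndtd" xreflabel="simple">writing your own
	    DTD</link>:</para>
	<programlisting><![CDATA[
<!-- compact syntax equivalent:
       element Shopping-List { element Item { text }+ }
-->
<element name="Shopping-List"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="Item">
      <text/>
    </element>
  </oneOrMore>
</element>
	  ]]></programlisting>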
	<warning>
	  <para>Authors and publishers should note that the English
	    plural of Schema is Schemas: the use of the singular to do
	    duty for the plural is a foible dear to the semi-literate;
	    the use of the old (Greek) plural schemata is unnecessary
	    didacticism.</para>
	  <para>Writers should also note that the plural of DTD is
	    <ulink
	      url="http://xml.coverpages.org/properSpellingForPluralOfDTD.html">DTDs</ulink>: 
	    there is no apostrophe&mdash;see <xref
	      linkend="esl"/>.</para>
	</warning>
	<tip xreflabel="Bob DuCharme">
	  <para>Many XML developers were dissatisfied with the syntax
	    of the markup declarations described in the XML spec for
	    two reasons. First, they felt that if XML documents were
	    so good at describing structured information, then the
	    description of a document type's own structure (its
	    schema) should be in an XML document instead of written
	    with its own special syntax. In addition to being more
	    consistent, this would make it easier to edit and
	    manipulate the schema with regular document manipulation
	    tools. Secondly, they felt that traditional DTD notation
	    didn't allow document type designers the power to impose
	    enough constraints on the data&mdash;for example, the
	    ability to say that a certain element type must always
	    have a positive integer value, that it may not be empty,
	    or that it must be one of a list of possible choices. This
	    eases the development of software using that data because
	    the developer has less error-checking code to
	    write.</para>
	</tip>
	<tip xreflabel="Peter Flynn">
	<para>A <link linkend="dtds" xreflabel="simple">DTD</link> is
	    only for specifying the element structure of an XML file,
	    with a very limited amount of control over attribute
	    values. It gives the names of the elements, attributes,
	    and entities that can be used, and how they fit together.
	    DTDs are designed for use with traditional text documents,
	    not rectangular or tabular data, so the concept of data
	    types is not relevant: text is just text. If you need to
	    specify numeric ranges or to define limitations or checks
	    on the character data (text) content, a DTD is the wrong
	    tool.</para>
	</tip>
      </answer>
    </qandaentry>
    <qandaentry id="databases" remap="db">
      <question>
	<formalpara>
	  <title>How do I get XML into or out of my database?</title>
	  <para>Ask your database manufacturer</para>
	</formalpara>
      </question>
      <answer remap="mysql msql sql oracle server">
	<para>Ask your database manufacturer: most now provide XML
	  import and export modules to connect XML applications with
	  databases.</para>
	<para>In some trivial cases there will be a 1:1 match
	  between field names in the database table and element type
	  names in the XML Schema or DTD, but in most cases some
	  programming will be required to establish the desired match.
	  This can usually be stored as a procedure so that subsequent
	  uses are simply commands or calls with the relevant
	  parameters.</para>
	<para>Alternatively, most database systems now provide an XML
	  dump format that lets you export a table as-is, for example
	  by surrounding the field values with tags called after the
	  fieldnames.</para>
	<para>In less trivial, but still simple, cases, you could
	  export by writing a report routine that formats the output
	  as an XML document by adding the relevant tags as literals
	  before and after each data value; and you could import by
	  writing an XSLT transformation that formatted the XML data
	  as a load file in your database's preferred format.</para>
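	<para>As an illustration of the import route, the XSLT sketch
	  below turns a simple XML dump into comma-separated lines. The
	  element names <sgmltag>table</sgmltag> and
	  <sgmltag>row</sgmltag> are assumptions made for this example:
	  your own dump format will differ, and a real load file would
	  also need to quote any values that contain commas.</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<!-- Assumed input, one child element per field:
       <table>
         <row><id>1</id><name>Chocolate</name></row>
         <row><id>2</id><name>Sugar</name></row>
       </table>
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/table">
    <!-- one output line per row -->
    <xsl:for-each select="row">
      <!-- one value per field, separated by commas -->
      <xsl:for-each select="*">
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
	  ]]></programlisting>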
	<warning>
	  <para>Users from a database or computer science background
	    should be aware that XML is not a database management
	    system: it is a text markup system. While there are many
	    similarities, some of the concepts of one are simply
	    non-existent in the other: XML does not possess some
	    database-like features in the same way that databases do
	    not possess markup-like ones. It is a common error to
	    believe that XML is a DBMS like Oracle or Access and
	    therefore possesses the same facilities. It
	    doesn't.</para>
	</warning>
	<para id="dbarts">Database users should read the article
	  <xref linkend="docdb"/> [thanks to <personname>
	    <firstname>Bart</firstname>
	    <surname>Lateur</surname>
	  </personname> for identifying this.] <personname>
	    <firstname>Ronald</firstname>
	    <surname>Bourret</surname>
	  </personname> also maintains a good resource on XML and
	  Databases discussing native XML databases at <ulink
	    url="http://www.rpbourret.com/xml/XMLAndDatabases.htm"></ulink>.</para>
	<para id="faq:XQL">There is some information about the <ulink
	    url="http://www.w3.org/XML/Query">XQuery</ulink>
	  Language in the <link linkend="searching"
	    xreflabel="simple">note on Searching</link>.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-HYPERTEXT, hypertext" id="links">
      <question>
	<formalpara>
	  <title>How will XML affect my document links?</title>
	  <para>XML Links are much more powerful, but not yet
	    widely implemented in browsers</para>
	</formalpara>
      </question>
      <answer remap="extending linking anchors hrefs">
	<para>The linking abilities of XML systems are potentially
	  much more powerful than those of HTML, so you'll be able to
	  do much more with them. Existing <sgmltag
	    class="attribute">href</sgmltag>-style links will remain
	  usable, but the new linking technology is based on the
	  lessons learned in the development of other standards
	  involving hypertext, such as <ulink
	    url="http://www.tei-c.org/">TEI</ulink> and <ulink
	    url="http://xml.coverpages.org/hytime.html">HyTime</ulink>, 
	  which let you manage bidirectional and multi-way links, as
	  well as links to a whole element or span of text (within
	  your own or other documents) rather than to a single point.
	  These features have been available to SGML users for many
	  years, so there is considerable experience and expertise
	  available in using them. Currently only Mozilla Firefox
	  implements XLink.</para>
	<para id="linkspecs">The <ulink
	    url="http://www.w3.org/TR/xlink/">XML Linking
	    Specification (XLink)</ulink> and the <ulink
	    url="http://www.w3.org/TR/WD-xptr">XML Extended Pointer
	    Specification (XPointer)</ulink> documents contain the
	  details. An XLink can be either a URI or a TEI-style
	  Extended Pointer (<link linkend="loc2" id="loc1"
	    xreflabel="simple">XPointer</link>), or both. A URI on its
	  own is assumed to be a resource; if an XPointer follows it,
	  it is assumed to be a sub-resource of that URI; an XPointer
	  on its own is assumed to apply to the current document (all
	  exactly as with HTML).</para>
	<para>An XLink may use one of <literal>#</literal>,
	  <literal>?</literal>, or <literal>|</literal>. The
	  <literal>#</literal> and <literal>?</literal> mean the same
	  as in HTML applications; the <literal>|</literal> means the
	  sub-resource can be found by applying the link to the
	  resource, but the method of doing this is left to the
	  application. An XPointer can only follow a
	  <literal>#</literal>.</para>
	<para>The <ulink
	    url="http://etext.virginia.edu/bin/tei-tocs?div=DIV2;id=SAXR">TEI 
	    Extended Pointer Notation</ulink> (EPN) is much more
	  powerful than the fragment address on the end of some URIs,
	  as it allows you to specify the location of a link end using
	  the structure of the document instead of (or in addition to)
	  known, fixed points like IDs. For example, <link
	    xreflabel="simple" linkend="loc1" id="loc2">the linked
	    second occurrence</link> of the word
	  <quote>XPointer</quote> two paragraphs back could be
	  referred to with the URI (shown here with linebreaks and
	  spaces for clarity: in practice it would of course be all
	  one long string):</para>
	<programlisting><![CDATA[
http://xml.silmaril.ie/faq.xml#ID(links)
    .child(1,#element,'answer')
    .child(2,#element,'para')
    .child(1,#element,'link')
	  ]]></programlisting>
	<para>This means the first <sgmltag>link</sgmltag> element
	  within the second paragraph within the
	  <sgmltag>answer</sgmltag> in the element whose ID is
	  <sgmltag class="attvalue">links</sgmltag> (this
	  question). Count the objects from the start of this question
	  (which has the ID <sgmltag
	    class="attvalue">links</sgmltag>) in the <ulink
	    url="http://xml.silmaril.ie/faq.sgml">XML
	    source</ulink>:</para>
	<orderedlist>
	  <listitem>
	    <para>the first child object is the element containing the
	      question (<sgmltag>question</sgmltag>);</para>
	  </listitem>
	  <listitem>
	    <para>the second child object is the answer (the
	      <sgmltag>answer</sgmltag> element);</para>
	  </listitem>
	  <listitem>
	    <para>within this element go to the second
	      paragraph;</para>
	  </listitem>
	  <listitem>
	    <para>find the first <sgmltag>link</sgmltag>
	      element.</para>
	  </listitem>
	</orderedlist>
	<para>Eve Maler explained the relationship of XLink and
	  XPointer as follows:</para>
	<blockquote>
	  <para>XLink governs how you insert links
	    <emphasis>into</emphasis> your XML document, where the
	    link might point to anything (eg a GIF file); XPointer
	    governs the fragment identifier that can go on a URL when
	    you're linking <emphasis>to</emphasis> an XML document,
	    <emphasis>from</emphasis> anywhere (eg from an HTML
	    file).</para>
	  <para>[Or indeed from an XML file, a URI in a mail message,
	    etc&hellip;Ed.]</para>
	</blockquote>
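	<para>As a small sketch of the XLink side of this, a simple
	  link placed on an element of your own might look like the
	  following. The element name <sgmltag>citation</sgmltag> is
	  hypothetical; the Namespace URI and the <sgmltag
	    class="attribute">xlink:type</sgmltag> and <sgmltag
	    class="attribute">xlink:href</sgmltag> attributes come from
	  the XLink Recommendation:</para>
	<programlisting><![CDATA[
<!-- 'citation' is an invented element name for illustration -->
<citation xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="http://www.w3.org/TR/xlink/">the XLink
  specification</citation>
	  ]]></programlisting>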
	<para><personname>
	    <firstname>David</firstname>
	    <surname>Megginson</surname>
	  </personname> has produced an <ulink
	    url="http://www.megginson.com/Software/psgml-xpointer.el">xpointer</ulink> 
	  function for Emacs/psgml which will deduce an XPointer for
	  any location in an XML document. XML Spy has a similar
	  function.</para>
      </answer>
    </qandaentry>
    <qandaentry id="mathematics" remap="FAQ-MATH, math">
      <question>
	<formalpara>
	  <title>Can I encode mathematics using XML?</title>
	  <para>Yes, using MathML.</para>
	</formalpara>
      </question>
      <answer remap="add subtract multiply divide addition subtraction
	multiplication division">
	<para>Yes, if the <link linkend="dtds"
	    xreflabel="simple">document type</link> you use provides
	  for math, and your users' browsers are capable of rendering
	  it. The mathematics-using community has developed the <ulink
	    url="http://www.w3.org/Math/">MathML
	    Recommendation</ulink> at the W3C, which is a native XML
	  application suitable for embedding in other DTDs and
	  Schemas.</para>
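	<para>For example, the expression <emphasis>x</emphasis>
	  squared plus one could be written in MathML presentation
	  markup roughly as follows (a minimal sketch: how you embed it
	  depends on the document type you are using):</para>
	<programlisting><![CDATA[
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <!-- x with a superscript 2 -->
    <msup>
      <mi>x</mi>
      <mn>2</mn>
    </msup>
    <mo>+</mo>
    <mn>1</mn>
  </mrow>
</math>
	  ]]></programlisting>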
	<para>It is also possible to make XML fragments from other
	  DTDs, such as <ulink
	    url="http://xml.coverpages.org/gen-apps.html#iso12083DTDs">ISO 
	    12083 Math</ulink>, or <ulink
	    url="http://www.openmath.org/">OpenMath</ulink>, or one of
	  your own making. Browsers which display math embedded in
	  SGML existed for many years (eg DynaText, Panorama, Multidoc
	  Pro), and mainstream browsers are now rendering MathML.
	  <personname>
	    <firstname>David</firstname>
	    <surname>Carlisle</surname>
	  </personname> has produced a <ulink
	    url="http://www.mathmlconference.org/2002/presentations/carlisle/">set 
	    of stylesheets</ulink> for rendering MathML in browsers.
	  It is also possible to use XSLT to convert XML math markup
	  to <LaTeX/> for print (PDF) rendering, or to use
	  XSL:FO.</para>
	<para>Please note that XML is not itself a programming
	  language, so concepts such as arithmetic and
	    <wordasword>if</wordasword>-statements (if-then-else
	    logic) are not meaningful in normal XML documents.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-META, metadata" id="metadata">
      <question>
	<formalpara>
	  <title>How does XML handle metadata?</title>
	  <para>Any way you want.</para>
	</formalpara>
      </question>
      <answer>
	<para>Because XML lets you define your own markup languages,
	  you can make full use of the extended hypertext features of
	  XML (see the question on <link xreflabel="simple"
	    linkend="links">Links</link>) to store or link to metadata
	  in any format (eg using <ulink
	    url="http://www.sdct.itl.nist.gov/~ftp/x3l8/other/Standards/iso11179/">ISO&nbsp;11179</ulink>, 
	  as a <ulink
	    url="http://www.oasis-open.org/committees/tm-pubsubj/">Topic 
	    Maps Published Subject</ulink>, with <ulink
	    url="http://purl.oclc.org/metadata/dublin_core/">Dublin
	    Core, Warwick Framework</ulink>, or with <ulink
	    url="http://www.dstc.edu.au/RDU/RDF/">Resource Description
	    Framework (RDF)</ulink>, or even <ulink
	    url="http://www.w3.org/PICS/">Platform for Internet
	    Content Selection (PICS)</ulink>).</para>
	<para>There are no predefined elements in XML, because it is
	  an architecture, not an application, so it is not part of
	  XML's job to specify how or if authors should or should not
	  implement metadata. You are therefore free to use any
	  suitable method. Browser makers may also have their own
	  architectural recommendations or methods to propose.</para>
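	<para>As one possible approach among many, the sketch below
	  embeds a few Dublin Core elements in a document. The
	  <sgmltag>report</sgmltag> and <sgmltag>metadata</sgmltag>
	  element names and the sample values are invented for this
	  example; only the Dublin Core Namespace and element names are
	  taken from that standard:</para>
	<programlisting><![CDATA[
<!-- 'report' and 'metadata' are hypothetical element names -->
<report xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <dc:title>Quarterly Shopping Report</dc:title>
    <dc:creator>A. N. Author</dc:creator>
    <dc:language>en</dc:language>
  </metadata>
  ...
</report>
	  ]]></programlisting>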
      </answer>
    </qandaentry>
    <qandaentry id="scripts" remap="FAQ-JAVA, java">
      <question>
	<formalpara>
	  <title>Can I use JavaScript, ActiveX, etc in XML
	    files?</title>
	  <para>Not in the XML file itself, but via a
	    stylesheet.</para>
	</formalpara>
      </question>
      <answer>
	<para>This will depend on what facilities your users' browsers
	  implement. XML is about describing information; scripting
	  languages and languages for embedded functionality are
	  software which enables the information to be manipulated at
	  the user's end, so these languages do not normally have any
	  place in an XML file itself, but in stylesheets like XSL and
	  CSS where they can be added to generated HTML.</para>
	<para>XML itself provides a way to define the markup needed to
	  implement scripting languages: as a neutral standard it
	  neither encourages nor discourages their use, and does not
	  favour one language over another, so it is possible to use
	  XML markup to store the program code, from where it can be
	  retrieved by (for example) XSLT and re-expressed in a HTML
	  <sgmltag>script</sgmltag> element.</para>
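	<para>A minimal sketch of that approach follows. It assumes
	  the program code is stored in a hypothetical
	  <sgmltag>code</sgmltag> element with a <sgmltag
	    class="attribute">language</sgmltag> attribute; neither
	  name comes from any standard:</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<!-- Assumed input:
       <code language="javascript">alert("Hello");</code>
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- the html output method writes script content unescaped -->
  <xsl:output method="html"/>
  <xsl:template match="code[@language='javascript']">
    <script type="text/javascript">
      <xsl:value-of select="."/>
    </script>
  </xsl:template>
</xsl:stylesheet>
	  ]]></programlisting>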
	<para>Server-side script embedding, like PHP or ASP, can be
	  used with the relevant server to modify the XML code on the
	  fly, as the document is served, just as they can with HTML.
	  Authors should be aware, however, that embedding server-side
	  scripting may mean the file as stored is not valid XML: it
	  only becomes valid when processed and served, so care must
	  be taken when using validating editors or other software to
	  handle or manage such files. A better solution may be to use
	  an XML serving solution like <ulink
	    url="http://cocoon.apache.org/">Cocoon</ulink>, <ulink
	    url="http://axkit.org/">AxKit</ulink>, or <ulink
	    url="http://www.propylon.com/products/propelx/">PropelX</ulink>.</para>
      </answer>
    </qandaentry>
    <qandaentry id="java" remap="java-gen">
      <question>
	<formalpara>
	  <title>Can I use Java to create or manage XML files?</title>
	  <para>Sure.</para>
	</formalpara>
      </question>
      <answer>
	<para>Yes, any programming language can be used to output data
	  from any source in XML format. There is a growing number of
	  front-ends and back-ends for programming environments and
	  data management environments to automate this. Java is just
	  the most popular one at the moment.</para>
	<para>There is a large body of middleware (APIs) written in
	  Java and other languages for managing data either in XML or
	  with XML input or output. There is a suite of Java tutorials
	  (with source code and explanation) available at <ulink
	    url="http://developerlife.com">http://developerlife.com</ulink>.</para>
	<note>
	  <para>Please do not mail the FAQ editor with questions about
	    your Java programming bugs. Ask one of the Java newsgroups
	    instead, or sign up for the <link linkend="summer"
	      xreflabel="simple">XML SummerSchool</link>, where there
	    is usually a session on using Java and XML.</para>
	</note>
      </answer>
    </qandaentry>
    <qandaentry id="execute" remap="exec">
      <question>
	<formalpara>
	  <title>How do I execute or run an XML file?</title>
	  <para>Not a meaningful question. XML is a data
	    format.</para>
	</formalpara>
      </question>
      <answer remap="executable execution running">
	<para>You can't and you don't. XML itself is not a programming
	  language, so XML files don't <quote>run</quote> or
	  <quote>execute</quote>. XML is a markup specification
	  language and XML files are just data: they sit there until
	  you run a program which displays them (like a browser) or
	  does some work with them (like a converter which writes the
	  data in another format, or a database which reads the data),
	  or modifies them (like an editor).</para>
	<para>If you want to view or display an XML file, open it with
	  an <link linkend="editors" xreflabel="simple">XML
	    editor</link> or an <link xreflabel="simple" linkend="browsers">XML
	    browser</link>.</para>
	<para>The water is muddied by XSL (both XSLT and XSL:FO) which
	  use XML syntax to implement a declarative programming
	  language. In these cases it is arguable that you can
	  <quote>execute</quote> XML code, by running a processing
	  application like Saxon, which compiles the directives
	  specified in XSLT files into Java bytecode to process
	  XML.</para>
      </answer>
    </qandaentry>
    <qandaentry id="style" remap="FAQ-STYLE, style">
      <question>
	<formalpara>
	  <title>How do I control formatting and appearance?</title>
	  <para>Use a CSS or XSLT stylesheet.</para>
	</formalpara>
      </question>
      <answer remap="calling assigning stylesheets document format
	  language styling converting putting transform cascading style sheets
	  layout ie internet explorer ie6">
	<para>In HTML, default styling was built into the browsers
	  because the tagset of HTML was predefined and hardwired into
	  browsers. In XML, where you can define your own tagset,
	  browsers cannot possibly be expected to guess or know in
	  advance what names you are going to use and what they will
	  mean, so you need a stylesheet if you want to display
	  formatted text.</para>
	<para><link xreflabel="simple" linkend="browsers">Browsers
	    which read XML</link> will accept and use a CSS stylesheet
	  at a minimum, but you can also use the more powerful XSLT
	  stylesheet language to transform your XML into
	  HTML&mdash;which browsers, of course, already know how to
	  display (and that HTML can still use a CSS stylesheet). This
	  way you get all the document management benefits of using
	  XML, but you don't have to worry about your readers needing
	  XML smarts in their browsers.</para>
	<tip xreflabel="Mike Brown">
	  <para>XSLT is an XML document processing language that uses
	    source code that happens to be written in XML. An XSLT
	    document declares a set of rules for an XSLT processor to
	    use when interpreting the contents of an XML document.
	    These rules tell the XSLT processor how to generate a new
	    XML-like data structure and how that data should be
	    emitted&mdash;as an XML document, as an HTML document, as
	    plain text, or perhaps in some other format.</para>
	  <para>This transformation can be done either inside the
	    browser, or by the server before the file is sent.
	    Transformation in the browser offloads the processing from
	    the server, but may introduce browser dependencies,
	    leading to some of your readers being excluded.
	    Transformation in the server makes the process
	    browser-independent, but places a heavier processing load
	    on the server.</para>
	</tip>
	<para>As with any system where files can be viewed at random
	  by arbitrary users, the author cannot know what resources
	  (such as fonts) are on the user's system, so the same care
	  is needed with fonts as in HTML. To invoke a stylesheet
	  from an XML file for standalone processing in the browser,
	  include one of the stylesheet declarations:</para>
	<programlisting><![CDATA[ 
<?xml-stylesheet href="foo.xsl" type="text/xsl"?> 
<?xml-stylesheet href="foo.css" type="text/css"?> 
	  ]]></programlisting>
	<para>(substituting the URI of your stylesheet, of
	  course). See <ulink
	    url="http://www.w3.org/TR/xml-stylesheet/"></ulink> for
	  the full details.
	  The <ulink url="http://www.w3.org/Style/css">Cascading
	    Style Sheets (CSS) specification</ulink> provides a simple
	  syntax for assigning styles to elements, and has been
	  implemented in most browsers.</para>
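	<para>As a minimal sketch of the XSLT route, the stylesheet
	  below would turn the shopping list from the question on <link
	    linkend="owndtd" xreflabel="simple">writing your own
	    DTD</link> into an HTML bulleted list. The page title is
	  arbitrary, and a real stylesheet would normally handle more
	  element types than this:</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="html"/>
  <!-- the root element becomes the HTML page -->
  <xsl:template match="/Shopping-List">
    <html>
      <head>
        <title>Shopping list</title>
      </head>
      <body>
        <ul>
          <xsl:apply-templates select="Item"/>
        </ul>
      </body>
    </html>
  </xsl:template>
  <!-- each Item becomes one list item -->
  <xsl:template match="Item">
    <li><xsl:value-of select="."/></li>
  </xsl:template>
</xsl:stylesheet>
	  ]]></programlisting>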
	<para id="faq:XSL"><personname>
	    <firstname>Dave</firstname>
	    <surname>Pawson</surname>
	  </personname> maintains a comprehensive XSL FAQ at <ulink
	    id="FAQ:xsl"
	    url="http://www.dpawson.co.uk/xsl/"></ulink>,
	  and his book <xref linkend="fox"/> [the Fox book] is
	  available from O'Reilly. XSL uses XML syntax (an XSL
	  stylesheet is just an XML file) and has widespread support
	  from several major browser vendors (see the questions on
	  <link linkend="browsers" xreflabel="simple">browsers</link>
	  and <link linkend="software" xreflabel="simple">other
	    software</link>). XSL comes in two flavours:</para>
	<itemizedlist>
	  <listitem>
	    <para>XSL itself, which is a pure formatting language,
	      outputting a Formatting Objects (FO) file, which needs a
	      text formatter like <ulink
		url="http://xml.apache.org/">FOP</ulink>, <ulink
		url="http://www.renderx.com/">XEP</ulink>, or others
	      to create printable (PDF) output (but see <link
		linkend="TeX" xreflabel="directional"></link>).
	      Currently I am not aware of any Web browsers which
	      support direct XSL rendering to PDF;</para>
	  </listitem>
	  <listitem>
	    <para>XSLT (T for Transformation), which is a language to
	      specify transformations of XML into HTML either inside
	      the browser or at the server before transmission. It can
	      also specify transformations from one vocabulary of XML
	      to another, and from XML to plaintext (which can be any
	      format, including RTF and <LaTeX/>).</para>
	  </listitem>
	</itemizedlist>
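	<para>To give an idea of the first flavour, a very small FO
	  file might look like the sketch below. The page dimensions
	  and the master name <literal>page</literal> are arbitrary
	  choices for this example; a formatter such as those mentioned
	  above would be needed to turn it into PDF:</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- describe the page geometry (A4 here, chosen arbitrarily) -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
                           page-height="297mm"
                           page-width="210mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- flow the content into pages using that geometry -->
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Chocolate</fo:block>
      <fo:block>Sugar</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
	  ]]></programlisting>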
	<para>Currently only Microsoft Internet Explorer 5.5 and
	  above, and <ulink
	    url="http://www.mozilla.org/">Firefox</ulink> 0.9.6 and
	  above handle XSLT inside the browser (MSIE5.5 needs some
	  <ulink
	    url="http://www.netcrucible.com/xslt/msxml-faq.htm">post-installation 
	    surgery</ulink> to remove the obsolete WD-xsl and replace
	  it with the current XSL-Transform processor; MSIE6 and
	  Firefox work as installed).</para>
	<tip>
	  <title>WYSIWYG for XSL</title>
	  <para>There have been attempts to produce pseudo-WYSIWYG
	    editors for creating XSL[T] stylesheets, but they have
	    mostly been restricted to simple mapping between input
	    elements and output elements (eg a DocBook
	    <sgmltag>para</sgmltag> to a HTML <sgmltag>p</sgmltag>).
	    Anything beyond this seems likely to fail because of the
	    infinite complexity of what people want to do with their
	    information. If you have access to the ACM database, see
	    the <ulink
	      url="http://portal.acm.org/citation.cfm?id=502189">paper
	      by Pietriga, Vion-Dury, and Quint on VXT</ulink>, from
	    the ACM DocEng'01 (Atlanta) Proceedings.</para>
	</tip>
	<tip>
	  <title>Generating HTML on the server</title>
	  <para>There is a growing use of server-side processors like
	    <ulink url="http://cocoon.apache.org/">Cocoon</ulink>,
	    <ulink url="http://axkit.org/">AxKit</ulink>, <ulink
	      url="http://www.propylon.com/products/propelx/">PropelX</ulink>, 
	    and others, which let you create, store, and manage your
	    information in XML but serve it auto-converted to HTML or
	    some other format, thus allowing the output to be used by
	    any browser. XSLT is also widely used to transform XML
	    into non-SGML formats for input to other systems (for
	    example to transform XML into <LaTeX/> for
	    typesetting).</para>
	</tip>
	<tip id="TeX">
	  <title>Alternatives to XSL:FO</title>
	  <para>Instead of generating PDF via
	    an FO processor, it is possible to use XSLT to transform
	    XML to <LaTeX/> for typesetting PDF (as is done for the
	    print versions of this FAQ, from DocBook to
	    <LaTeX/>). This has the advantage of being able to make
	    use of <LaTeX/>'s extensive library of prewritten
	    formatting modules (<quote>packages</quote>), which avoids
	    much of the wheel-reinventing currently required with
	    XSL:FO.</para>
	  <para>Alternatively, <personname>
	      <firstname>David</firstname>
	      <surname>Carlisle</surname>
	    </personname>'s <productname>xmltex</productname> reads
	    XML directly, offering another practical if experimental
	    solution to typesetting XML. One use of a <TeX/> system
	    that can typeset XML files is as a backend processor for
	    XSL:FO, serialized as XML. <personname>
	      <firstname>Sebastian</firstname>
	      <surname>Rahtz</surname>
	    </personname>'s Passive<TeX/> uses
	    <productname>xmltex</productname> to achieve this
	    end.</para>
	  <para id="faq:TeX">The <TeX/> FAQ is at <ulink id="FAQ:tex"
	      url="http://www.tex.ac.uk/faq"></ulink>.</para>
	</tip>
	<para>SGML systems used a similar stylesheet mechanism: some
	  of the common ones were the FOSI (Formatted Output
	  Specification Instance), which was standard in defence and
	  industrial engineering applications, especially when using
	  the Arbortext editor (Adept, now Epic); the DynaText/DynaWeb
	  stylesheet used in SGML publishing to the web; and the Synex
	  stylesheet used in browsers based on the Synex engine (eg
	  Panorama, whose styling interface was partly adopted in
	  XMetaL), the expertise of whose designers persists in the
	  DocZilla browser.</para>
      </answer>
    </qandaentry>
    <qandaentry id="graphics" remap="FAQ-GRAPH, graph">
      <question>
	<formalpara>
	  <title>How do I use graphics in XML?</title>
	  <para>Reference them as for HTML or use XLink. Or embed
	    SVG.</para>
	</formalpara>
      </question>
      <answer remap="drawings sounds nonparsed raster images">
	<para>Graphics have traditionally just been links which happen
	  to have a picture file at the end rather than another piece
	  of text. They can therefore be implemented in any way
	  supported by the XLink and XPointer specifications (see
	  <link linkend="links"></link>),
	  including using similar syntax to existing HTML images. They
	  can also be referenced using XML's built-in
	  <sgmltag>NOTATION</sgmltag> and <sgmltag>ENTITY</sgmltag>
	  mechanism in a similar way to standard SGML, as external
	  unparsed entities.</para>
	<para>However, the SVG specification (see <link
	    linkend="svg"></link>) lets you use XML markup to
	  draw vector graphics objects directly in your XML file. This
	  provides enormous power for the inclusion of portable
	  graphics, especially interactive or animated sequences, and
	  it is now slowly becoming supported in browsers.</para>
	<para>The XML linking specifications for external images give
	  you much better control over the traversal and activation of
	  links, so an author can specify, for example, whether or not
	  to have an image appear when the page is loaded, or on a
	  click from the user, or in a separate window, without having
	  to resort to scripting.</para>
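	<para>For illustration, a simple XLink on a hypothetical
	  <sgmltag>figure</sgmltag> element might look like the
	  following. The element name and file name are assumptions;
	  the <sgmltag class="attribute">xlink:show</sgmltag> and
	  <sgmltag class="attribute">xlink:actuate</sgmltag>
	  attributes and their values are those defined in the XLink
	  specification. Whether a given browser honours them is up
	  to the application.</para>
	<programlisting><![CDATA[
<!-- image embedded in place as soon as the document loads -->
<figure xmlns:xlink="http://www.w3.org/1999/xlink"
        xlink:type="simple" xlink:href="lassie.jpg"
        xlink:show="embed" xlink:actuate="onLoad"/>

<!-- image opened in a new window only when the user asks -->
<figure xmlns:xlink="http://www.w3.org/1999/xlink"
        xlink:type="simple" xlink:href="lassie.jpg"
        xlink:show="new" xlink:actuate="onRequest"/>
	  ]]></programlisting>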
	<para>XML itself doesn't predicate or restrict graphic file
	  formats: GIF, JPG, TIFF, PNG, CGM, EPS, and SVG at a minimum
	  would seem to make sense; however, vector formats (EPS, SVG)
	  are normally essential for non-photographic images
	  (diagrams).</para>
	<para>You cannot embed a raw binary graphics file (or any
	  other binary [non-text] data) directly into an XML file
	  because any bytes happening to resemble markup would get
	  misinterpreted: you must refer to it by linking (see below).
	  It is, however, possible to include a text-encoded
	  transformation of a binary file as a CDATA Marked Section,
	  using something like UUencode with the markup characters
	    <literal>]</literal>, <literal>&amp;</literal> and <literal>></literal>
	  removed from the map so that they could not occur as an
	  erroneous CDATA termination sequence and be misinterpreted.
	  You could even use simple hexadecimal encoding as used in
	  PostScript. For vector graphics, however, the solution is to
	  use SVG (see <link linkend="svg"></link>).</para>
	<para>Sound files are binary objects in the same way that
	  external graphics are, so they can only be referenced
	  externally (using the same techniques as for graphics).
	  Music files written in MusiXML or an XML variant of SMDL
	  could however be embedded in the same way as for SVG.</para>
	<para>The point about using entities to manage your graphics
	  is that you can keep the list of entity declarations
	  separate from the rest of the document, so you can re-use
	  the names if an image is needed more than once, but only
	  store the physical file specification in a single place.
	  This is available only when using a DTD, not a
	  Schema.</para>
	<tip xreflabel="Bob DuCharme">
	  <para>All the data in an XML document entity must be
	    parsable XML. You can define an external entity as either
	    a parsed entity (parsable XML) or an unparsed entity
	    (anything else). Unparsed entities can be used for picture
	    files, sound files, movie files, or whatever you like.
	    They can only be referenced from within a document as the
	    value of an attribute (much like a bitmap picture on an
	    HTML Web page is the value of the <sgmltag>img</sgmltag>
	    element's <sgmltag>src</sgmltag> attribute) and are not
	    part of the actual document. In an XML document, this attribute
	    must be declared to be of type <sgmltag>ENTITY</sgmltag>,
	    and the entity's declaration must specify a declared
	    <sgmltag>NOTATION</sgmltag>, because if the entity isn't
	    XML, the XML processor needs to know what it is. For
	    example, in the following document, the
	    <sgmltag>colliepic</sgmltag> entity is declared to have a
	    JPEG notation, and it's used as the value of the empty dog
	    element's <sgmltag class="attribute">picfile</sgmltag>
	    attribute.</para>
	  <programlisting><![CDATA[ 
<?xml version="1.0"?> 
<!DOCTYPE dog [ 
<!NOTATION JPEG SYSTEM "Joint Photographic Experts Group"> 
<!ENTITY colliepic SYSTEM "lassie.jpg" NDATA JPEG>
<!ELEMENT dog EMPTY> 
<!ATTLIST dog picfile ENTITY #REQUIRED> 
]> 
<dog picfile="colliepic"/> 
	    ]]></programlisting>
	  <para>The Entity method is particularly useful when you have
	    many images, or many repeated uses of the same images,
	    because you only declare them once, at the top of the
	    document, making image management much easier.</para>
	  <para>The XLink and XPointer linking specifications describe
	    other ways to point to a non-XML file such as a graphic.
	    These offer more sophisticated control over the external
	    entity's position, handling, and appearance within the XML
	    document.</para>
	</tip>
	<tip id="svg" xreflabel="Peter Murray-Rust">
	  <para>GIFs and JPEGs cater for bitmaps (pixel
	    representations of images: all made up of coloured dots).
	    Vector graphics (scalable, made up of drawing
	    specifications) are addressed in the W3C's graphics
	    activity as Scalable Vector Graphics (see <ulink
	      url="http://www.w3.org/Graphics/SVG"></ulink>). 
	    With the specification now complete, it is
	    possible to transmit the graphical representation as
	    vectors directly within the XML file. For many graphics
	    objects this will mean greatly decreased download time and
	    scaling without loss of detail.</para> 
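	  <para>As a minimal sketch, a drawing can sit inside an
	    otherwise ordinary XML document by putting the SVG
	    elements in the SVG namespace. The surrounding
	    <sgmltag>report</sgmltag> and <sgmltag>title</sgmltag>
	    elements are invented for the example:</para>
	  <programlisting><![CDATA[
<report>
  <title>A circle</title>
  <!-- everything inside svg is vector drawing markup -->
  <svg xmlns="http://www.w3.org/2000/svg"
       width="120" height="120" viewBox="0 0 120 120">
    <circle cx="60" cy="60" r="50"
            fill="none" stroke="black" stroke-width="2"/>
  </svg>
</report>
	    ]]></programlisting>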
	</tip>
	<tip xreflabel="Max Dunn">
	  <para id="faq:SVG">SVG has really taken off recently, and is
	    quite an XML success story [&hellip;] there are already
	    nearly conformant implementations. We recently started an
	    SVG FAQ at <ulink id="FAQ:svg"
	      url="http://www.siliconpublishing.org/svgfaq/"></ulink> 
	    which we are planning to move to <ulink
	      url="http://www.svgfaq.com/"></ulink>.</para>
	  <para>XSLT can be used to generate SVG from XML; details are
	    at <ulink
	      url="http://www.siliconpublishing.org/svgfaq/XSLT.asp"></ulink> 
	    (be careful to use XSLT, not <ulink
	      url="http://www.netcrucible.com/xslt/msxml-faq.htm">Microsoft's 
	      obsolete WD-xsl</ulink>). Documents can also interact
	    with SVG images (see <ulink
	      url="http://www.xml.com/pub/a/2000/03/22/style/index.html">http://www.xml.com/pub/a/2000/03/22/style/index.html</ulink>).</para>
	</tip>
      </answer>
    </qandaentry>
    <qandaentry id="parsers" >
      <question>
	<formalpara>
	  <title>What is parsing and how do I do it in XML?</title>
	  <para>Parsing is splitting up information into its component
	    parts</para>
	</formalpara>
      </question>
      <answer remap="penguins">
	<para>Parsing is the act of splitting up information into its
	  component parts (schools used to teach this in language
	  classes until the teaching profession collectively caught
	  the anti-grammar disease).</para>
	<para><quote>Mary feeds Spot</quote> parses as</para>
	<orderedlist>
	  <listitem>
	    <para>Subject = Mary, proper noun, nominative case</para>
	  </listitem>
	  <listitem>
	    <para>Verb = feeds, transitive, third person singular,
	      present tense</para>
	  </listitem>
	  <listitem>
	    <para>Object = Spot, proper noun, accusative case</para>
	  </listitem>
	</orderedlist>
	<para>In computing, a parser is a program (or a piece of code
	  or API that you can reference inside your own programs)
	  which analyses files to identify the component parts. All
	  applications that read input have a parser of some kind,
	  otherwise they'd never be able to figure out what the
	  information means. Microsoft Word contains a parser which
	  runs when you open a <filename>.doc</filename>
	  file and checks that it can identify all the hidden codes.
	  Give it a corrupted file and you'll get an error
	  message.</para>
	<para>XML applications are just the same: they contain a parser
	  which reads XML and identifies the function of each of the pieces of
	  the document, and it then makes that information available in
	  memory to the rest of the program.</para>
	<para>While reading an XML file, a parser checks the syntax
	  (pointy brackets, matching quotes, etc) for well-formedness,
	  and reports any violations (reportable errors). The <link
	    xreflabel="simple" linkend="spec">XML Specification</link>
	  lists what these are.</para>
	<para>Validation is another stage beyond parsing. As the
	  component parts of the document are identified, a validating
	  parser can compare them with the pattern laid down by a DTD
	  or a Schema, to check that they conform. In the process,
	  default values and datatypes (if specified) can be added to
	  the in-memory result of the validation that the validating
	  parser gives to the application.</para>
	<programlisting><![CDATA[
<person corpid="abc123" 
        birth="1960-02-31" 
        gender="female">
  <name>
    <forename>Judy</forename>
    <surname>O'Grady</surname>
  </name>
</person> 
	  ]]></programlisting>
	<para>The example above parses as:</para>
	<orderedlist>
	  <listitem>
	    <para>Element <sgmltag class="gi">person</sgmltag>
	      identified with Attribute <sgmltag
		class="attribute">corpid</sgmltag> containing <sgmltag
		class="attvalue">abc123</sgmltag> and Attribute
	      <sgmltag class="attribute">birth</sgmltag> containing
	      <sgmltag class="attvalue">1960-02-31</sgmltag> and
	      Attribute <sgmltag class="attribute">gender</sgmltag>
	      containing <sgmltag class="attvalue">female</sgmltag>
	      containing ...</para>
	  </listitem>
	  <listitem>
	    <para>Element <sgmltag class="gi">name</sgmltag>
	      containing ...</para>
	  </listitem>
	  <listitem>
	    <para>Element <sgmltag class="gi">forename</sgmltag>
	      containing text <quote>Judy</quote> followed by
	      ...</para>
	  </listitem>
	  <listitem>
	    <para>Element <sgmltag class="gi">surname</sgmltag>
	      containing text <quote>O'Grady</quote></para>
	  </listitem>
	</orderedlist>
	<para>(and lots of other stuff too).</para>
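	<para>As a sketch of what validation can add, suppose the same
	  document carried the following internal DTD subset (invented
	  for this illustration) and omitted the
	  <sgmltag class="attribute">gender</sgmltag> attribute. A
	  validating parser would still pass
	  <sgmltag class="attvalue">female</sgmltag> to the
	  application, because the DTD declares it as the default. A
	  DTD can only treat the date as a string, so the impossible
	  <sgmltag class="attvalue">1960-02-31</sgmltag> goes
	  unnoticed; a Schema declaring the attribute as a date could
	  catch it.</para>
	<programlisting><![CDATA[
<!DOCTYPE person [
<!ELEMENT person (name)>
<!ATTLIST person
          corpid ID            #REQUIRED
          birth  CDATA         #IMPLIED
          gender (female|male) "female">
<!ELEMENT name (forename, surname)>
<!ELEMENT forename (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
]>
<person corpid="abc123" birth="1960-02-31">
  <name>
    <forename>Judy</forename>
    <surname>O'Grady</surname>
  </name>
</person>
	  ]]></programlisting>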
	<para>As well as built-in parsers, there are also stand-alone
	  parser-validators, which read an XML file and tell you if
	  they find an error (like missing angle-brackets or quotes,
	  or misplaced markup). This is essential for testing files in
	  isolation before doing something else with them, especially
	  if they have been created by hand without an XML editor, or
	  by an API which may be too deeply embedded elsewhere to
	  allow easy testing.</para>
	<tip id="howval" xreflabel="Bill Rayer" role="helped">
	  <para>For standalone parsing/validation use software like
	    <personname>
	      <firstname>James</firstname>
	      <surname>Clark</surname>
	    </personname>'s <ulink
	      url="http://www.jclark.com/sp">nsgmls</ulink> or
	      <personname>
	      <firstname>Richard</firstname>
	      <surname>Tobin</surname>
	    </personname>'s <ulink
	      url="http://www.cogsci.ed.ac.uk/~richard/rxp.html">rxp</ulink>. 
	    Both work under Linux and Windows/DOS. The difference is
	    in the format of the error listing (if any), and that some
	    versions of nsgmls do not retrieve DTDs or other files
	    over the network, whereas rxp does.</para>
	  <para>Make sure your XML file correctly references its DTD
	    in a Document Type Declaration, and that the DTD file[s]
	    are locally accessible (rxp will retrieve them if you have
	    an Internet connection; nsgmls may not, so it may need a
	    local copy).</para>
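	  <para>For example, a file validated against a DTD kept in
	    the same folder might begin like this (the names
	    <filename>person.dtd</filename> and
	    <sgmltag>person</sgmltag> are placeholders for your own
	    DTD file and root element):</para>
	  <programlisting><![CDATA[
<?xml version="1.0"?>
<!-- the name after DOCTYPE must match the root element -->
<!DOCTYPE person SYSTEM "person.dtd">
<person corpid="abc123">
  ...
</person>
	    ]]></programlisting>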
	  <para>Download and install the software. Make sure it is
	    installed to a location where your operating system can
	    find it. If you don't know what any of this means, you
	    will need some help from someone who knows how to download
	    and install software on your type of operating
	    system.</para>
	  <para>For nsgmls, copy <filename>pubtext/xml.soc</filename>
	    and <filename>pubtext/xml.dcl</filename> to your working
	    directory.</para>
	  <para>To validate <filename>myfile.xml</filename>, open a
	    shell window (Linux) or an MS-DOS (<quote>command</quote>)
	    window (Microsoft Windows). In these examples we'll assume
	    your XML file is called <filename>myfile.xml</filename>
	    and it's in a folder called <filename>myfolder</filename>.
	    Use the real names of your folder and file when you type
	    the commands.</para>
	  <variablelist>
	    <varlistentry>
	      <term>For nsgmls:</term>
	      <listitem>
		<para><programlisting><![CDATA[
$ nsgmls -wxml -wundefined -cxml.soc -s myfile.xml
		  ]]></programlisting>
		There are many other options for nsgmls which
		  are described on the <ulink
		    url="http://www.jclark.com/sp">Web page</ulink>.
		  The ones given here are required because it's an
		  SGML parser and these options switch it to XML mode
		  and suppress the normal output, leaving just the
		  errors (if any).</para>
		<para>(In Microsoft Windows you may have to prefix the
		  nsgmls with the full path to wherever it was
		  installed, eg <filename>C:\Program
		    Files\nsgmls\nsgmls</filename>).</para>
	      </listitem>
	    </varlistentry>
	    <varlistentry>
	      <term>For rxp:</term>
	      <listitem>
		<para><programlisting><![CDATA[
$ rxp myfile.xml
		  ]]></programlisting>
		Rxp also has some options which are described on
		  its <ulink
		    url="http://www.cogsci.ed.ac.uk/~richard/rxp.html">Web 
		    page</ulink>.</para>
		<para>(In Microsoft Windows you may have to prefix the
		  rxp with the full path to wherever it was installed,
		  eg <filename>C:\Program
		    Files\rxp\rxp</filename>).</para>
	      </listitem>
	    </varlistentry>
	  </variablelist>
	</tip>
      </answer>
    </qandaentry>
    <qandaentry id="includes">
      <question>
	<formalpara>
	  <title>How do I include one XML file in another?</title>
	  <para>Use a general entity, same as for SGML</para>
	</formalpara>
      </question>
      <answer>
	<para>This works exactly the same as for SGML. First you
	  declare the entity you want to include, and then you
	  reference it by name:</para>
	<programlisting><![CDATA[ 
<?xml version="1.0"?>
<!DOCTYPE novel SYSTEM "/dtd/novel.dtd" [
<!ENTITY chap1 SYSTEM "mydocs/chapter1.xml">
<!ENTITY chap2 SYSTEM "mydocs/chapter2.xml">
<!ENTITY chap3 SYSTEM "mydocs/chapter3.xml">
<!ENTITY chap4 SYSTEM "mydocs/chapter4.xml">
<!ENTITY chap5 SYSTEM "mydocs/chapter5.xml">
]>
<novel>
  <header>
    ...blah blah...
  </header>
&chap1; 
&chap2; 
&chap3; 
&chap4; 
&chap5; 
</novel>
	  ]]></programlisting>
	<para>The difference between this method and the one used for
	  including a DTD fragment (see <link
	    linkend="dtdincludes"></link>) is that this uses an
	  external general (file) entity which is referenced in the
	  same way as for a character entity (with an ampersand).</para>
	<para>The one thing to make sure of is that the included file
	  <emphasis>must not</emphasis> have an XML or DOCTYPE
	  Declaration on it. If you've been using one for editing the
	  fragment, remove it before using the file in this way. Yes,
	  this is a pain in the butt, but if you have lots of
	  inclusions like this, write a script to strip off the
	  declaration (and paste it back on again for editing).</para>
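	<para>For example, <filename>mydocs/chapter1.xml</filename>
	  might contain nothing but the chapter itself. The element
	  names here are assumptions, since
	  <filename>novel.dtd</filename> is not shown:</para>
	<programlisting><![CDATA[
<chapter>
  <title>The First Chapter</title>
  <para>It was a dark and stormy night...</para>
</chapter>
	  ]]></programlisting>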
      </answer>
    </qandaentry>
    <qandaentry id="cdata" >
      <question>
	<formalpara>
	  <title>When should I use a CDATA Marked Section?</title>
	  <para>CDATA is only for text containing markup-like characters.</para>
	</formalpara>
      </question>
      <answer>
	<para>You should almost never need to use CDATA Sections. The
	  CDATA mechanism was designed to let an author quote
	  fragments of text containing markup characters (the
	  open-angle-bracket and the ampersand), for example when
	  documenting XML (this FAQ uses CDATA Sections quite a lot,
	  for obvious reasons). A CDATA Section turns off markup
	  recognition for the duration of the section (it gets turned
	  on again only by the closing sequence of double
	  end-square-brackets and a close-angle-bracket).</para>
	<para>Consequently, <emphasis>nothing</emphasis> in a CDATA
	  section can ever be recognised as anything to do with
	  markup: it's just a string of opaque characters, and if you
	  use an XML transformation language like XSLT, <emphasis>any
	    markup characters in it will get turned into their
	    character entity equivalents</emphasis>.</para>
	<para>If you try, for example, to use:</para>
	<programlisting><![CDATA[
some text with <![CDATA[<em>markup</em>]]>]&#x005D;> in it.
	</programlisting>
	<para>in the expectation that the embedded markup would remain
	  untouched, it won't: it will just output</para>
	<programlisting>
some text with &amp;lt;em>markup&amp;lt;/em> in it.
	</programlisting>
	<para>In other words, CDATA Sections
	  <emphasis>cannot</emphasis> preserve the embedded markup
	  <emphasis>as markup</emphasis>. Normally this is exactly
	  what you want because this technique was designed to let
	  people do things like write documentation about markup. It
	  was <emphasis>not</emphasis> designed to allow the passing
	  of little chunks of (possibly invalid) unparsed HTML
	  embedded inside your own XML through to a subsequent
	  process&mdash;because that would risk invalidating the
	  output.</para>
	<para>As a result you <emphasis>cannot</emphasis> expect to
	  keep markup untouched simply because it looked as if it was
	  safely <quote>hidden</quote> inside a CDATA section: it
	  can't be used as a magic shield to preserve HTML markup for
	  future use <emphasis>as markup</emphasis>, only as
	  characters.</para>
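	<para>To make the point concrete: to an XML parser, the two
	  (hypothetical) <sgmltag>example</sgmltag> elements below
	  have identical content. Both contain the same string of
	  characters, and neither contains an
	  <sgmltag>em</sgmltag> element:</para>
	<programlisting>
&lt;example>&lt;![CDATA[&lt;em>markup&lt;/em>]]&gt;&lt;/example>
&lt;example>&amp;lt;em&amp;gt;markup&amp;lt;/em&amp;gt;&lt;/example>
	</programlisting>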
	<tip>
	  <para>Read <link linkend="html"></link> as
	    well, which is very closely related.</para>
	</tip>
      </answer>
    </qandaentry>
    <qandaentry id="html" >
      <question>
	<formalpara>
	  <title>How can I handle embedded HTML in my XML?</title>
	  <para>Provide for it in the output, use a deep copy, or try
	    disable-output-escaping.</para>
	</formalpara>
      </question>
      <answer>
	<para>Apart from using <link linkend="cdata" xreflabel="simple">CDATA
	    Sections</link>, there are two common occasions when
	  people want to handle embedded HTML inside an XML
	  element:</para>
	<orderedlist>
	  <listitem>
	    <para>when they have received (possibly poorly-designed)
	      XML from somewhere else which they must find a way to
	      handle;</para>
	  </listitem>
	  <listitem>
	    <para>when they have an application which has been
	      explicitly designed to store a string of characters
	      containing <literal><![CDATA[&lt;]]></literal> and
	      <literal><![CDATA[&amp;]]></literal> character entity
	      references with the objective of turning them back into
	      markup in a later process (eg FreeMind, Atom).</para>
	  </listitem>
	</orderedlist>
	<para>Generally, you want to avoid this kind of trick, as it
	  usually indicates that the document structure and design has
	  been insufficiently thought out. However, there are
	  occasions when it becomes unavoidable, so if you really need
	  or want to use embedded HTML markup inside XML,
	  <emphasis>and</emphasis> have it processable later as
	  markup, there are a couple of techniques you may be able to
	  use:</para>
	<itemizedlist>
	  <listitem>
	    <para>Provide templates for the handling of that markup in
	      your XSLT transformation or whatever software you use
	      which simply replicates what was there, eg</para>
	    <programlisting><![CDATA[
<xsl:template match="b">
  <b>
    <xsl:apply-templates/>
  </b>
</xsl:template>
	      ]]></programlisting>
	  </listitem>
	  <listitem>
	    <para>Use XSLT's <quote>deep copy</quote> instruction,
	      which outputs nested well-formed markup verbatim,
	      eg</para>
	    <programlisting><![CDATA[
<xsl:template match="ol">
  <xsl:copy-of select="."/>
</xsl:template>
	      ]]></programlisting>
	  </listitem>
	  <listitem>
	    <para>As a last resort, use the
	      <sgmltag>disable-output-escaping</sgmltag> attribute on
	      the <sgmltag>xsl:text</sgmltag> element of XSL[T] which
	      is available in some processors, eg</para>
	    <programlisting><![CDATA[
<xsl:text disable-output-escaping="yes">
  <![CDATA[<b>Now!</b>]]>&#x005D;]>
<![CDATA[</xsl:text>
	      ]]></programlisting>
	  </listitem>
	  <listitem>
	    <para>Some processors (eg JX) are now providing their own
	      equivalents for disabling output escaping. Their
	      proponents claim it is <quote>highly desirable</quote>
	      or <quote>what most people want</quote>, but it still
	      needs to be treated with care to prevent unwanted
	      (possibly dangerous) arbitrary code from being passed
	      untouched through your system. It also adds another
	      dependency to your software.</para>
	  </listitem>
	</itemizedlist>
	<para>For more details of using these techniques in XSL[T],
	  see <ulink
	    url="http://www.dpawson.co.uk/xsl/sect2/cdata.html">the
	    relevant question in the XSL FAQ</ulink>.</para>
	<tip>
	  <para>Read <link linkend="cdata"></link> as
	    well, which is very closely related.</para>
	</tip>
      </answer>
    </qandaentry>
    <qandaentry id="specials" >
      <question>
	<formalpara>
	  <title>What are the special characters in XML?</title>
	  <para>Just five: <literal><![CDATA[&lt;]]></literal> (<literal>&lt;</literal>),
	    <literal><![CDATA[&amp;]]></literal> (<literal>&amp;</literal>),
	  <literal><![CDATA[&gt;]]></literal> (<literal>&gt;</literal>),
	  <literal><![CDATA[&quot;]]></literal> (<literal>&quot;</literal>), and
	  <literal><![CDATA[&apos;]]></literal> (<literal>&apos;</literal>)</para> 
	</formalpara>
      </question>
      <answer remap="hex code hexcode specials reserved words nbsp">
	<para>For normal text (<emphasis>not</emphasis> markup), there
	  are no special characters: just make sure your document
	  refers to the correct encoding scheme for the language
	  and/or writing system you want to use,
	  <emphasis>and</emphasis> that your computer correctly stores
	  the file using that encoding scheme. See <link xreflabel="simple"
	    linkend="characters">the question on non-Latin
	    characters</link> for a longer explanation.</para>
	<para>If your keyboard will not allow you to type the
	  characters you want, or if you want to use characters
	  outside the limits of the encoding scheme you have chosen,
	  you can use a symbolic notation called <quote>entity
	    referencing</quote>. Entity references can either be
	  <emphasis>numeric</emphasis>, using the decimal or
	  hexadecimal <ulink
	    url="http://www.unicode.org/">Unicode</ulink> code point
	  for the character (eg if your keyboard has no Euro symbol
	  (&euro;) you can type <literal><![CDATA[&#8364;]]></literal>); or
	  they can be <emphasis>character</emphasis>, using an
	  established name which you declare in your DTD (eg
	  <literal><![CDATA[<!ENTITY euro "&#8364;">]]></literal>) and then
	  use as <literal><![CDATA[&euro;]]></literal> in your document. If
	  you are using a Schema, you must use the numeric form for
	  all except the five below because Schemas have no way to
	  make character entity declarations.</para>
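	<para>As a small illustration, the three references in the
	  following document all produce the same Euro character: the
	  first is decimal, the second hexadecimal, and the third uses
	  the name declared in the internal subset. The
	  <sgmltag>price</sgmltag> element is invented for the
	  example.</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<!DOCTYPE price [
<!ELEMENT price (#PCDATA)>
<!ENTITY euro "&#8364;">
]>
<price>&#8364;10 is &#x20AC;10 is &euro;10</price>
	  ]]></programlisting>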
	<para>If you use XML with no DTD, then these five character
	  entities are assumed to be predeclared, and you can use them
	  without declaring them:</para>
	<variablelist>
	  <varlistentry>
	    <term><literal><![CDATA[&lt;]]></literal></term>
	    <listitem>
	      <para>The less-than character (<literal>&lt;</literal>) starts
		<firstterm>element markup</firstterm> (the first
		character of a start-tag or an end-tag).</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term><literal><![CDATA[&amp;]]></literal></term>
	    <listitem>
	      <para>The ampersand character (<literal>&amp;</literal>)
		starts <firstterm>entity markup</firstterm> (the first
		character of a character entity reference).</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term><literal><![CDATA[&gt;]]></literal></term>
	    <listitem>
	      <para>The greater-than character (<literal>&gt;</literal>)
		ends a start-tag or an end-tag.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term><literal><![CDATA[&quot;]]></literal></term>
	    <listitem>
	      <para>The double-quote character (<literal>&quot;</literal>)
		can be symbolised with this character entity reference
		when you need to embed a double-quote inside a string
		which is already double-quoted.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term><literal><![CDATA[&apos;]]></literal></term>
	    <listitem>
	      <para>The apostrophe or single-quote character
		(<literal>&apos;</literal>) can be symbolised with this
		character entity reference when you need to embed a
		single-quote or apostrophe inside a string which is
		already single-quoted.</para>
	    </listitem>
	  </varlistentry>
	</variablelist>
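	<para>A short DTDless example, using invented element and
	  attribute names, shows all five in use:</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<quiz>
  <!-- double-quoted value containing double quotes and an ampersand -->
  <question text="Is &quot;AT&amp;T&quot; a company?">
    If a &lt; b, then b &gt; a
  </question>
  <!-- single-quoted value containing an apostrophe -->
  <answer text='It&apos;s true'/>
</quiz>
	  ]]></programlisting>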
	<para>If you are using a DTD then you
	  <emphasis>must</emphasis> declare <emphasis>all</emphasis>
	  the other character entities you need to use (if any). The
	  five above remain predeclared (the XML Specification
	  requires every processor to recognise them whether they are
	  declared or not), but for interoperability a valid document
	  should declare any of them that it uses.</para>
	<warning>
	  <para>There are circumstances where you can use special
	    characters as themselves, such as in <link
	      xreflabel="simple" linkend="cdata">CDATA
	      Sections</link>. Most control characters are prohibited
	    in XML: see the <link xreflabel="simple"
	      linkend="spec">Specification</link> for exact
	    details.</para>
	</warning>
	<para>There are no reserved words as such in the user
	  namespace of XML: you can call an element
	  <wordasword>element</wordasword> and an attribute
	  <wordasword>attribute</wordasword> and so on as in the
	  following (ludicrous) example:</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<!DOCTYPE DOCTYPE SYSTEM "SYSTEM" [
<!ELEMENT DOCTYPE (ELEMENT+)>
<!ATTLIST ELEMENT ATTLIST ENTITY #IMPLIED>
<!NOTATION DOCTYPE SYSTEM "ENTITY">
<!ENTITY NOTATION SYSTEM "ENTITY" NDATA DOCTYPE>
]>
<DOCTYPE>
  <ELEMENT ATTLIST="NOTATION">bar</ELEMENT>
</DOCTYPE>
	]]></programlisting>
	<para>where the file <filename>SYSTEM</filename> contains the
	  declaration: <literal><![CDATA[<!ELEMENT ELEMENT
	    (#PCDATA)>]]></literal> and the file
	  <filename>ENTITY</filename> does not even exist.</para>
	<para>There are <firstterm>keywords</firstterm> like
	  <literal>DOCTYPE</literal> and <literal>IMPLIED</literal>
	which are reserved Names, but they are prefixed by a flag
	character (the Markup Declaration Open character or the
	Reserved Name Indicator) so that they cannot be confused with
	user-specified Names.</para>
      </answer>
    </qandaentry>
  </qandadiv>
  <qandadiv id="developers" remap="FAQ-DEVELOPER, Developer">
    <title>Developers and Implementors (including WebMasters and
      server operators)</title>
    <qandaentry remap="FAQ-SPEC, spec" id="spec">
      <question>
	<formalpara>
	  <title>Where's the spec?</title>
	  <para>Right <ulink
	    url="http://www.w3.org/TR/REC-xml">here</ulink></para>
	</formalpara>
      </question>
      <answer>
	<para>Right here: <xref linkend="thespec"/>
	  (<filename>http://www.w3.org/TR/REC-xml</filename>).
	  Includes the EBNF, and all the normative material. There are
	  also versions in <ulink
	    url="http://www.fxis.co.jp/DMS/sgml/xml/">Japanese</ulink>; 
	  <ulink
	    url="http://xml.silmaril.ie/faq-es.html">Spanish</ulink>;
	  <ulink
	    url="http://xml.t2000.co.kr/faq/index.html">Korean</ulink>; 
	  a <ulink
	    url="http://www.xml.com/axml/testaxml.htm">Java-ised
	    annotated version</ulink>, and <author of="xmlann">
	    <contrib></contrib>
	  </author>'s book, <citetitle
	  author="xmlann"></citetitle>.</para>
	<para><personname>
	    <firstname>Eve</firstname>
	    <surname>Maler</surname>
	  </personname> maintains <ulink
	    url="http://www.w3.org/XML/1998/06/xmlspec-v21.dtd">the
	    DTD used for the spec itself</ulink>; the DTD is also used
	  to encode several other W3C specifications, such as XLink,
	  XPointer, DOM, XML Schema, etc. There is <ulink
	    url="http://www.w3.org/XML/1998/06/xmlspec-report-v21.htm">documentation</ulink> 
	  available for the DTD. Note that the XML spec needs to use
	  <ulink
	    url="http://www.w3.org/XML/1998/06/xmlspec-v21a.dtd">a
	    special one-off version of the DTD</ulink>, since the real
	  original DTD used for it has long since been lost.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-VALIDWF, validwf" id="validity">
      <question>
	<formalpara>
	  <title>What are these terms DTDless, valid, and
	    well-formed?</title>
	  <para>Well-formed means syntactically correct (DTD or not);
	    valid means it also conforms to its DTD or Schema.</para>
	</formalpara>
      </question>
      <answer remap="internalsubset well from formed modelling modeling">
	<para>XML lets you use a Schema or Document Type Definition
	  (DTD) to describe the markup (elements and other constructs)
	  available in any specific type of document. However, the
	  design and construction of Schemas and DTDs can be complex
	  and non-trivial, so XML also lets you work without one.
	  DTDless operation means you can invent markup without having
	  to define it formally, provided you stick to the rules of
	  XML syntax.</para>
	<para>To make this work, a DTDless file is assumed to define
	  its own markup purely by the existence and location of
	  elements where you create them. When an XML application
	  encounters a DTDless file, it builds its internal model of
	  the document structure while it reads it, because it has no
	  Schema or DTD to tell it what to expect. There must
	  therefore be no surprises or ambiguous syntax. To achieve
	  this, the document must be <quote>well-formed</quote> (must
	  follow the rules).</para>
	<para>To understand why this concept is needed, look at
	  standard HTML as an example:</para>
	<itemizedlist>
	  <listitem>
	    <para>The <sgmltag class="gi">img</sgmltag> element
	      is declared (in the DTDs for HTML) as EMPTY, so it
	      doesn't have an end-tag (there is no such thing as
	      <sgmltag class="endtag">img</sgmltag>);</para>
	  </listitem>
	  <listitem>
	    <para>Many other HTML elements (such as <sgmltag
		class="gi">p</sgmltag>) allow you to omit the
	      end-tag for brevity when using the SGML version of
	      HTML.</para>
	  </listitem>
	  <listitem>
	    <para>If an XML processor reads an HTML file without
	      knowing this (because it isn't using a DTD), and it
	      encounters an <sgmltag class="starttag">img</sgmltag> or
	      a <sgmltag class="starttag">p</sgmltag> (or any other
	      start-tag), it would have no way to know whether or not
	      to expect an end-tag. This makes it impossible to know
	      whether the rest of the file is correct, because the
	      processor now has no evidence of whether it is still
	      inside an element or has finished with it.</para>
	  </listitem>
	</itemizedlist>
	<para>Well-formed documents therefore
	  <emphasis>require</emphasis> start-tags and end-tags on
	  every normal element, and any EMPTY elements must be made
	  unambiguous, either by using normal start-tags and end-tags,
	  or by appending a slash to the name of the start-tag before
	  the closing <literal>></literal> as a sign that there will be no
	  separate end-tag.</para> 
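	<para>As a rough sketch of the difference (the fragment is
	  invented for illustration and is not taken from any real
	  page), HTML written the SGML way, such as
	  <literal><![CDATA[<p>One<br>two<img src="x.gif">]]></literal>,
	  could be rewritten in well-formed style in either of these
	  ways:</para>
	<programlisting><![CDATA[
<!-- hypothetical fragment: EMPTY elements marked with a slash -->
<p>One<br/>two<img src="x.gif"/></p>
<!-- equivalent long form: real end-tags with nothing in between -->
<p>One<br></br>two<img src="x.gif"></img></p>
	  ]]></programlisting>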
	<para>All XML documents, both DTDless and valid, must be
	  well-formed. If an XML Declaration is needed (for example,
	  to identify the character encoding or to use the Standalone
	  Document Declaration), it must come at the very start of
	  the file:</para> 
	<programlisting><![CDATA[
<?xml version="1.0" encoding="iso-8859-1" 
      standalone="yes"?> 
<foo>
  <bar>...<blort/>...</bar> 
</foo> 
	  ]]></programlisting>
	<tip xreflabel="David Brownell"> 
	  <para>XML that's just well-formed doesn't need to use a
	    Standalone Document Declaration at all. Such declarations
	    are there to permit certain speedups when processing
	    documents while ignoring external parameter
	    entities&mdash;basically, you can't rely on external
	    declarations in standalone documents. The types that are
	    relevant are entities and attributes. Standalone documents
	    must not require any kind of attribute value normalisation
	    or defaulting, otherwise they are invalid.</para> 
	</tip>
	<para>It's also possible to use a Document Type Declaration
	  with DTDless files, even though there is no Document Type to
	  refer to: </para>
	<tip xreflabel="Richard Lander">
	  <para>If you need character entities [other than the five
	    built-in ones] in a DTDless file, you can declare them in
	    an internal subset without referencing anything other than
	    the root element type:</para> 
	  <programlisting><![CDATA[ 
<?xml version="1.0" standalone="yes"?> 
<!DOCTYPE example [ 
<!ENTITY mdash "---"> 
]> 
<example>Hindsight&mdash;a wonderful 
thing.</example> 
	    ]]></programlisting>
	</tip>
	<tip id="wf">
	  <title>Rules for well-formedness:</title>
	  <itemizedlist>
	    <listitem>
	      <para>All tags must be balanced: that is, every element
		which may contain character data or sub-elements must
		have both the start-tag and the end-tag present
		(omission is not allowed except for EMPTY elements,
		see below);</para>
	    </listitem>
	    <listitem>
	      <para>All attribute values must be in quotes. The
		single-quote character (the apostrophe) may be used if
		the value contains a double-quote character, and vice
		versa. If you need isolated quotes as data as well,
		you can use <sgmltag class="genentity">apos</sgmltag>
		or <sgmltag class="genentity">quot</sgmltag>. Do not
		under any circumstances use the automated typographic
		(<quote>curly</quote>) inverted commas substituted by
		some wordprocessors for quoting attribute
		values.</para>
	    </listitem>
	    <listitem>
	      <para>Any EMPTY elements (eg those with no end-tag, like
		HTML's <sgmltag class="gi">img</sgmltag>,
		<sgmltag class="gi">hr</sgmltag>, <sgmltag
		  class="gi">br</sgmltag>, and others) must
		<emphasis>either</emphasis> end with
	      <literal>/></literal>&nbsp;<emphasis>or</emphasis> they must
		look like non-EMPTY elements by having a real end-tag
		(but no content). Example: <sgmltag
		  class="starttag">br</sgmltag> would become either
		<sgmltag class="emptytag">br</sgmltag> or <sgmltag
		  class="starttag">br</sgmltag><sgmltag
		  class="endtag">br</sgmltag> (with nothing in
		between).</para> 
	    </listitem>
	    <listitem>
	      <para>There must not be any isolated markup-start
		characters (<literal><![CDATA[<]]></literal> or
	      <literal><![CDATA[&]]></literal>) in your text data. They must
		be given as <sgmltag class="genentity">lt</sgmltag>
		and <sgmltag class="genentity">amp</sgmltag>
		respectively, and the sequence
		<literal>]]</literal><literal><![CDATA[>]]></literal> may only
		occur as the end of a CDATA marked section: if you are
		using it for any other purpose it must be given as
		<literal>]]</literal><sgmltag
		  class="genentity">gt</sgmltag>.</para>
	    </listitem>
	    <listitem>
	      <para>Elements must nest inside each other properly (no
		overlapping markup, same as for HTML);</para>
	    </listitem>
	    <listitem>
	      <para>DTDless well-formed documents may use attributes
		on any element, but the attributes are all assumed to
		be of type CDATA. You cannot use ID/IDREF attribute
		types for parser-checked cross-referencing in DTDless
		documents.</para>
	    </listitem>
	    <listitem>
	      <para>XML files with no DTD are considered to have
		<sgmltag class="genentity">lt</sgmltag>, <sgmltag
		  class="genentity">gt</sgmltag>, <sgmltag
		  class="genentity">apos</sgmltag>, <sgmltag
		  class="genentity">quot</sgmltag>, and <sgmltag
		  class="genentity">amp</sgmltag> predefined and thus
		available for use. With a DTD, all other character
		entities used must be declared; the XML Specification
		recommends that valid documents also declare these
		five, for interoperability.</para>
	    </listitem>
	  </itemizedlist>
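	  <para>A small sketch of a file which follows the rules
	    above; the element and attribute names are invented for
	    illustration:</para>
	  <programlisting><![CDATA[
<?xml version="1.0"?>
<!-- hypothetical document: all names are made up -->
<memo to='the "A" team' from="O'Brien">
  <subject>Profit &amp; loss</subject>
  <body>Costs must stay &lt; budget.<break/>
    <note></note>
  </body>
</memo>
	    ]]></programlisting>
	  <para>Each attribute value uses whichever quote character
	    does not occur inside it, the markup-start characters in
	    the text are given as entity references, and both forms
	    of EMPTY element are shown.</para>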
	</tip>
	<tip id="valid">
	  <title>Rules for validity</title>
	  <para>Valid XML files are well-formed files which have a
	    <link linkend="dtds" xreflabel="simple">Document Type
	      Definition (DTD)</link> and which conform to it. They
	    must already be <link xreflabel="simple"
	      linkend="wf">well-formed</link>, so all the rules above
	    apply.</para> 
	  <para>A valid file begins with a Document Type Declaration,
	    but may have an optional XML Declaration prepended:</para>
	  
	  <programlisting><![CDATA[ 
<?xml version="1.0"?> 
<!DOCTYPE advert 
  SYSTEM "http://www.foo.org/ad.dtd"> 
<advert>
  <headline>...<pic/>...</headline> 
  <text>...</text>
</advert> 
	    ]]></programlisting> 
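	  <para>The DTD itself is not shown here. As an assumption
	    made purely for illustration, a file like
	    <filename>ad.dtd</filename> which would make the example
	    above valid might contain declarations along these
	    lines:</para>
	  <programlisting><![CDATA[
<!-- hypothetical content for ad.dtd -->
<!ELEMENT advert   (headline, text)>
<!ELEMENT headline (#PCDATA | pic)*>
<!ELEMENT pic      EMPTY>
<!ELEMENT text     (#PCDATA)>
	    ]]></programlisting>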
	</tip>
	<para id="fpis">The XML Specification predefines an SGML
	  Declaration for XML which is fixed for all instances and is
	  therefore hard-coded into all XML software and never
	  specified separately (except when using an SGML/XML
	  switchable validator like
	  <productname>onsgmls</productname>: see below).</para>
	<tip id="sgmldec">
	  <para>The SGML Declaration for XML has been
	    removed from the text of the Specification but is
	    available as <ulink
	      url="http://www.w3.org/TR/NOTE-sgml-xml-971215">a
	      separate document</ulink>. As this appears to be
	    suffering from bitrot or neglect, there is a copy <ulink
	      url="/xml.dec_jc">here</ulink> and a version for
	    <productname>onsgmls</productname>&nbsp;<ulink
	      url="/xml.dec_onsgmls">here</ulink>.</para>
	</tip>
	<para>The specified DTD must be
	  accessible to the XML processor using the URI supplied in
	  the SYSTEM Identifier, either by being available locally (ie
	  the user already has a copy on disk), or by being
	  retrievable via the network. Note that DTD specifications
	  <emphasis>must</emphasis> be URIs (local, relative, or
	  absolute). Proprietary filesystem references (eg
	  <filename>C:\dtds\my.dtd</filename>) are not URIs and cannot
	  be used: use the <filename>file:///C|/dtds/my.dtd</filename>
	  format instead.</para> 
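	<para>For illustration, these are some forms a SYSTEM
	  Identifier could take (the paths and hostname are made up,
	  and only one Document Type Declaration may appear in a real
	  file):</para>
	<programlisting><![CDATA[
<!-- relative URI: resolved against the location of the document -->
<!DOCTYPE advert SYSTEM "dtds/ad.dtd">
<!-- absolute URI on the network -->
<!DOCTYPE advert SYSTEM "http://www.foo.org/ad.dtd">
<!-- absolute URI for a local file (hypothetical path) -->
<!DOCTYPE advert SYSTEM "file:///usr/local/dtds/ad.dtd">
	  ]]></programlisting>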
	<para>It is possible (many people would say preferable) to
	  supply a Formal Public Identifier with the PUBLIC keyword,
	  and use an XML Catalog to dereference it, but the
	  Specification mandates a SYSTEM Identifier so this must
	  still be supplied (after the PUBLIC identifier: no further
	  keyword is needed):</para> 
	<programlisting><![CDATA[ 
<!DOCTYPE advert 
  PUBLIC "-//Foo, Inc//DTD Advertisements//EN"
	 "http://www.foo.org/ad.dtd"> 
<advert>...</advert>
	  ]]></programlisting> 
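	<para>A minimal sketch of an OASIS XML Catalog entry which
	  would map the Formal Public Identifier above to a local
	  copy of the DTD; the local path is an assumption, and how
	  the catalog file is made known to your software depends on
	  the software:</para>
	<programlisting><![CDATA[
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- hypothetical local path -->
  <public publicId="-//Foo, Inc//DTD Advertisements//EN"
          uri="file:///usr/local/dtds/ad.dtd"/>
</catalog>
	  ]]></programlisting>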
	<para>The test for validity is that a validating parser finds
	  no errors in the file: it must conform absolutely to the
	  definitions and declarations in the DTD.</para>
	<para>XML (W3C) Schemas are not usually linked directly from
	  within an XML document instance in the way that DTDs are:
	  the relevant Schema (XSD file) for a document instance is
	  normally specified to the parser separately, either by file
	  system reference, or using a <ulink
	    url="http://www.w3.org/TR/xmlschema-0/#NS">Target
	    Namespace</ulink>.</para>
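	<para>Where a document does carry a pointer to its Schema, it
	  is typically given as a hint in attributes from the XML
	  Schema instance namespace, which a parser is free to
	  ignore. A sketch, in which the namespace URI and the
	  location of the XSD file are invented:</para>
	<programlisting><![CDATA[
<!-- hypothetical namespace and schema location -->
<advert xmlns="http://www.foo.org/ns/advert"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.foo.org/ns/advert
                      http://www.foo.org/ad.xsd">
  ...
</advert>
	  ]]></programlisting>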
      </answer>
    </qandaentry>
    <qandaentry id="attributes" remap="attriborelem">
      <question>
	<formalpara>
	  <title>Which should I use in my DTD, attributes or
	    elements?</title>
	  <para>See <ulink
	      url="http://xml.coverpages.org/elementsAndAttrs.html"></ulink></para>
	</formalpara>
      </question>
      <answer>
	<para>There is no single answer to this: a lot depends on what
	  you are designing the document type for.</para>
	<para>Traditional editorial practice for normal text documents
	  is to put the real text (what would be printed) as character
	  data content, and keep the metadata (information about the
	  text) in attributes, from where they can more easily be
	  isolated for analysis or special treatment like display in
	  the margin or in a mouseover:</para>
	<programlisting><![CDATA[ 
<l n="184">
  <speaker>Portia</speaker>
  <text>The quality of mercy is not strain'd,</text>
</l> 
	  ]]></programlisting>
	<para>But from the systems point of view, there is nothing
	  wrong with storing the data the other way round, especially
	  where the volume of text data on each occasion is relatively
	  small:</para>
	<programlisting><![CDATA[ 
<line speaker="Portia" text="The quality of mercy 
is not strain'd,">184</line> 
	  ]]></programlisting>
	<para>A lot will depend on what you want to do with the
	  information and which bits of it are most easily accessed
	  by each method. A rule of thumb for conventional text documents
	  is that if the markup were all stripped away, the bare text
	  should still be readable and usable, even if unformatted and
	  inconvenient. For database output, however, or other
	  machine-generated documents like e-commerce transactions,
	  human reading may not be meaningful, so it is perfectly
	  possible to have documents where all the data is in
	  attributes, and the document contains no character data in
	  content models at all.  See <ulink
	    url="http://xml.coverpages.org/elementsAndAttrs.html"></ulink> 
	  for more information.</para>
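	<para>As an illustration of the last case, a machine-generated
	  document with all its data in attributes might look
	  something like this (the element and attribute names are
	  invented):</para>
	<programlisting><![CDATA[
<!-- hypothetical e-commerce record: no text content at all -->
<order id="A1027" date="2007-08-01" currency="EUR">
  <item sku="X-42" qty="3" price="9.95"/>
  <item sku="Y-07" qty="1" price="24.00"/>
</order>
	  ]]></programlisting>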
	<tip xreflabel="Mike Kay">
	  <para>From a user: <quote><emphasis>[&hellip;] do most of
		you out there use element-based or attribute-based
		xml? why?</emphasis></quote></para>
	  <para>Beginners always ask this question. Those with a
	    little experience express their opinions passionately.
	    Experts tell you there is no right answer. (<ulink
	      url="http://lists.xml.org/archives/xml-dev/200006/msg00293.html"></ulink>)</para>
	</tip>
      </answer>
    </qandaentry>
    <qandaentry id="sgmlchanges" remap="FAQ-DTD, dtd">
      <question>
	<formalpara>
	  <title>What else has changed between SGML and XML?</title>
	  <para>Stricter syntax and no options.</para>
	</formalpara>
      </question>
      <answer>
	<para id="restrict">The principal changes are in what you can
	  do in writing a Document Type Definition (DTD). To simplify
	  the syntax and make it easier to write processing software,
	  a large number of SGML markup declaration options have been
	  suppressed (see the <link linkend="dtdconv" xreflabel="simple">list of omitted
	    features</link>).</para>
	<para>An extra Name Start Character is permitted in XML Names
	  (the colon) for use with <link xreflabel="simple"
	    linkend="namespaces">namespaces</link> (enabling DTDs to
	  distinguish element source, ownership, or application).
	  Despite its classification, a colon may only appear in
	  mid-name, <emphasis>not</emphasis> at the start or the
	  end.</para>
      </answer>
    </qandaentry>
    <qandaentry id="namespaces" remap="namespaces">
      <question>
	<formalpara>
	  <title>What's a namespace?</title>
	  <para>A named DTD/Schema fragment identified by a URI
	    (URL).</para>
	</formalpara>
      </question>
      <answer>
	<tip xreflabel="Randall Fowle">
	  <para>A namespace is a collection of element and attribute
	    names identified by a Uniform Resource Identifier
	    reference. The reference may appear in the root element as
	    a value of the <sgmltag class="attribute">xmlns</sgmltag>
	    attribute. For example, the namespace reference for an XML
	    document with a root element type <sgmltag
	      class="gi">x</sgmltag> might appear like
	    this:</para>
	  <programlisting><![CDATA[
<x xmlns="http://www.company.com/company-schema">
	    ]]></programlisting>
	  <para>More than one namespace may appear in a single XML
	    document, to allow a name to be used more than once. Each
	    reference can declare a prefix to be used by each name, so
	    the previous example might appear as</para>
	  <programlisting><![CDATA[
<x xmlns:spc=
   "http://www.company.com/company-schema">
	    ]]></programlisting> 
	  <para>which would nominate the namespace for the
	    <quote>spc</quote> prefix:</para>
	  <programlisting><![CDATA[
<spc:name>Mr. Big</spc:name>
	    ]]></programlisting>
	</tip>
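	<para>To sketch how two namespaces let the same name be used
	  twice in one document, here is an invented example in which
	  <sgmltag class="gi">name</sgmltag> means a person in one
	  namespace and a product in the other (the second URI and
	  the <quote>prd</quote> prefix are assumptions):</para>
	<programlisting><![CDATA[
<!-- hypothetical namespaces and prefixes -->
<x xmlns:spc="http://www.company.com/company-schema"
   xmlns:prd="http://www.company.com/product-schema">
  <spc:name>Mr. Big</spc:name>
  <prd:name>Widget</prd:name>
</x>
	  ]]></programlisting>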
	<tip xreflabel="James Anderson">
	  <para>In general, note that the binding may also be effected
	    by a default value for an attribute in the DTD.</para>
	  <para id="faq:Namespaces">The reference does not need to be
	    a physical file; it is simply a way to distinguish between
	    namespaces. The reference should tell a person looking at
	    the XML document where to find definitions of the element
	    and attribute names using that particular namespace.
	    <personname>
	      <firstname>Ronald</firstname>
	      <surname>Bourret</surname>
	    </personname> maintains the Namespace FAQ at <ulink
	      id="FAQ:namespaces"
	      url="http://www.rpbourret.com/xml/NamespacesFAQ.htm">http://www.rpbourret.com/xml/NamespacesFAQ.htm</ulink>.</para>
	</tip>
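	<para>A sketch of the DTD-based binding mentioned above, in
	  which a fixed attribute default supplies the namespace so
	  that it need not be repeated in every document instance.
	  The declarations reuse the example URI and are illustrative
	  only, and the binding only takes effect when the processor
	  actually reads the DTD:</para>
	<programlisting><![CDATA[
<!-- illustrative declarations, not taken from a real DTD -->
<!ELEMENT x (#PCDATA)>
<!ATTLIST x
  xmlns CDATA #FIXED "http://www.company.com/company-schema">
	  ]]></programlisting>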
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-XMLSOFT, xmlsoft" id="software">
      <question>
	<formalpara>
	  <title>What XML software is available?</title>
	  <para>Thousands of programs: too many to list here.</para>
	</formalpara>
      </question>
      <answer remap="vb5 vb6 visual basic">
	<para>Hundreds, possibly thousands, of programs. Details are
	  no longer listed in this FAQ as they are now too many and
	  are changing too rapidly to be kept up to date: see the XML
	  Web pages at <ulink
	    url="http://xml.coverpages.org/">http://xml.coverpages.org/</ulink> 
	  and watch for announcements on the <link xreflabel="simple"
	    linkend="discussions">mailing lists and
	    newsgroups</link>.</para>
	<para>For a detailed guide to some examples of XML programs
	  and the concepts behind them, see the editor's book
	  <xref linkend="toolbook"/>.</para>
	<para>Details of some XML software products are held on the
	  <ulink url="http://xml.coverpages.org/sgml-xml.html">XML Web
	    pages</ulink>. For browsers see the question on <link
	    xreflabel="simple" linkend="browsers">XML Browsers</link>
	  and the details of the <link linkend="discussions"
	    xreflabel="simple">xml-dev mailing list</link> for
	  software developers. Bert Bos keeps <ulink
	    url="http://www.w3.org/XML/notes.html">a list of some XML
	    developments</ulink> in Bison, Flex, Perl, and Python. The
	  long-established conversion and application development
	  engines like Omnimark and SGMLC all have XML capability
	  and provide APIs.</para>
	<tip id="editors">
	  <title>Editors</title>
	  <para>Choosing an editor is one of the hardest tasks,
	    because everyone has different requirements and levels of
	    knowledge, and what appears to be incredibly simple to one
	    user may seem dauntingly difficult to another. All XML
	    editors guide the user in the construction or maintenance
	    of XML documents&mdash;that's their purpose in
	    life.</para>
	  <para>The simplest ones just keep track of matching pointy
	    brackets, start-tags and end-tags, and balanced quotes,
	    leading to a <link linkend="wf"
	      xreflabel="simple">well-formed</link> file. More
	    powerful editors read a DTD or Schema and provide menu
	    choices for element manipulation and attribute editing,
	    and prevent the creation of invalid documents.</para>
	  <para>Some are text-mode
	    editors&mdash;they show all the markup and the text with
	    nothing hidden, often using colour to distinguish markup
	    characters. Some have a synchronous typographic mode as
	    well, using a stylesheet to format the information, so you
	    appear to be editing a typeset view of the document
	    (incorrectly called <acronym>WYSIWYG</acronym>). Text-mode
	    editors worry some users because the pointy brackets are
	    visible (they think it's programming); synchronous
	    typographic editors worry other people because the pointy
	    brackets are <emphasis>not</emphasis> visible, which makes
	    it hard to know where stuff begins and ends.</para>
	  <para>The more sophisticated editors are programmable, so
	    the nature and effect of the markup and the user's actions
	    can be limited or enhanced by scripts in JavaScript,
	    VBscript, Python, Tcl, Lisp, etc, even XSLT.</para>
	  <para>Do <emphasis>not</emphasis> be tempted to use a
	    non-XML editor like <productname>Notepad</productname>,
	    <productname>vi</productname>, or
	    <productname>textedit</productname> for XML documents: it
	    will only end in tears and recriminations. Get
	    properly-equipped.</para>
	  <para>There is a 2004 <ulink
	      url="http://ahds.ac.uk/creating/information-papers/xml-editors/">comparative 
	      paper on choosing an XML editor</ulink> from Thijs van
	    den Broek which may help, and an <ulink
	      url="http://www.freesoftwaremagazine.com/free_issues/issue_03/practical_applications_xml/">article</ulink> 
	    and <ulink url="http://www.xml-dev.com/blog/#19">set of
	      links</ulink> by Saqib Ali.</para>
	</tip>
	<para id="faq:XML-Chinese">Information for developers of
	  Chinese XML systems can be found at the Chinese XML Now!
	  website of Academia Sinica: <ulink id="FAQ:xml-chinese"
	    url="http://www.ascc.net/xml/">http://www.ascc.net/xml/</ulink>.
	  This site includes a FAQ and test files.</para>
      </answer>
    </qandaentry>
    <qandaentry id="docdata" remap="FAQ-API, api, dom">
      <question>
	<formalpara>
	  <title>What's my information? DATA or TEXT?</title>
	  <para>It depends on what you're using it for.</para>
	</formalpara>
      </question>
      <answer remap="nodes apis sax dom data text">
	<para>Some important distinctions exist between the major
	  classes of XML applications and the way in which they are
	  used:</para>
	<para>Two classes of applications are usually referred to as
	  <quote>document</quote> and <quote>data</quote>
	  applications, and this is reflected in the software, which
	  is usually (but not always) aimed at one class or the
	  other.</para>
	<variablelist>
	  <varlistentry>
	    <term>Document-style applications</term>
	    <listitem>
	      <para>These are in the nature of traditional publishers'
		work: text and images in a structured environment,
		with fonts and formatting; this includes Web pages as
		well as material destined for print like books and
		magazines.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term>Data-style applications</term>
	    <listitem>
	      <para>These are found mostly in e-commerce and process
		or application control, with XML being used as a
		container for information being stored or passed
		between systems, usually unformatted and unseen by
		humans.</para>
	    </listitem>
	  </varlistentry>
	</variablelist>
	<para>There is a third major area, Web Development, whose
	  requirements are often hybrid, and span the features of both
	  document and data applications because they contain partly
	  static descriptive text and partly dynamic data.</para> 
	<para>While in theory it would be possible to use data-class
	  software to write a novel, or document-class software to
	  create invoices, it would probably be severely suboptimal.
	  Because of the nature of the information used by the two
	  classes, data-class applications tend to use <link xreflabel="simple"
	    linkend="schemas">Schemas</link>, and document-class
	  applications tend to use <link linkend="dtds" xreflabel="simple">DTDs</link>,
	  but there is a considerable degree of overlap.</para> 
	<para>The way in which XML gets used in these two classes is
	  also divided in two: XML can be used manually or under
	  program control.</para>
	<variablelist>
	  <varlistentry>
	    <term>Manual usage</term>
	    <listitem>
	      <para>This means editing and maintaining the files with
		an editor, from the keyboard, seeing the information
		on the screen as you do so. This is suitable for
		individual documents, especially in the publishing
		field, and for developers working on single instances
		such as sample files or web site templates. Manual
		processing also implies running production programs
		like formatters, converters, and database queries on a
		one-by-one basis, using the keyboard and mouse in the
		normal way. Much of the software for manual usage can
		be run from the command line, which makes it easy to
		use for one-off applications and in hidden
		applications like Web scripts.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term>Programmable usage</term>
	    <listitem>
	      <para>This means writing programs which call on software
		services from APIs, libraries, or the network to
		handle XML files from inside the program. This is the
		normal method of operating for e-commerce
		applications, Web automation, and other process or
		application controls. There are libraries and APIs for
		many languages, including Java, C, and C++ as well as
		the usual scripting languages like Python, Perl, and
		Tcl.</para>
	    </listitem>
	  </varlistentry>
	</variablelist>
	<para>In addition to these axes, there are currently two
	  different ways of processing XML, memory-mapped or
	  event-triggered, usually referred to by the names of their
	  original instantiations, the <ulink
	    url="http://www.w3.org/TR/REC-DOM-Level-1"
	    id="dom">Document Object Model (DOM)</ulink> and the
	  <ulink url="http://www.saxproject.org/">Simple API for XML
	    (SAX)</ulink> respectively. Both use a model of document
	  engineering based on the tree-like structure of hierarchical
	  document markup known as a <ulink
	    url="http://xml.coverpages.org/topics.html#groves">grove</ulink> 
	  (a collection of trees, effectively an in-memory map of the
	  result of parsing the document markup), where every
	  <wordasword>node</wordasword> (item of information) from the
	  outermost element down through every element and attribute
	  to each piece of unmarked text can be identified. For
	  applications using Schemas, a Post-Schema-Validation Infoset
	  (PSVI) is defined, which specifies what information a parser
	  should make available to the application.</para>
	<tip xreflabel="Joe Fawcett">
	  <para>(in article
	    <literal><![CDATA[<eFIrHKtCGHA.2920@tk2msftngp13.phx.gbl>]]></literal>)</para>
	  <para>Briefly, <wordasword>node</wordasword> is a generic
	    term for any of the many types of XML building blocks,
	    including <firstterm>element</firstterm>: <sgmltag
	      class="emptytag">myElement</sgmltag>;
	    <firstterm>attribute</firstterm>: <sgmltag
	      class="emptytag">myElement
	      myAttribute="myValue"</sgmltag>; and <firstterm>text
	      node</firstterm>: <sgmltag class="element"
	      name="myElement">my Text Node</sgmltag></para>
	  <para>There are also comments [<firstterm>Comment
	      Declarations</firstterm>], <firstterm>Processing
	      Instructions</firstterm> and the invisible
	    <firstterm>Document Node</firstterm> representing the
	    <firstterm>root</firstterm> of the XML document, as well
	    as others.</para>
	</tip>
	<para>Grossly oversimplified, a <firstterm>DOM-based
	    application</firstterm> reads an entire XML document into
	  memory and then provides programmable access to every node
	  in every tree in the grove; whereas a <firstterm>SAX-based
	    application</firstterm> reads the XML document, and events
	  are triggered by the occurrence of nodes as they happen, for
	  which rules or actions have been programmed. (In reality
	  it's more complex than that, and the two methods share many
	  concepts.)</para>
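	<para>As a rough illustration, take the small invented
	  document below. A DOM-based application would hold all of
	  it in memory as a tree, while a SAX-based application would
	  receive a sequence of events, sketched here in the comment
	  using the callback names from the SAX
	  <literal>ContentHandler</literal> interface (white-space
	  text between elements is left out for brevity, and a real
	  parser may deliver text in more than one piece):</para>
	<programlisting><![CDATA[
<memo to="Portia">
  <line>The quality of mercy</line>
</memo>
<!-- approximate SAX event sequence for this hypothetical file:
     startDocument
     startElement  memo (attribute to="Portia")
     startElement  line
     characters    "The quality of mercy"
     endElement    line
     endElement    memo
     endDocument
-->
	  ]]></programlisting>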
	<para>Both models provide an abstract API for
	  constructing, accessing, and manipulating XML documents. A
	  binding of the abstract API to a particular programming
	  language provides a concrete API. Vendors provide concrete
	  APIs which let you use one or other method to query and
	  manipulate XML documents. Both types of parser have been
	  implemented in many languages and under many operating
	  systems and interfaces.  There are FAQs for both <ulink
	    url="http://www.w3.org/DOM/faq.html"
	    id="FAQ:dom">DOM</ulink> and <ulink
	    url="http://www.saxproject.org/faq.html"
	    id="FAQ:sax">SAX</ulink>.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-SWCHX, mime" id="serversoftware">
      <question>
	<formalpara>
	  <title>Do I have to change any of my server software to work
	    with XML?</title>
	  <para>Make sure your server sends XML files as
	    <literal>text/xml</literal></para>
	</formalpara>
      </question>
      <answer remap="content-type media-type media content type">
	<para>The only changes needed are to make sure your server
	  serves up <filename>.xml</filename>,
	  <filename>.css</filename>, <filename>.dtd</filename>,
	  <filename>.xsl</filename>, and whatever other file types you
	  will use as the correct MIME content (media) types.</para>
	<para>The details of the settings are specified in <ulink
	    url="ftp://ftp.rfc-editor.org/in-notes/rfc3023.txt">RFC
	    3023</ulink>. Most new versions of Web server software
	  come preset.</para>
	<para>If not, all that is needed is to edit the
	  <filename>mime-types</filename> file (or its equivalent: as
	  a server operator you already know where to do this, right?)
	  and add or edit the relevant lines for the right media
	  types. In some servers (eg Apache), individual content
	  providers or directory owners may also be able to change the
	  MIME types for specific file types from within their own
	  directories by using directives in a
	  <filename>.htaccess</filename> file. The media types
	  required are:</para>
	<itemizedlist>
	  <listitem>
	    <para><literal>text/xml</literal> for XML documents which
	      are <quote>readable by casual users</quote>;</para>
	  </listitem>
	  <listitem>
	    <para><literal>application/xml</literal> for XML documents
	      which are <quote>unreadable by casual
		users</quote>;</para>
	  </listitem>
	  <listitem>
	    <para><literal>text/xml-external-parsed-entity</literal>
	      for external parsed entities such as document fragments
	      (eg separate chapters which make up a book) subject to
	      the readability distinction of
	      <literal>text/xml</literal>;</para>
	  </listitem>
	  <listitem>
	    <para><literal>application/xml-external-parsed-entity</literal>
	      for external parsed entities subject to the readability
	      distinction of
	      <literal>application/xml</literal>;</para>
	  </listitem>
	  <listitem>
	    <para><literal>application/xml-dtd</literal> for DTD files
	      and modules, including character entity sets.</para>
	  </listitem>
	</itemizedlist>
	<para>The RFC has further suggestions for the use of the
	  <literal>+xml</literal> media type suffix for identifying
	  ancillary files such as XSLT
	  (<literal>application/xslt+xml</literal>).
	</para>
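	<para>Server configuration syntax varies. As one sketch,
	  assuming a Java servlet container which reads a
	  <filename>web.xml</filename> deployment descriptor, some of
	  the mappings could be declared like this (check your own
	  server's documentation for its equivalent):</para>
	<programlisting><![CDATA[
<!-- fragment of a web.xml file; assumes a servlet container -->
<mime-mapping>
  <extension>xml</extension>
  <mime-type>text/xml</mime-type>
</mime-mapping>
<mime-mapping>
  <extension>dtd</extension>
  <mime-type>application/xml-dtd</mime-type>
</mime-mapping>
<mime-mapping>
  <extension>xsl</extension>
  <mime-type>application/xslt+xml</mime-type>
</mime-mapping>
	  ]]></programlisting>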
	<para>If you run scripts generating XHTML which you wish to be
	  treated as XML rather than HTML, they may need to be
	  modified to produce the relevant Document Type Declaration
	  as well as the right media type if your application requires
	  them to be validated.</para>
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-SSINCLUDES, ssincludes" id="serverincludes">
      <question>
	<formalpara>
	  <title>Can I still use server-side inclusions?</title>
	  <para>Yes, just make sure the output conforms to XML</para>
	</formalpara>
      </question>
      <answer>
	<para>Yes, so long as what they generate ends up as part of an
	  XML-conformant file (ie either <link xreflabel="simple"
	    linkend="valid">valid</link> or just <link
	    xreflabel="simple" linkend="wf">well-formed</link>).</para>
	<para>Server-side tag-replacers like shtml, PHP, JSP, ASP,
	  Zope, etc store almost-valid files using comments,
	  Processing Instructions, or non-XML markup, which gets
	  replaced at the point of service by text or XML markup (it
	  is unclear why some of these systems use non-HTML/XML
	  markup). There are also some XML-based preprocessors for
	  formats like <ulink url="http://www.xvrl.org">XVRL</ulink>
	  (eXtensible Value Resolution Language) which resolve
	  specialised references to external data and output a
	  normalised XML file.</para>
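	<para>For illustration, here is an invented source file in
	  which the replaceable parts are held in a comment (in the
	  style of Apache server-side includes) and in a Processing
	  Instruction (in the style of PHP), so the stored file is
	  itself well-formed; the file names and element names are
	  assumptions:</para>
	<programlisting><![CDATA[
<page>
  <body>...</body>
  <!-- next line: replaced by the server with another file -->
  <!--#include virtual="/includes/footer.xml" -->
  <!-- next line: replaced by the server with the code's output -->
  <?php echo date('Y'); ?>
</page>
	  ]]></programlisting>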
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-CSINCLUDES, csincludes" id="clientincludes">
      <question>
	<formalpara>
	  <title>Can I (and my authors) still use client-side
	    inclusions?</title>
	  <para>Yes, just make sure the output conforms to XML</para>
	</formalpara>
      </question>
      <answer remap="vb5 vb6 visual basic">
	<para>The same rule applies as for <link xreflabel="simple"
	    linkend="serverincludes">server-side</link> inclusions, so you
	  need to ensure that any embedded code which gets passed to a
	  third-party engine (eg calls to SQL, VB, Java, etc) does not
	  contain any characters which might be misinterpreted as XML
	  markup (ie no angle brackets or ampersands). Either use a
	  CDATA marked section to avoid your XML application parsing
	  the embedded code, or use the standard <sgmltag
	    class="genentity">lt</sgmltag> and <sgmltag
	    class="genentity">amp</sgmltag> character entity
	  references instead.<!-- Note that the STAGO (start-tag open)
	  character &lt; is invalid in CDATA in
	  XML.--></para>
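	<para>The two approaches can be sketched as follows, using an
	  invented <sgmltag class="gi">query</sgmltag> element to
	  hold some embedded SQL (the table and column names are made
	  up too):</para>
	<programlisting>&lt;!-- Option 1 (hypothetical): a CDATA marked section --&gt;
&lt;query&gt;&lt;![CDATA[
  SELECT name FROM stock WHERE qty &lt; 10 AND dept = 'R&amp;D'
]]&gt;&lt;/query&gt;

&lt;!-- Option 2 (hypothetical): character entity references --&gt;
&lt;query&gt;
  SELECT name FROM stock WHERE qty &amp;lt; 10 AND dept = 'R&amp;amp;D'
&lt;/query&gt;
</programlisting>
	<para>In both cases the engine which finally receives the
	  query sees the same text, with a literal less-than sign and
	  ampersand.</para>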
      </answer>
    </qandaentry>
    <qandaentry remap="FAQ-TERMS, terms" id="terminology">
      <question>
	<formalpara>
	  <title>I'm trying to understand the XML Spec: why does it
	    have such difficult terminology?</title>
	  <para>It has to be formal to be accurate.</para>
	</formalpara>
      </question>
      <answer>
	<para>For implementation to succeed, the terminology needs to
	  be precise. Design goal eight of the specification tells us
	  that <quote>the design of XML shall be formal and
	    concise</quote>. To describe XML, the specification
	  therefore uses formal language drawn from several fields,
	  specifically those of text engineering, international
	  standards and computer science.  This is often confusing to
	  people who are unused to these disciplines because they use
	  well-known English words in a specialised sense which can be
	  very different from their common meanings&mdash;for example:
	  grammar, production, token, or terminal.</para>
	<para>The specification does not explain these terms because
	  of the other part of the design goal: the specification
	  should be concise. It doesn't repeat explanations that are
	  available elsewhere: it is assumed you know this and either
	  know the definitions or are capable of finding them. In
	  essence this means that to grok the fullness of the spec,
	  you do need a knowledge of some SGML and computer science,
	  and have some exposure to the language of formal
	  standards.</para>
	<para>Sloppy terminology in specifications causes
	  misunderstandings and makes them hard to implement
	  consistently, so formal standards have to be phrased in
	  formal terminology. This FAQ is not a formal document, and
	  the astute reader will already have noticed it refers to
	  <quote>element names</quote> where <quote>element type
	    names</quote> is more correct; but the former is more
	  widely understood.</para>
	<para>Those new to the terminology may find it useful to read
	  something like the <xref linkend="tei"/> or
	  <xref linkend="xmlann"/>.</para>
      </answer>
    </qandaentry>
    <qandaentry id="management">
      <question>
	<formalpara>
	  <title>I have to do an overview of XML for my
	    manager/client/investor/advisor. What should I
	    mention?</title>
	  <para>Non-proprietary multi-purpose flexible markup</para>
	</formalpara>
      </question>
      <answer>
	<tip xreflabel="Tad McClellan">
	  <itemizedlist>
	    <listitem>
	      <para>XML is <emphasis>not</emphasis> a markup language.
		XML is a <quote>metalanguage</quote>, that is, it's a
		language that lets you define <emphasis>your
		  own</emphasis> markup languages (see <link
		  xreflabel="simple"
		  linkend="whatishtml">definition</link>).</para>
	    </listitem>
	    <listitem>
	      <para>XML <emphasis>is</emphasis> a markup language [two
		(seemingly) contradictory statements one after another
		is an attention-getting device that I'm fond of],
		<emphasis>not</emphasis> a programming language. XML
		is data: it does not <quote>do</quote> anything, it
		has things done to it.</para>
	    </listitem>
	    <listitem>
	      <para>XML is non-proprietary: your data cannot be held
		hostage by someone else.</para>
	    </listitem>
	    <listitem>
	      <para id="multi">XML allows multi-purposing of your
		data.</para>
	    </listitem>
	    <listitem>
	      <para
	    id="sep">Well-designed XML applications most often
		separate <quote>content</quote> from
		<quote>presentation</quote>. You should describe what
		something <emphasis>is</emphasis> rather than what
		something <emphasis>looks like</emphasis> (the
		exception being data content which never gets
		presented to humans).</para>
	    </listitem>
	  </itemizedlist>
	</tip>
	<para>Saying <quote>the data is in XML</quote> is a relatively
	  useless statement, similar to saying <quote>the book is in a
	    natural language</quote>. To be useful, the former needs
	  to specify <quote>we have used XML to define our own markup
	    language</quote> (and say what it is), similar to
	  specifying <quote>the book is in French</quote>.</para>
	<para>A classic example of <link xreflabel="simple"
	    linkend="multi">multipurposing</link> and <link
	    xreflabel="simple" linkend="sep">separation</link> that I
	  often use is a pharmaceutical company. They have a large
	  base of data on a particular drug that they need to publish
	  as:</para>
	<itemizedlist>
	  <listitem>
	    <para>reports to the FDA;</para>
	  </listitem>
	  <listitem>
	    <para>drug information for publishers of drug
	      directories/catalogs;</para>
	  </listitem>
	  <listitem>
	    <para><quote>prescribe me!</quote> brochures to send to
	      doctors;</para>
	  </listitem>
	  <listitem>
	    <para>little pieces of paper to tuck into the
	      boxes;</para>
	  </listitem>
	  <listitem>
	    <para>labels on the bottles;</para>
	  </listitem>
	  <listitem>
	    <para>two pages of fine print to follow their ad in
	      Reader's Digest;</para>
	  </listitem>
	  <listitem>
	    <para>instructions to the patient that the local
	      pharmacist prints out;</para>
	  </listitem>
	  <listitem>
	    <para>etc.</para>
	  </listitem>
	</itemizedlist>
	<para>Without separation of content and presentation, they
	  need to maintain essentially identical information in 20
	  places. If they miss a place, people die, lawyers get rich,
	  and the drug company gets poor. With XML (or SGML), they
	  maintain one set of carefully validated information, and
	  write 20 programs to extract and format it for each
	  application. The same 20 programs can now be applied to all
	  the hundreds of drugs that they sell.</para>
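	<para>A rough sketch of what such single-source data might
	  look like follows. Every element name, attribute, and value
	  here is invented for illustration and is not taken from any
	  real pharmaceutical Schema or DTD:</para>
	<programlisting><![CDATA[
<drug id="examplamine">
  <name>Examplamine</name>
  <!-- stored once; each output program picks what it needs -->
  <dosage unit="mg" frequency="daily">50</dosage>
  <warning audience="patient">Do not take with alcohol.</warning>
  <warning audience="physician">Review dosage for patients
    with reduced liver function.</warning>
</drug>
	  ]]></programlisting>
	<para>The bottle label program might extract only the
	  <literal>name</literal> and <literal>dosage</literal>, while
	  the brochure program selects the warnings marked for
	  physicians; nothing is typed twice.</para>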
	<para>In the Web development area, the biggest thing that XML
	  offers is fixing what is wrong with HTML:</para>
	<itemizedlist>
	  <listitem>
	    <para>browsers allow non-compliant HTML to be
	      presented;</para>
	  </listitem>
	  <listitem>
	    <para>HTML is restricted to a single set of markup
	      (<quote>tagset</quote>).</para>
	  </listitem>
	</itemizedlist>
	<para>If you let broken HTML work (be presented), then there
	  is no motivation to fix it. Web pages are therefore tag soup
	  that is useless for further processing. XML specifies that
	  processing must not continue if the XML is non-compliant, so
	  you keep working at it until it complies. This is more work
	  up front, but the result is not a dead-end.</para>
	<para>If you want to mark up the names of things (people,
	  places, companies, etc) in HTML, you don't have many choices
	  that allow you to distinguish among them. XML allows you to
	  name things as what they are:</para>
	<programlisting><![CDATA[
<person>Charles Goldfarb</person> worked 
at <company>IBM</company>
	  ]]></programlisting>
	<para>gives you a flexibility that you don't have with
	  HTML:</para>
	<programlisting><![CDATA[
<B>Charles Goldfarb</B> worked at <B>IBM</B>
	  ]]></programlisting>
	<para>With XML you don't have to shoe-horn your data into
	  markup that restricts your options.</para>
      </answer>
    </qandaentry>
    <qandaentry id="conformance" remap="test">
      <question>
	<formalpara>
	  <title>Is there a conformance test suite for XML
	    processors?</title>
	  <para>Yes, see <ulink url="http://www.oasis-open.org/committees/xmltest/testsuite.htm"></ulink></para>
	</formalpara>
      </question>
      <answer>
	<para><personname>
	    <firstname>James</firstname>
	    <surname>Clark</surname>
	  </personname> has a collection of test cases for testing XML
	  parsers at <ulink
	    url="http://www.jclark.com/xml/">http://www.jclark.com/xml/</ulink> 
	  which includes a conformance test against <quote>canonical
	    XML</quote>.</para>
	<tip xreflabel="Mary Brady" id="conftest">
	  <para>A much larger and more comprehensive suite is the
	    NIST/OASIS Conformance Test Suite, available from <ulink
	      url="http://www.oasis-open.org/committees/xmltest/testsuite.htm">http://www.oasis-open.org/committees/xmltest/testsuite.htm</ulink>, 
	    which contains contributions from <personname>
	      <firstname>James</firstname>
	      <surname>Clark</surname>
	  </personname>, OASIS and NIST, Sun, and Fuji Xerox.</para>
	</tip>
	<tip xreflabel="Carmelo Montanez">
	  <para>NIST has developed a number of XSLT/XPath tests, which
	    will be part of the official OASIS XSLT/XPath suite (not
	    yet released).  These tests are available from our web
	    site at <ulink
	      url="http://xw2k.sdct.itl.nist.gov/xml/index.html">http://xw2k.sdct.itl.nist.gov/xml/index.html</ulink> 
	    (click on <quote>XSL Testing</quote>). The expected output
	    may be slightly different from one implementation to
	    another.  The OASIS XSLT technical committee has a
	    solution for that problem, but our tests do not yet
	    implement it. Please forward any comments to
	    <ulink url="carmelo@nist.gov"></ulink>.</para>
	</tip>
	<tip xreflabel="Jon Noring">
	  <para>For those who are interested, I took the current and
	    complete Unicode 3.0 <quote>cast</quote> of characters and
	    their hex codes, and created a simple XML document of it
	    to test XML browsers for Unicode conformity. It is not
	    finished yet&mdash;I need to add comments and to fix the
	    display of right-to-left characters (eg Hebrew, Arabic). It is found
	    at: <ulink
	      url="http://www.windspun.com/unicode-test/unicode.xml">http://www.windspun.com/unicode-test/unicode.xml</ulink>. 
	    It is quite large, almost 900K in size, so be prepared.
	    IE5 renders many of the characters in this XML
	    document&mdash;and for the ones it does render it appears
	    to do so correctly.  I look forward to when Opera will do
	    likewise.  I haven't tested the current version of
	    Mozilla/Netscape for Unicode conformity.</para>
	</tip>
      </answer>
    </qandaentry>
    <qandaentry id="dtdconv" remap="dtdconv">
      <question>
	<formalpara>
	  <title>I've already got SGML DTDs: how do I convert them for
	    use with XML?</title>
	  <para>Edit by hand or use software like Near+Far
	    Designer.</para>
	</formalpara>
      </question>
      <answer remap="internalsubset">
	<para>There are numerous projects to convert common or popular
	  SGML DTDs to XML format (for example, both the <ulink
	    url="http://www.tei-c.org/">TEI DTD</ulink> (Lite and full
	  versions) and the <ulink
	    url="http://www.docbook.org/">DocBook DTD</ulink> are
	  available in both SGML and XML, in Schema and DTD
	  formats).</para>
	<tip xreflabel="Seán McGrath">
	  <title>To convert SGML DTDs to XML:</title>
	  <orderedlist>
	    <listitem>
	      <para>XML has no equivalent of the SGML Declaration, so
		keywords, character set, etc are essentially
		fixed;</para>
	    </listitem>
	    <listitem>
	      <para>Tag minimisation is not allowed, so
	    <programlisting><![CDATA[<!ELEMENT x - O (A,B)>]]></programlisting> becomes
		<programlisting><![CDATA[<!ELEMENT x (A,B)>]]></programlisting> and
		<programlisting><![CDATA[<!ELEMENT x - O EMPTY>]]></programlisting> becomes
	    <programlisting><![CDATA[<!ELEMENT x EMPTY>]]></programlisting>;</para>
	    </listitem>
	    <listitem>
	      <para
	    id="mixedcont"><sgmltag>#PCDATA</sgmltag> must only occur
		at the extreme left (ie first) in an OR model, eg
		<programlisting><![CDATA[<!ELEMENT x - -
	      (A|B|#PCDATA|C)>]]></programlisting> (in SGML) becomes
	    <programlisting><![CDATA[<!ELEMENT x
	      (#PCDATA|A|B|C)*>]]></programlisting>, and
	    <programlisting><![CDATA[<!ELEMENT x
		  (A,#PCDATA)>]]></programlisting> is illegal;</para>
	    </listitem>
	    <listitem>
	      <para>No CDATA, RCDATA elements [declared
		content];</para>
	    </listitem>
	    <listitem>
	      <para>Some SGML attribute types are not allowed in XML
		eg NUTOKEN;</para>
	    </listitem>
	    <listitem>
	      <para>Some SGML attribute defaults are not allowed in
		XML eg CONREF;</para>
	    </listitem>
	    <listitem>
	      <para>Comments cannot be inline to declarations like 
<programlisting><![CDATA[<!ELEMENT x - - (A,B) -- an SGML comment in a
		  declaration 
		  -->]]></programlisting>;</para>
	    </listitem>
	    <listitem>
	      <para>A whole bunch of SGML optional features are not
		present in XML: all forms of tag minimisation
		(OMITTAG, DATATAG, SHORTREF, etc); Link Process
		Definitions; Multiple DTDs per document; and many
		more: see <ulink
		  url="http://www.w3.org/TR/NOTE-sgml-xml-971215"
		  id="howto"></ulink> for the list of bits of SGML
		that were removed for XML;</para>
	    </listitem>
	    <listitem>
	      <para>And [nearly] last but not least, no CONCUR!</para>
	    </listitem>
	    <listitem>
	      <para>There are some important differences between the
		internal and external subset portion of a DTD in XML:
		Marked Sections can only occur in the external subset;
		and Parameter Entities must be used to replace entire
		declarations in the internal subset portion of a DTD,
		eg the following is invalid XML:</para> 
	      <programlisting><![CDATA[ 
<!DOCTYPE x [ 
<!ENTITY % modelx "(A|B)*"> 
<!ELEMENT x %modelx;> 
]> 
<x></x>
		]]></programlisting>
	    </listitem>
	  </orderedlist>
	  <para>For more information, see <xref
	      linkend="xmlexample"/>.</para>
	</tip>
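	<para>The restriction on Parameter Entities in the internal
	  subset can be worked around by letting the entity hold a
	  complete declaration rather than a fragment of one. The
	  following is a sketch only, assuming for the sake of the
	  example that <literal>A</literal> and <literal>B</literal>
	  are empty elements:</para>
	<programlisting><![CDATA[
<!DOCTYPE x [
<!-- the entity replaces an entire declaration, which is allowed -->
<!ENTITY % declx "<!ELEMENT x (A|B)*>">
%declx;
<!-- comments stand on their own, never inside a declaration -->
<!ELEMENT A EMPTY>
<!ELEMENT B EMPTY>
]>
<x></x>
	  ]]></programlisting>
	<para>If the content model itself needs to be parameterised,
	  move the declarations into an external DTD file, where
	  Parameter Entity references inside declarations are
	  permitted.</para>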
      </answer>
    </qandaentry>
    <qandaentry id="dtdincludes" remap="includes">
      <question>
	<formalpara>
	  <title>How do I include one DTD (or fragment) in
	    another?</title>
	  <para>Use a parameter entity, same as for SGML</para>
	</formalpara>
      </question>
      <answer>
	<para>This works exactly the same as for SGML. First you
	  declare the entity you want to include, and then you
	  reference it by name as a parameter entity:</para>
	<programlisting><![CDATA[ 
<!ENTITY % mylists SYSTEM "dtds/listfrag.ent"> 
... 
%mylists; 
	  ]]></programlisting>
	<para>Such declarations traditionally go all together towards
	  the top of the main DTD file, where they can be managed and
	  maintained, but this is not essential so long as they are
	  declared before they are used. You use Parameter Entity
	  Syntax for this (the percent sign) because the file is to be
	  included at DTD compile time, not when the document instance
	  itself is parsed.</para>
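	<para>As an illustration only (the element names and the
	  contents of <literal>dtds/listfrag.ent</literal> are
	  invented for this sketch), the fragment file might hold
	  nothing but the list declarations:</para>
	<programlisting><![CDATA[
<!-- dtds/listfrag.ent: hypothetical list declarations -->
<!ELEMENT list (item+)>
<!ATTLIST list type (bullet|number) "bullet">
<!ELEMENT item (#PCDATA)>
	  ]]></programlisting>
	<para>and the main DTD would then declare the entity,
	  reference it, and use the elements it brings in:</para>
	<programlisting><![CDATA[
<!ENTITY % mylists SYSTEM "dtds/listfrag.ent">
%mylists;
<!-- list is now available to the rest of the DTD -->
<!ELEMENT section (title, (para|list)+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
	  ]]></programlisting>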
	<para>Note that a URI is compulsory in XML as the System
	  Identifier for all external file references: standard rules
	  for dereferencing URIs apply (assume the same method,
	  server, and directory as the containing document). A Formal
	  Public Identifier can also be used, following the same rules
	  as <link linkend="fpis" xreflabel="simple">elsewhere</link>.</para>
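	<para>A sketch of the same inclusion using a Formal Public
	  Identifier follows. The identifier shown is made up for the
	  example; note that in XML the System Identifier must still
	  be given after it:</para>
	<programlisting><![CDATA[
<!-- hypothetical FPI, followed by the compulsory URI -->
<!ENTITY % mylists PUBLIC "-//Example Org//ENTITIES List Fragment//EN"
                          "dtds/listfrag.ent">
%mylists;
	  ]]></programlisting>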
      </answer>
    </qandaentry>
    <qandaentry id="conditionals" >
      <question>
	<formalpara>
	  <title>How can I include a conditional statement in my XML?</title>
	  <para>You can't: XML isn't a programming language. But you
	  can have conditional criteria in a Schema, DTD, or a processor.</para>
	</formalpara>
      </question>
      <answer remap="conditionals">
	<para>You can't: <link linkend="execute" xreflabel="simple">XML isn't a
	    programming language</link>, so you can't say things
	  like</para>
	<programlisting conformance="no"><![CDATA[
<foo if {DB}="A">bar</foo>
	  ]]></programlisting>
	<para>If you need to make an element optional, based on some
	  internal or external criteria, you can do so in a Schema.
	  DTDs have no internal referential mechanism, so it isn't
	  possible to express this kind of conditionality in a DTD at
	  the individual element level.</para>
	<para>It <emphasis>is</emphasis>
	  possible to express presence-or-absence conditionality in a
	  DTD for the whole document, by using parameter entities as
	  boolean switches to include or ignore certain sections of
	  the DTD based on settings either hardwired in the DTD or
	  supplied in the internal subset. Both the TEI and DocBook
	  DTDs use this mechanism to implement modularity.</para>
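	<para>A minimal sketch of such a switch follows. The file
	  name <literal>report.dtd</literal>, the entity name
	  <literal>draft</literal>, and the element names are all
	  assumptions made for the example. The DTD file sets the
	  default and wraps the optional declarations in a conditional
	  section:</para>
	<programlisting>
&lt;!-- report.dtd (external subset): draft markup is off by default --&gt;
&lt;!ENTITY % draft "IGNORE"&gt;
&lt;![%draft;[
&lt;!ELEMENT comment (#PCDATA)&gt;
]]&gt;
	</programlisting>
	<para>A document that wants the extra declarations overrides
	  the switch in its internal subset, which is read before the
	  DTD file, so its declaration of the entity is the one that
	  takes effect:</para>
	<programlisting>
&lt;!DOCTYPE report SYSTEM "report.dtd" [
&lt;!ENTITY % draft "INCLUDE"&gt;
]&gt;
	</programlisting>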
	<para>Alternatively you can make the element entirely optional
	  in the DTD or Schema, and provide code in your processing
	  software that checks for its presence or absence. This
	  defers the checking until the processing stage: one of the
	  reasons for Schemas is to provide this kind of checking at
	  the time of document creation or editing.</para>
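	<para>A minimal sketch of the processing-stage check, written
	  in XSLT 1.0 on the assumption that a hypothetical
	  <literal>entry</literal> element may contain an optional
	  <literal>note</literal> element (both names, and the HTML
	  output, are invented for the example):</para>
	<programlisting><![CDATA[
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="entry">
    <div>
      <xsl:apply-templates select="title"/>
      <!-- output the note only when the optional element exists -->
      <xsl:if test="note">
        <p class="note"><xsl:value-of select="note"/></p>
      </xsl:if>
    </div>
  </xsl:template>
</xsl:stylesheet>
	  ]]></programlisting>
	<para>The condition lives in the stylesheet, not in the
	  document: the XML itself only records whether the
	  <literal>note</literal> is there.</para>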
      </answer>
    </qandaentry>
    <qandaentry id="edi" remap="edi">
      <question>
	<formalpara>
	  <title>What's the story on XML and EDI?</title>
	  <para>Getting there: still needs more work and
	    agreement.</para>
	</formalpara>
      </question>
      <answer>
	<para>Electronic Data Interchange has been used in e-commerce
	  for many years to exchange documents between commercial
	  partners to a transaction. It requires special proprietary
	  software and is prohibitively expensive to implement for
	  small and medium-sized enterprises. There are moves to
	  enable EDI documents to travel inside XML, as well as
	  proposals to replace the existing EDI formats with XML ones.
	  There are guideline documents at  <ulink
	    url="http://www.eccnet.com/xmledi/guidelines-styled.xml"></ulink> 
	  and <ulink
	    url="http://www.geocities.com/WallStreet/Floor/5815/guide.htm"></ulink>.</para>
	<para>Probably the biggest effect on EDI is the rise of
	  standardisation attempts for XML business documents and
	  transactions. The standard jointly sponsored by OASIS and United
	  Nations/CEFACT is <ulink
	    url="http://www.ebxml.org/">ebXML</ulink> (Electronic
	  Business XML) which provides Schemas for the common
	  commercial transaction document types. Normal office
	  documents (letters, reports, spreadsheets, etc) are already
	  being done using the materials under the charge of the OASIS
	  Open Office XML Formats TC, detailed <link xreflabel="simple"
	    linkend="officeapps">above</link>. Other standards such as
	  <ulink url="http://www.openapplications.org">OAGI</ulink> and <ulink
	    url="http://www.rosettanet.org">RosettaNet</ulink> are undergoing
	  interoperability testing with ebXML.</para>
	<para>In addition to full standards, there are many sets of
	  shims, interoperability tools, and component libraries such
	  as the XML Common Business Library (<ulink
	    url="http://www.xcbl.org/">xCBL</ulink>).</para>
      </answer>
    </qandaentry>
  </qandadiv>
  <qandadiv id="appendix" remap="FAQ-FORM, app">
    <title>Appendices</title>
    <qandaentry id="bibliography">
      <question>
	<formalpara>
	  <title>References</title>
	  <para>There is a much larger XML and SGML bibliography at
	    <ulink url="http://xml.coverpages.org/biblio.html"></ulink>.</para>
	</formalpara>
      </question>
      <answer>
	<para>This list covers only documents directly referenced in
	this FAQ.</para>
	<bibliodiv>
	  <biblioentry id="toolbook" role="book">
	    <author>
	      <firstname>Peter</firstname>
	      <surname>Flynn</surname>
	    </author>
	    <title>Understanding SGML and XML Tools</title>
	    <publisher>
	      <publishername>Kluwer</publishername>
	      <address>Boston, MA</address>
	    </publisher>
	    <date>1998</date>
	    <isbn>0-7923-8169-6</isbn>
	    <releaseinfo>http://www.amazon.com/exec/obidos/tg/detail/-/0792381696/qid=1128202814/sr=1-1/ref=sr_1_1/102-0476289-3244914?v=glance&amp;s=books</releaseinfo>
	  </biblioentry>
	  <biblioentry id="devdtd" role="book">
	    <authorgroup>
	      <author>
		<firstname>Eve</firstname>
		<surname>Maler</surname>
	      </author>
	      <author>
		<firstname>Jeanne</firstname>
		<surname remap="preserve">el Andaloussi</surname>
	      </author>
	    </authorgroup>
	    <title>Developing SGML DTDs</title>
	    <subtitle>From Text to Model to Markup</subtitle>
	    <publisher>
	      <publishername>Prentice Hall PTR</publishername>
	      <address>Upper Saddle River, NJ</address>
	    </publisher>
	    <date>1995</date>
	    <isbn>0133098818</isbn>
	    <releaseinfo>http://www.amazon.com/exec/obidos/tg/detail/-/0133098818/qid=1104447963/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/002-9386245-9385639?v=glance&amp;s=books&amp;n=507846</releaseinfo>
	  </biblioentry>
	  <biblioentry id="esl" role="book">
	    <author>
	      <firstname>Lynne</firstname>
	      <surname>Truss</surname>
	    </author>
	    <title>Eats, Shoots &amp; Leaves</title>
	    <subtitle>The Zero-Tolerance Approach to
	      Punctuation</subtitle>
	    <publisher>
	      <publishername>Profile Books</publishername>
	      <address>London</address>
	    </publisher>
	    <date>2003</date>
	    <isbn>1-86197-612-7</isbn>
	    <releaseinfo>http://www.amazon.com/exec/obidos/tg/detail/-/1592400876/qid=1104449308/sr=8-1/ref=pd_csp_1/002-9386245-9385639?v=glance&amp;s=books&amp;n=507846</releaseinfo>
	  </biblioentry>
	  <biblioentry id="docdb" role="inproceedings">
	    <articleinfo>
	      <authorgroup>
		<author>
		  <firstname>Airi</firstname>
		  <surname>Salminen</surname>
		</author>
		<author>
		  <firstname>Frank</firstname>
		  <surname>Tompa</surname>
		</author>
	      </authorgroup>
	      <title>Requirements for XML Document Database
		Systems</title>
	      <releaseinfo>http://db.uwaterloo.ca/~fwtompa/.papers/xmldb-desiderata.pdf</releaseinfo>
	    </articleinfo>
	    <confgroup>
	      <conftitle>ACM Symposium on Document
		Engineering</conftitle>
	      <address>Atlanta, GA</address>
	      <confdates>November 2001</confdates>
	    </confgroup>
	  </biblioentry>
	  <biblioentry id="xmlann" role="book">
	    <author>
	      <firstname>Bob</firstname>
	      <surname>DuCharme</surname>
	    </author>
	    <title>XML: The Annotated Specification</title>
	    <publisher>
	      <publishername>Prentice Hall PTR</publishername>
	      <address>Upper Saddle River, NJ</address>
	    </publisher>
	    <date>1999</date>
	    <isbn>0-13-082676-6</isbn>
	    <releaseinfo>http://www.snee.com/bob/xmlann</releaseinfo>
	  </biblioentry>
	  <biblioentry id="xmlexample" role="book">
	    <author>
	      <firstname>Seán</firstname>
	      <surname>McGrath</surname>
	    </author>
	    <title>XML by Example</title>
	    <subtitle>Building E-Commerce Applications</subtitle>
	    <publisher>
	      <publishername>Prentice Hall PTR</publishername>
	      <address>Upper Saddle River, NJ</address>
	    </publisher>
	    <date>1998</date>
	    <isbn>0139601627</isbn>
	    <releaseinfo>http://www.amazon.com/exec/obidos/tg/detail/-/0139601627/qid=1104449400/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/002-9386245-9385639?v=glance&amp;s=books&amp;n=507846</releaseinfo>
	  </biblioentry>
	  <biblioentry id="nopres" role="inproceedings">
	    <articleinfo>
	      <author>
		<firstname>Peter</firstname>
		<surname>Flynn</surname>
	      </author>
	      <title>Making more use of markup</title>
	      <artpagenums>158&ndash;167</artpagenums>
	    <releaseinfo>http://imbolc.ucc.ie/~pflynn/articles/moreuse.html</releaseinfo>
	    </articleinfo>
	    <confgroup>
	      <conftitle>SGML'95</conftitle>
	      <address>Boston, MA</address>
	      <confdates>December 1995</confdates>
	    </confgroup>
	  </biblioentry>
	  <biblioentry id="richsgml" role="inproceedings">
	    <articleinfo>
	      <author>
		<firstname>Chet</firstname>
		<surname>Ensign</surname>
	      </author>
	      <title>If SGML Is So Smart, How Come It Ain't Rich?</title>
	      <artpagenums>136&ndash;145</artpagenums>
	    </articleinfo>
	    <confgroup>
	      <conftitle>SGML'95</conftitle>
	      <address>Boston, MA</address>
	      <confdates>December 1995</confdates>
	    </confgroup>
	  </biblioentry>
	  <biblioentry role="book" id="fox">
	    <author>
	      <firstname>Dave</firstname>
	      <surname>Pawson</surname>
	    </author>
	    <title>XSL-FO</title>
	    <subtitle>Making XML Look Good in Print</subtitle>
	    <publisher>
	      <publishername>O'Reilly</publishername>
	      <address>Sebastopol, CA</address>
	    </publisher>
	    <date>2002</date>
	    <isbn>0-596-00355-2</isbn>
	    <releaseinfo>http://www.oreilly.com/catalog/xslfo/</releaseinfo>
	  </biblioentry>
	  <biblioentry id="tei" role="inbook">
	    <articleinfo>
	      <authorgroup>
		<editor>
		  <firstname>Michael</firstname>
		  <surname>Sperberg-McQueen</surname>
		</editor>
		<editor>
		  <firstname>Lou</firstname>
		  <surname>Burnard</surname>
		</editor>
	      </authorgroup>
	      <title>Gentle Introduction to XML</title>
	      <releaseinfo>http://www.tei-c.org/Guidelines2/gentleintro.pdf</releaseinfo>
	      <artpagenums></artpagenums>
	    </articleinfo>
	    <title>TEI P4: Guidelines for Electronic Text Encoding and
	      Interchange</title>
	    <publisher>
	      <publishername>Text Encoding Initiative
		Consortium</publishername>
	      <address>Oxford, Providence, Charlottesville,
		Bergen</address>
	    </publisher>
	    <date>2002</date>
	  </biblioentry>
	  <biblioentry id="thespec" role="techreport">
	    <authorgroup>
	      <editor>
		<firstname>Tim</firstname>
		<surname>Bray</surname>
	      </editor>
	      <editor>
		<firstname>Jean</firstname>
		<surname>Paoli</surname>
	      </editor>
	      <editor>
		<firstname>CM</firstname>
		<surname>Sperberg-McQueen</surname>
	      </editor>
	      <editor>
		<firstname>Eve</firstname>
		<surname>Maler</surname>
	      </editor>
	      <editor>
		<firstname>François</firstname>
		<surname>Yergeau</surname>
	      </editor>
	    </authorgroup>
	    <title>Extensible Markup Language (XML) 1.0</title>
	    <releaseinfo>http://www.w3.org/TR/REC-xml/</releaseinfo>
	    <publisher>
	      <publishername>W3C</publishername>
	      <address>Boston</address>
	    </publisher>
	    <edition>3rd</edition>
	    <date>4 February 2004</date>
	  </biblioentry>
	</bibliodiv>
      </answer>
    </qandaentry>
    <qandaentry id="future" >
      <question>
	<formalpara>
	  <title>How far are we going?</title>
	  <para>To infinity and beyond!</para>
	</formalpara>
      </question>
      <answer remap="sex pornography pornographic pictures anal">
	<para>Running a search facility on this FAQ has produced some
	  interesting results from the notifications of both matches
	  and non-matches. <ulink
	    url="http://dylan.tweney.com/prophet/981019prophet.htm">Sex</ulink> 
	  has dropped to 10th place.</para>
	<itemizedlist>
	  <listitem>
	    <para>The most frequent request (5&percnt; overall) is now
	      individual characters, either as character entity
	      names or as numeric values, or one of the markup
	      characters (<literal>&lt;</literal> or
	      <literal>&amp;</literal>).</para>
	  </listitem>
	  <listitem>
	    <para>In recent months the second largest category has
	      stabilised as the word <literal>dtd</literal> (3&percnt;).</para>
	  </listitem>
	  <listitem>
	    <para>Third comes CDATA at 2&percnt; (hardly surprising
	    given how widely it is abused).</para>
	  </listitem>
	  <listitem>
	    <para>Fourth equal at 1&percnt; come XSD and XSL, neither
	    of which is dealt with in detail here as they have their
	    own FAQs.</para>
	  </listitem>
	</itemizedlist>
	<para id="lite">The entertaining bits are deep in the tail,
	  like the user from Broomfield, CO, who typed in <quote>How
	    can I analyze a telephone to understand it better?</quote>
	  (taking it to pieces is probably a start); the one from the
	  Philippines who wanted to know how to <quote>describe the
	    five fundamental interactions between X-rays or Gamma rays
	    with matter</quote> (try DS9); the one from Culver City,
	  CA, who asked <quote>how are echinodermata organisms
	    different from lower invertebrates?</quote> (like I
	  care?); and the one from Lexington, KY, who asked <quote>How
	    do I add two text fields?</quote> (got me there, d00d, how
	  do you multiply a lettuce and a cucumber?).</para>
      </answer>
      <answer>
	<programlisting><![CDATA[
Date: Fri, 09 Jul 1999 14:26:17 -0500 (EST)
From: The Internet Oracle <oracle@cs.indiana.edu>
Subject: The Oracle replies!
To: <address-removed>
X-Planation: X-Face can be viewed with ftp.cs.indiana.edu:/pub/faces.

The Internet Oracle has pondered your question deeply.
Your question was:

> Oh Oracle most wise, all-seeing and all-knowing,
> in thy wisdom grant me a response to my request:
> 
> Is XML really going to cut the mustard?

And in response, thus spake the Oracle:

} Well, since XML is a subset of SGML, and SGML 
} has a <cut mustard> tag, I'd have to say yes.
} 
} You owe the Oracle a B1FF parser.
	  ]]></programlisting>
	<para>For the SGML-curious among our readers, that's:</para>
	<programlisting><![CDATA[
<!element cut - o empty>
<!attlist cut mustard (mustard) #required>
<!-- :-) -->
	  ]]></programlisting>
      </answer>
    </qandaentry>
    <qandaentry id="glossary" >
      <question>
	<formalpara>
	  <title>Not the XML FAQ</title>
	  <para>Infrequently Asked Questions</para>
	</formalpara>
      </question>
      <answer remap="infrequently">
	<para>This is a list of topics that people have asked about or
	  searched for in relation to the XML FAQ, which are not
	  necessarily directly connected to XML and its technology,
	  nor <emphasis>frequently</emphasis> asked questions. It also
	  includes some fall-back definitions for the benefit of users
	  who have come to XML by different routes and may not have
	  been exposed to any document publishing background.</para>
	<para>Readers may also want to look at <personname>
	    <firstname>Joe</firstname>
	    <surname>English</surname>
	  </personname>'s <quote>Not the SGML FAQ</quote> at <ulink
	    url="http://www.flightlab.com/~joe/sgml/faq-not.txt"></ulink>.</para>
	<glosslist>
	  <glossentry id="xls">
	    <glossterm remap="xls export convert">XLS</glossterm>
	    <glossdef>
	      <para>Microsoft proprietary spreadsheet file format
		written by their <productname>Excel</productname>
		spreadsheet program. XLS files are not XML files, but
		versions of <productname>Excel</productname> in
		<productname>Office-11</productname> and above can
		also save their data in Microsoft's own Office-XML
		format: use the
		<guimenu>File</guimenu><guimenuitem>Save
		  As&hellip;</guimenuitem> menu item. Future versions
		of <productname>Excel</productname> may be able to use
		external Schemas.</para>
	      <para>Do not confuse XLS with XSL (see <link
		  linkend="style"></link>).</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="xml">
	    <glossterm remap="faq">XML</glossterm>
	    <glossdef>
	      <para>This is the XML FAQ. Everything in it is about
		XML. For introductory explanations, see <link
		  linkend="basics"></link>.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="color">
	    <glossterm remap="colors colours">Colour</glossterm>
	    <glossdef>
	      <para>XML is designed for identifying information about
		the structure and content of text documents, rather
		than their appearance. Although it is perfectly
		possible to identify and store information about
		appearances, this information is usually kept in a CSS
		or XSL stylesheet. If you need to record information
		about the formatting or appearance of an existing
		document, there are features in the <ulink
		  url="http://www.tei-c.org/">TEI</ulink> Schema/DTD
		for doing so.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="editing">
	    <glossterm remap="opening docs">Editing</glossterm>
	    <glossdef>
	      <para>To edit (open) an XML file you should use an <link
		  xreflabel="simple" linkend="editors">XML
		  editor</link>. It is possible to open an XML file
		using any standard plaintext editor or even a
		wordprocessor, but be aware that they may try to
		reformat the file incorrectly because they don't
		understand XML.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="games">
	    <glossterm remap="nintendo">Games</glossterm>
	    <glossdef>
	      <para>I am not aware of any computer games written using
		XML yet, although XML may well be used in some of the
		internal control and configuration files used by
		games.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="soap">
	    <glossterm remap="simple object access
	      protocol">SOAP</glossterm>
	    <glossdef>
	      <para>A <ulink url="http://www.w3.org/TR/soap/">W3C
		  standard</ulink> for the <quote>definition of the
		  XML-based information which can be used for
		  exchanging structured and typed information between
		  peers in a decentralized, distributed
		  environment</quote>. Most commonly used in Web
		  Services for message-passing.</para>
	      <para>Originally the <ulink
		  url="http://xml.coverpages.org/soap.html">Simple
		  Object Access Protocol</ulink>, the acronym is now
		undefined, or expressed as the Service-Oriented Access
		Protocol.</para>
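	      <para>As a rough illustration, a SOAP 1.2 message is an
		Envelope containing an optional Header and a
		mandatory Body. The payload element and its namespace
		below are hypothetical:</para>
	      <programlisting><![CDATA[
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header/>
  <env:Body>
    <!-- application-defined payload; names invented for the example -->
    <m:getPrice xmlns:m="http://example.org/stock">
      <m:symbol>XYZ</m:symbol>
    </m:getPrice>
  </env:Body>
</env:Envelope>
		]]></programlisting>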
	    </glossdef>
	  </glossentry>
	  <glossentry id="serving">
	    <glossterm remap="text/xml">Serving XML</glossterm>
	    <glossdef>
	      <para>See <link linkend="serversoftware"></link></para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="newlines">
	    <glossterm remap="newlines linebreaks line breaks end crlf
	      lfcr lf cr line feed linefeed carriage returns cr-lf">Line
	      breaks</glossterm>
	    <glossdef>
	      <para>XML files can be created using any of the three
		standard newline representations: CR (classic Mac OS), LF (Unix),
		or CR/LF (Windows). Use of anything else may lead to
		undefined behaviour (so old DOS editors that use LF/CR
		may create unusable files).</para>
	      <para>Line-breaking in your output is governed by your
		rendering engine (eg a browser, a typesetter, etc).
		Your DTD or Schema may define special elements or
		entities to be used on rare occasions when a forced
		linebreak is required, but this is not normally
		something done in XML (exception: reconstruction of
		historical documents using the TEI).</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="protocol">
	    <glossterm>XML Protocol</glossterm>
	    <glossdef>
	      <para>There is a Working Group for Web Services at the
		W3C, and part of their remit is to work on an XML
		Protocol. See <ulink
		  url="http://www.w3.org/2000/xp/Group/"></ulink> for
		details.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="javascript">
	    <glossterm>Javascript</glossterm>
	    <glossdef>
	      <para>ECMAScript (to give it its real name) has nothing
		to do with the Java language. It's designed to run
		inside browser windows, navigating or acting on the
		markup of a page to create dynamic content, validate
		forms, or instantiate objects in ways that are not
		possible with static HTML. It is also designed so that
		it cannot write to the user's local filesystem, for
		obvious security reasons, so it cannot easily be used
		to create XML files locally, although there are some
		back-doors in Microsoft software which allow modified
		pages to be saved to disk.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="tmx">
	    <glossterm remap="oscar">TMX</glossterm>
	    <glossdef>
	      <para><ulink
		  url="http://www.lisa.org/tmx/tmx.htm">TMX</ulink> is
		a standard method to describe translation memory data
		that is being exchanged among tools and/or translation
		vendors for human-language translation (part of the
		OSCAR project from LISA).</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="xul">
	    <glossterm remap="interface">XUL</glossterm>
	    <glossdef>
	      <para>The <ulink
		  url="http://www.mozilla.org/projects/xul/">XML User
		  Interface Language</ulink>, designed for specifying
		the user interface in the Mozilla browser.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="xmlhttp">
	    <glossterm remap="ajax">XMLHTTP</glossterm>
	    <glossdef>
	      <para>Feature implemented in MSXML and elsewhere to
		allow the retrieval of web pages, binary data, or
		scripted responses under program control (like using
		<ulink
		  url="http://www.gnu.org/software/wget/wget.html">wget</ulink> 
		or <ulink
		  url="http://jl.photodex.com/dog/">dog</ulink> in a
		shell script). Used asynchronously in <link xreflabel="simple"
		  linkend="ajax">AJaX</link> applications to pre-fetch
		data, saving time to make it appear that an
		application is operating locally.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="white-space">
	    <glossterm remap="whitespace white spaces tabs xml:space">White-space</glossterm>
	    <glossdef>
	      <para>See <link linkend="whitespace"></link>.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="searching" >
	    <glossterm remap="extracting">Searching</glossterm>
	    <glossdef>
	      <para>You can search individual XML files on a
		sequential, stand-alone, unindexed command-line basis
		using programs such as
		<productname>sggrep</productname>, part of the <ulink
		  url="http://www.ltg.ed.ac.uk/software/xml/">LTXML</ulink> 
		library.</para>
	      <para>XSLT allows a limited search facility simply by
		using XPath functions like <literal>contains</literal>
		and <literal>starts-with</literal>. XSLT 2.0, a W3C
		Recommendation since January 2007, adds
		<literal>ends-with</literal> and Regular
		Expressions. XQuery is a fully-fledged
		search language for XML.</para>
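	      <para>As a minimal sketch of such a search in XSLT 1.0,
		the following stylesheet lists the title of every
		chapter whose title contains a given word. The element
		type names <sgmltag>chapter</sgmltag> and
		<sgmltag>title</sgmltag>, the parameter name and the
		search word are all assumptions made for illustration,
		not part of any particular DTD:</para>
	      <programlisting><![CDATA[
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- hypothetical search word, held in a stylesheet parameter -->
  <xsl:param name="word" select="'markup'"/>
  <xsl:template match="/">
    <!-- contains() is case-sensitive and matches substrings -->
    <xsl:for-each select="//chapter[contains(title,$word)]">
      <xsl:value-of select="title"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
		]]></programlisting>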
	      <para>The <productname>Saxon</productname> XSLT
		processor comes with an implementation of <ulink
		  url="http://www.w3.org/XML/Query">XQuery</ulink>
		(see also the <ulink 
		  url="http://www.ibiblio.org/xql/">XQL FAQ</ulink>),
		which can accept queries either from the command line
		or from a file. Saxon can also use a control file to
		specify groups of XML files to be searched
		together.</para>
	      <para>For indexed searching (for speed) you need an
		XQuery search tool that implements an indexing engine
		which reads and understands markup. These are usually
		implemented as part of a
		<wordasword>native</wordasword> XML database system
		such as eXist (and many others), which run
		either stand-alone or in parallel with an XML server
		like Cocoon.</para>
	      <para>Traditional relational databases (MySQL, Oracle,
		etc) tend to store XML as undistinguished strings or BLOBs, using
		bolt-on XML-like backends to disambiguate the markup.
		<wordasword>Native</wordasword> XML databases can be
		configured for granularity, to store at a specific
		element level, making markup-sensitive searching much
		easier.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="asp">
	    <glossterm remap="asp dot net framework language .net">asp.net</glossterm>
	    <glossdef>
	      <para>ASP (Active Server Pages) is a Microsoft technology
		for serving dynamic web pages, similar in concept to
		JSP, PHP, and others. In itself, ASP has nothing
		inherently to do with XML, although like any
		server-side system, it can be used for serving XML
		just as well as any other type of file.</para>
	      <para>.NET itself is an application platform and
		methodology for web services development on Microsoft
		servers. Most web services are predicated on XML as
		the <wordasword>common carrier</wordasword> of
		inter-business messaging, so .NET has a significant
		XML component.</para>
	      <para>
		<tip xreflabel="Marc Hadley">
		  <para>There are many alternatives to ASP, most of
		    which use a similar page-based approach. Java-based
		    alternatives include <ulink
		      url="http://java.sun.com/products/jsp/">Java
		      Server Pages</ulink> (JSP), <ulink
		      url="http://java.sun.com/j2ee/javaserverfaces/">Java 
		      Server Faces</ulink> (JSF) and <ulink
		      url="http://cocoon.apache.org/">Cocoon</ulink>
		    (which includes <ulink
		      url="http://cocoon.apache.org/2.1/userdocs/xsp/logicsheet.html">eXtensible 
		      Server Pages</ulink>&mdash;XSP). Popular scripting
		    language alternatives include <ulink
		      url="http://www.axkit.org/">AxKit</ulink> (Perl,
		    also supporting XSP), <ulink
		      url="http://www.zope.org/">Zope</ulink> (Python)
		    and <ulink
		      url="http://www.rubyonrails.org/">Rails</ulink>
		    (Ruby) [all of which have extensive XML
		    support.&mdash;Ed.]</para>
		</tip>
	      </para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="disadvantages">
	    <glossterm>Disadvantages</glossterm>
	    <glossdef>
	      <para>XML markup has a few disadvantages:</para>
	      <itemizedlist>
		<listitem>
		  <para>It can be verbose unless element and attribute
		    names are chosen with care. In large documents the
		    markup overhead need not be large, but in short
		    messages it can be significantly more than the
		    actual data, especially when the element or
		    attribute names are concocted by machine.</para>
		</listitem>
		<listitem>
		  <para>Overlapping markup is not permitted (an
		    element cannot start inside one element and end
		    inside another): element markup must nest
		    hierarchically.</para>
		</listitem>
		<listitem>
		  <para>Some of the software is truly mediocre.</para>
		</listitem>
	      </itemizedlist>
	    </glossdef>
	  </glossentry>
	  <glossentry id="rendering">
	    <glossterm>Rendering</glossterm>
	    <glossdef>
	      <para>Using XSLT or XSL-FO transformation (or other
		similar conversion systems), information marked up in
		XML can be rendered to almost any target: HTML, PDF,
		audio, Braille, and almost any plain-text format (eg
		<LaTeX/>). How it appears (or
		sounds) is the result of using stylesheets or other
		transformation logic activated by the markup.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="fp">
	    <glossterm remap="floating point numbers
	      integers">Floating-point</glossterm>
	    <glossdef>
	      <para>You cannot declare character data content or
		attribute values as floating-point using DTDs. To do
		that you need to use a Schema.</para>
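	      <para>A minimal sketch in W3C XML Schema, assuming a
		hypothetical <sgmltag>reading</sgmltag> element type
		with a hypothetical <sgmltag
		  class="attribute">tolerance</sgmltag> attribute
		(neither name comes from any real vocabulary), could
		declare both the character data content and the
		attribute value as floating-point:</para>
	      <programlisting><![CDATA[
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="reading">
    <xs:complexType>
      <xs:simpleContent>
        <!-- character data content must be a double-precision number -->
        <xs:extension base="xs:double">
          <!-- attribute value must be a single-precision number -->
          <xs:attribute name="tolerance" type="xs:float"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
		]]></programlisting>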
	    </glossdef>
	  </glossentry>
	  <glossentry id="counting">
	    <glossterm remap="counting">Enumeration</glossterm>
	    <glossdef>
	      <para>To count the number of occurrences of a node in an
		XML document, you can use the
		<function>count</function> function in XSL[T],
		eg</para>
	      <programlisting><![CDATA[
<xsl:value-of select="count(//chapter)"/>
		]]></programlisting>
	      <para>To apply a counter to a repetitive element type,
		use the <function>xsl:number</function> element,
		eg</para>
	      <programlisting><![CDATA[
<xsl:number count="appendix" level="any" format="A"/>
		]]></programlisting>
	      <para>For more on XSLT, see <link linkend="style"></link>.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="xll">
	    <glossterm remap="hyperlinks xmllink linking anchors">XLL</glossterm>
	    <glossdef>
	      <para>The XML Linking Language comprises the XLink
		specification and the XPointer specification. For
		details, see the <ulink
		  url="http://www.w3.org/XML/Linking.html">XML Linking
		  Working Group</ulink> at the W3C.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="specialchars">
	    <glossterm remap="&lt; &amp; % ! &gt; &quot; aquot > less
	    than greater ampersand percent exclamation mark sign
	      symbol tilde acute grave circumflex umlaut
	      diaeresis">Special characters</glossterm> 
	    <glossdef>
	      <para>XML has only two special markup characters in
		normal documents:</para>
	      <itemizedlist>
		<listitem>
		  <para>The open angle bracket or less-than sign
		    (<literal><![CDATA[<]]></literal>) which begins a
		    start-tag or end-tag like
		    <literal><![CDATA[<report>]]></literal> or
		    <literal><![CDATA[</table>]]></literal>;</para>
		</listitem>
		<listitem>
		  <para>The ampersand character
		    (<literal><![CDATA[&]]></literal>) which starts an
		    <firstterm>entity reference</firstterm> like
		    <literal><![CDATA[&aacute;]]></literal> for á or a
		    <firstterm>character reference</firstterm> like
		    <literal><![CDATA[&#x00A7;]]></literal> for
		    &sect;.</para>
		</listitem>
	      </itemizedlist>
	      <para>Contrary to popular opinion, the closing angle
		bracket or greater-than (<literal>></literal>) and the
		semicolon (<literal>;</literal>) are not special
		characters in normal text: they only acquire their
		temporary special meaning once one of the two markup
		characters has been encountered.</para>
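	      <para>As a short illustration (the
		<sgmltag>para</sgmltag> element type here is only an
		example), the two markup characters have to be written
		as the predefined entity references
		<literal><![CDATA[&lt;]]></literal> and
		<literal><![CDATA[&amp;]]></literal> when they are
		wanted as ordinary text, while the greater-than sign
		and the semicolon can be typed as they are:</para>
	      <programlisting><![CDATA[
<para>At AT&amp;T, if x &lt; y and y > z; then stop.</para>
		]]></programlisting>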
	      <para>In DTDs, the percent sign (<literal>%</literal>)
		has a special meaning in <firstterm>entity
		  declarations</firstterm>: it defines the entity as a
		<firstterm>parameter entity</firstterm>, meaning that
		it can only be used inside the DTD, not in a document
		text, and only for data substitution (a kind of simple
		macro).</para>
	      <para>The exclamation mark (<literal>!</literal>)
		acquires a special meaning immediately after a
		less-than sign: when followed by one of the
		declaration keywords in a DTD it signals the start of
		a Declaration; when followed by two dashes it signals
		the start of a comment (ended by another two dashes
		and a greater-than sign).</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="loops">
	    <glossterm remap="repetition">Loops</glossterm>
	    <glossdef>
	      <para>To process some XML repetitively, you need to use
		a processing language which allows looping or the
		cyclical handling of a defined set of nodes. For
		example in XSLT, to output all chapter titles to make
		a table of contents (ie out of natural document
		position), you could say:</para>
	      <programlisting><![CDATA[
<xsl:for-each select="//chapter">
  <li>
    <xsl:value-of select="title"/>
  </li>
</xsl:for-each>
		]]></programlisting>
	    </glossdef>
	  </glossentry>
	  <glossentry id="uml">
	    <glossterm>UML</glossterm>
	    <glossdef>
	      <para>The <ulink url="http://www.uml.org/">Unified
		  Modeling Language</ulink> has nothing to do with
		XML, although there are many points of contact, and
		<ulink
		  url="http://xml.coverpages.org/ni2001-10-10-a.html">some 
		  software is available</ulink> to express some UML
		structures in XML for the purposes of inter-process
		messaging.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="media">
	    <glossterm remap="include play avi mpg wmv audio
	      video">Multimedia</glossterm>
	    <glossdef>
	      <para>The <ulink
		  url="http://www.w3.org/AudioVideo/">Synchronized
		  Multimedia Integration Language</ulink> (SMIL)
		provides an XML vocabulary for simple authoring of
		interactive audiovisual presentations. SMIL is
		typically used for <wordasword>rich
		  media</wordasword>/multimedia presentations which
		integrate streaming audio and video with images, text
		or any other media type.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="wellformed">
	    <glossterm remap="wellformed">Well-formed</glossterm>
	    <glossdef>
	      <para>See <link linkend="wf"></link>.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="sml">
	    <glossterm>SML</glossterm>
	    <glossdef>
	      <para>The <ulink url="">Spacecraft Markup
		  Language</ulink> is an application of XML.</para>
	      <para>The <ulink
		  url="http://www.smlnj.org/sml97.html">Standard
		  ML</ulink> programming language is not.</para>
	      <para>Did you mean <link xreflabel="simple"
		  linkend="whatissgml">SGML</link>?</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="sorting">
	    <glossterm>Sorting</glossterm>
	    <glossdef>
	      <para>To sort a repetitive set of XML elements in
		XSL[T], use the <function>xsl:sort</function> element,
		eg</para>
	      <programlisting><![CDATA[
<xsl:for-each select="//acronym">
  <xsl:sort select="@abbrev"/>
  <xsl:value-of select="@abbrev"/>
  <xsl:text>: </xsl:text>
  <xsl:apply-templates/>
</xsl:for-each>
		]]></programlisting>
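	      <para>The <function>xsl:sort</function> element also
		takes attributes to control the ordering, and several
		sort keys can be given in sequence. As a sketch,
		assuming each <sgmltag>acronym</sgmltag> element also
		carried a hypothetical numeric <sgmltag
		  class="attribute">year</sgmltag> attribute, the
		newest could be listed first, with the abbreviation as
		a secondary key:</para>
	      <programlisting><![CDATA[
<xsl:for-each select="//acronym">
  <!-- primary key: compare as numbers, largest first -->
  <xsl:sort select="@year" data-type="number" order="descending"/>
  <!-- secondary key: alphabetical, used when years are equal -->
  <xsl:sort select="@abbrev"/>
  <xsl:value-of select="@abbrev"/>
  <xsl:text>: </xsl:text>
  <xsl:apply-templates/>
</xsl:for-each>
		]]></programlisting>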
	    </glossdef>
	  </glossentry>
	  <glossentry id="wap">
	    <glossterm>WAP</glossterm>
	    <glossdef>
	      <para>The Wireless Application Protocol (WAP) is now
		handled by the <ulink
		  url="http://www.openmobilealliance.org/tech/affiliates/wap/wapindex.html">Open 
		  Mobile Alliance</ulink>.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="gtt">
	    <glossterm>GTT</glossterm>
	    <glossdef>
	      <para>The Gnome Time Tracker is a component of the Gnome
		interface used extensively on Linux systems. Part of
		its internal data is configured in XML.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="bpel">
	    <glossterm>BPEL</glossterm>
	    <glossdef>
	      <para>The <ulink
		  url="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsbpel">Business 
		  Process Execution Language</ulink> is an XML-based
		specification of the steps required for a cooperative
		business process to take place between consenting
		servers.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="idempotent">
	    <glossterm remap="idempotent">Idempotency</glossterm>
	    <glossdef>
	      <para>A term used in <ulink
		  url="http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html">the 
		  HTTP specification</ulink> to describe request
		methods for which the side-effects of several
		identical requests are the same as for a single
		request.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="rss">
	    <glossterm remap="news reader news feed
	      newsfeed">RSS</glossterm>
	    <glossdef>
	      <para>The <ulink
		  url="http://en.wikipedia.org/wiki/RSS_(protocol)">Really 
		  Simple Syndication</ulink> format was designed to
		allow news sites to process updates by machine, and it
		evolved into a semi-standard format for blogs and
		other frequently-changing sites to notify the world of
		changes. Unfortunately it was never properly defined,
		and has multiple incompatible and undocumented
		versions. It was about to be superseded by a vastly
		better language called Atom, but Microsoft have
		recently announced their support for RSS, so it looks
		like we may be stuck with a lemon for years to
		come.</para>
	      <para><wordasword>Newsreaders</wordasword> (RSS readers)
		are available for all platforms, both standalone and
		as browser plugins. Do not confuse these with programs
		of the same description designed to provide access to
		the Usenet News service, which is a different thing
		entirely (and which you will need to read <ulink
		  type="news"
		  url="comp.text.xml">comp.text.xml</ulink>).</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="variables">
	    <glossterm>Variables</glossterm>
	    <glossdef>
	      <para>XML doesn't have variables or parameters, nor does
		it have fields or records. These are all terms from
		programming and database technology, and do not have
		exact equivalents in XML.</para>
	      <para>XML identifies your information with
		<firstterm>elements</firstterm> and
		<firstterm>attributes</firstterm>.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="envvar">
	    <glossterm>Environment variables</glossterm>
	    <glossdef>
	      <para>XML is a markup language, not a programming
	      language, so it has no concept of environment
	      variables. However, if you are using a DTD, and
	      accessing your XML files under program control (eg in a
	      script rather than by hand) it is
	      possible to modify the value of declared attributes or entities (eg
	      with a stream-editor like sed) before the file is
	      opened, and thereby to pass values from the external
	      environment into the document. A similar approach would
	      be possible with Schemas.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="entities">
	    <glossterm remap="entitiese semicolon semi colon
	      accents diacriticals">Entities</glossterm>
	    <glossdef>
	      <para>An <firstterm>entity</firstterm> is a unit of
		storage in XML. It can be as small as a character or
		as large as a whole document. Four types of entity
		are <firstterm>declarable</firstterm>:</para>
	      <variablelist>
		<varlistentry>
		  <term>General entities</term>
		  <listitem>
		    <para>which can be like string-replacement
		      macros:</para>
		    <programlisting><![CDATA[
<!ENTITY IBM "International Business Machines">
		      ]]></programlisting>
		    <para>These can be used for shorthand data entry
		      or to guarantee uniform spelling like
		      <literal><![CDATA[&IBM;]]></literal> and they
		      get replaced when the file is parsed.</para>
		    <para>They can also represent external
		      files:</para>
		    <programlisting><![CDATA[
<!ENTITY chap5 SYSTEM "chapter5.xml">
		      ]]></programlisting>
		    <para>which can be used as a file-inclusion
		      mechanism at the point where you insert
		      <literal><![CDATA[&chap5;]]></literal>. External
		      general file entities must not contain the XML
		      Declaration or any Document Type
		      Declaration.</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>Document entities</term>
		  <listitem>
		    <para>These are like external general file
		    entities except that they specify the type of data
		    they contain, using a declared Notation, so that
		    the parser and application can decide how to
		    handle them (eg include them or hand them to another
		    program specific to their type of medium):</para>
		    <programlisting><![CDATA[
<!ELEMENT link (#PCDATA)>
<!ATTLIST link to ENTITY #REQUIRED>
...
<!NOTATION PDF PUBLIC 
 "-//Adobe//NOTATION Portable Document Format//EN//PDF" 
 "http://partners.adobe.com/public/developer/pdf/index_reference.html">
<!ENTITY pricelist SYSTEM "/sales/pricelist.pdf" 
 NDATA PDF>
...
<para>Please refer to our <link to="pricelist">current price list</link>.</para>
		    ]]></programlisting>
		    <para>This provides an extremely robust method of
		    defining an external entity once and allowing it
		    to be referenced multiple times (if the external
		    filename changes, you only have to update the
		    entity declaration).</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>Character entities</term>
		  <listitem>
		    <para>like <literal><![CDATA[&aacute;]]></literal>
		      to represent characters that users without the
		      required keyboard features may want to enter
		      like <wordasword>á</wordasword>;</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>Parameter Entities</term>
		  <listitem>
		    <para>are like General Entities but can only be
		      referenced within a DTD. They are used for
		      control of content models, inclusion or
		      exclusion of declarations, and modification of
		      modular constructs:</para>
		    <programlisting><![CDATA[
<!ENTITY % local.qandaset.mix "|bibliodiv">
		      ]]></programlisting>
		    <para>(to use an example from the DTD for this
		      FAQ) where the mix of element types in the
		      content model for <sgmltag>qandaset</sgmltag> is
		      specified by the entities
		      <literal>qandaset.mix</literal> (defined by
		      DocBook) <emphasis>and</emphasis> by
		      <literal>local.qandaset.mix</literal> (definable
		      by the user [me]) so that the DTD can be tweaked
		      without having to be edited.</para>
		  </listitem>
		</varlistentry>
	      </variablelist>
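	      <para>References to general entities, including
		character entities, always start with an
		ampersand (<literal><![CDATA[&]]></literal>) and end
		with a semicolon (<literal>;</literal>), and can be
		used anywhere in the text of your document. Document
		entities (<firstterm>unparsed entities</firstterm> in
		the terminology of the Spec) are the exception: they
		are referred to by name alone, as the value of an
		attribute declared as ENTITY. Parameter entities can
		only be used in a DTD: references to them start with a
		percent sign (<literal>%</literal>) and end with a
		semicolon.</para>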
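	      <para>As a minimal, self-contained sketch (the
		<sgmltag>memo</sgmltag> element type is invented for
		illustration), a general entity can be declared in the
		internal subset of a document and then referenced in
		the text:</para>
	      <programlisting><![CDATA[
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!-- general entity: a string-replacement macro -->
  <!ENTITY IBM "International Business Machines">
]>
<memo>Our contract with &IBM; is up for renewal.</memo>
		]]></programlisting>
	      <para>A parser reading this file passes the text
		<quote>Our contract with International Business
		  Machines is up for renewal.</quote> to the
		application.</para>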
	    </glossdef>
	  </glossentry>
	  <glossentry id="ajax">
	    <glossterm>AJaX</glossterm>
	    <glossdef>
	      <para>Asynchronous JavaScript and XML. A technique for
	      improving the interactivity of web pages whereby
	      in-browser scripting detects user activity and
	      pre-fetches the required data asynchronously from an
	      XML-backed data-store, instead of waiting until the user
	      clicks on a link and requesting it synchronously from
	      the server.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="pipelines">
	    <glossterm>Pipelining</glossterm>
	    <glossdef>
	      <para>Technique for reducing complex sequential and
		parallel processing requirements to a set of
		components which can be completed under program
		control. The term is taken from the Unix facility for
		redirecting the output of one command into the input
		of another (called a <wordasword>pipe</wordasword>),
		in effect creating a chain or pipeline through which
		data passes on its way from source to result.</para>
	      <para>The W3C has a <ulink
		  url="http://www.w3.org/TR/2002/NOTE-xml-pipeline-20020228/">Note</ulink> 
		pending submission on an <citetitle>XML Pipeline
		  Definition Language</citetitle> which could be used
		to define a pipeline in a portable, vendor-independent
		manner.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="attribs">
	    <glossterm remap="xml:id namechar xml:lang">Attributes</glossterm>
	    <glossdef>
	      <para>These are items of <firstterm>metadata</firstterm>
		or <firstterm>metainformation</firstterm> (information
		about information) which can be added to the start-tag
		of an element. Usually attributes are a way of
		refining the meaning, function, or some other quality
		of an element. They take the form of a name and a
		quoted value joined by an equals sign, eg</para>
	      <programlisting><![CDATA[
<part id="B22" catnum="51N1573R" level="App">Left-handed Screwdriver</part>
		]]></programlisting>
	      <para>Attribute names must follow the XML rules for
		Names (see the <link linkend="spec"
		  xreflabel="simple">spec</link>).  If your
		application does not use a DTD or Schema, the
		attribute values are treated as plain text (CDATA) and
		cannot have any special meaning to XML (with the
		exception of <sgmltag
		  class="attribute">xml:id</sgmltag> and <sgmltag
		  class="attribute">xml:lang</sgmltag>, see below). In
		a DTD or Schema, attributes can be assigned datatypes,
		the most common being (using DTD terminology for
		simplicity):</para>
	      <variablelist>
		<varlistentry id="ididref">
		  <term>ID or IDREF</term>
		  <listitem>
		    <para>ID attribute values
		      must be XML Names (no spaces; must begin with a
		      letter or underscore) and they must be unique in a document.
		      An IDREF attribute value can occur any number of
		      times, but it must be the value of an ID
		      attribute in the same document. ID and IDREF are
		      most frequently used for cross-referencing
		      within documents.</para>
		    <para>Note that an ID
		      attribute can have any name: it doesn't have to
		      be
		      <emphasis>called</emphasis>&nbsp;<wordasword>ID</wordasword>, 
		      although it frequently is. Conversely&mdash;as a
		      matter of best practice&mdash;you should never
		      use the name <wordasword>ID</wordasword>
		      (<wordasword>id</wordasword>) for an attribute
		      which is not of type ID, simply because it's
		      confusing. If your application has unique
		      identity values that the community calls IDs,
		      and which are <emphasis>not</emphasis> XML
		      Names, either name the attribute something
		      different (eg
		      <wordasword>Product-ID</wordasword>) or document
		      <emphasis>heavily</emphasis> that the value is
		      not an XML ID.</para>
		    <para>There is a <ulink
			url="http://www.w3.org/TR/xml-id/">W3C
			Recommendation</ulink> that document type
		      designers should use the <emphasis>attribute
			name</emphasis>&nbsp;<sgmltag
			class="attribute">xml:id</sgmltag>, and this
		      can be interpreted by parsers as being a unique
		      ID without the need for the document to use a
		      DTD or Schema.</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>CDATA</term>
		  <listitem>
		    <para>Just text.</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>Token List</term>
		  <listitem>
		    <para>The attribute must have one of a restricted
		      number of values (specified in parentheses in
		      the declaration, separated by vertical bars),
		      eg</para>
		    <programlisting><![CDATA[
<!ATTLIST part level (App|Jny|Mst) #REQUIRED>
<!ATTLIST Q.27 resp (Yes|No) "Yes">
		      ]]></programlisting>
		    <para>In the first example there is no default,
		      and a value is compulsory. In the second,
		      <wordasword>Yes</wordasword> is the default
		      value (if the attribute is omitted, the parser
		      will take the default value from the
		      declaration).</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>ENTITY</term>
		  <listitem>
		    <para>The attribute value must be a declared <link xreflabel="simple"
			linkend="entities">Entity</link>.</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>NMTOKEN</term>
		  <listitem>
		    <para>An XML Name Token is like an ID value (no
		      spaces) but it <emphasis>can</emphasis> begin
		      with a non-letter (eg a digit, hyphen, or full
		      stop).</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>Special attributes</term>
		  <listitem>
		    <para>In addition to <sgmltag
			class="attribute">xml:id</sgmltag> (mentioned
		      above), there are two others allowed by the XML
		      Specification:</para>
		    <variablelist>
		      <varlistentry>
			<term>xml:space</term>
			<listitem>
			  <para>to signal an intention that in that
			    element, white space should be preserved
			    by applications;</para>
			</listitem>
		      </varlistentry>
		      <varlistentry>
			<term>xml:lang</term>
			<listitem>
			  <para>to specify the language used in the
			    contents and attribute values of any
			    element.</para>
			</listitem>
		      </varlistentry>
		    </variablelist>
		    <para>See sections 2.10 and 2.12 of the Spec for
		    more detail.</para>
		  </listitem>
		</varlistentry>
	      </variablelist>
	      <para>In Schemas a much greater range of datatypes is
		available than in DTDs, and complex validation
		criteria can be attached to each.</para>
	      <para>Attributes in a DTD can be declared as <sgmltag
		  class="declparam">REQUIRED</sgmltag> (compulsory),
		<sgmltag class="declparam">IMPLIED</sgmltag>
		(optional), or <sgmltag
		  class="declparam">FIXED</sgmltag> (predefined and
		invariable).</para>
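	      <para>Putting these together, a sketch of a DTD
		declaration for the <sgmltag>part</sgmltag> element
		type shown above might look like the following. The
		datatypes chosen for <sgmltag
		  class="attribute">id</sgmltag> and <sgmltag
		  class="attribute">catnum</sgmltag>, and the <sgmltag
		  class="attribute">seealso</sgmltag> and <sgmltag
		  class="attribute">units</sgmltag> attributes, are
		assumptions made for illustration:</para>
	      <programlisting><![CDATA[
<!ELEMENT part (#PCDATA)>
<!-- id is compulsory and unique; seealso must match some id;
     catnum may start with a digit, so it is NMTOKEN, not ID;
     units is predefined and cannot be changed in the document -->
<!ATTLIST part
  id       ID            #REQUIRED
  catnum   NMTOKEN       #IMPLIED
  level    (App|Jny|Mst) #REQUIRED
  seealso  IDREF         #IMPLIED
  units    CDATA         #FIXED "metric">
		]]></programlisting>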
	      <para>There is not intended to be any limit on the
		length of an attribute value, but you should check
		that your processing software can handle unusual data
		volumes if you intend to use very large
		lengths.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="uriparse">
	    <glossterm remap="semicolon">URI parsing errors</glossterm>
	    <glossdef>
	      <para>See <link linkend="semicolon"></link>.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="tables">
	    <glossterm>Tables</glossterm>
	    <glossdef>
	      <para>You can define tables any way you wish in XML (see
		<link linkend="makeup"></link>) but there are a few
		existing table models which have become so widely-used
		(and supported by software) that it would need a very
		compelling reason to invent something new. There are
		more details in <xref
		  linkend="toolbook"/> &sect;2.3.7.</para>
	      <variablelist>
		<varlistentry>
		  <term>HTML</term>
		  <listitem>
		    <para>HTML tables were invented by Mosaic (now
		      Netscape) and were standardised in RFC 1942 and
		      the HTML 3.2 DTD (they are not in HTML 2.0).
		      In all versions of HTML and XHTML they define a
		      very simple but practical model, with very few
		      refinements, suitable for web use and for
		      rudimentary printing. Their chief advantage is
		      that in a browser the cell heights and widths
		      (and thus the column widths) expand or contract
		      automatically to accommodate the amount of text
		      contained in them. Most other table models
		      assume the widths of the columns and the height
		      of the cells will be specified in advance (which
		      you can do in HTML but this is rarely
		      used).</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>CALS</term>
		  <listitem>
		    <para>Computer-Aided Logistics and Support (and
		      several other acronyms over the years) was (is)
		      part of the US military project to ensure a
		      consistent markup for all documentation,
		      originally in SGML, now in XML. As part of this
		      activity the CALS table model has become the
		      most widely-used in technical documentation,
		      especially for Interactive Electronic Technical
		      Manuals (IETMs), with extensive support in all
		      the major editors, and it is the default table
		      model in the DocBook DTD and Schema. The CALS
		      definitions are very powerful but quite complex,
		      and can handle virtually all requirements for
		      spanning, ruling, and aligning.</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>SASOUT</term>
		  <listitem>
		    <para>This model has been used extensively in the
		      social sciences and elsewhere for defining
		      tables based on the semantics of the data,
		      rather than the appearance. At one time it
		      was an alternative in DocBook (enabled by a
		      simple parameter entity switch).</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term>TEI</term>
		  <listitem>
		    <para>The TEI model is designed to allow the
		    encoder to represent existing tables being
		    transcribed from historical, literary, or archive
		    material, rather than for the generation of new
		    data. The markup is at the same level of
		    simplicity as the HTML model, but it is designed
		    to allow the inclusion of the much denser markup
		    and metadata needed in research texts.</para>
		  </listitem>
		</varlistentry>
		<varlistentry>
		  <term><LaTeX/></term>
		  <listitem>
		    <para>The <LaTeX/> model is not of direct concern
		      to the XML user except insofar as <LaTeX/> is a
		      common target for transformations from XML using
		      XSLT. Like CALS, <LaTeX/> tables can handle
		      almost any formatting, but the default
		      alignments assume that each column format is
		      defined beforehand, and that each cell will
		      occupy one line of data: an additional package
		      (<application>array</application>) is needed to
		      handle multi-line cells in the way the HTML
		      model does.</para>
		  </listitem>
		</varlistentry>
	      </variablelist>
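	      <para>As a rough illustration of the difference in
		weight between two of these models, the same
		two-column table might be marked up as follows. This
		is only a sketch: the cell contents and the
		<sgmltag class="attribute">colwidth</sgmltag> values
		are invented for the example.</para>
	      <programlisting><![CDATA[<!-- HTML model: rows and cells only -->
<table>
  <tr><th>Standard</th><th>Reference</th></tr>
  <tr><td>SGML</td><td>ISO 8879:1986</td></tr>
</table>

<!-- CALS model (as used in DocBook): explicit column
     specifications and separate head and body groups -->
<table>
  <title>Standards</title>
  <tgroup cols="2">
    <colspec colname="c1" colwidth="1*"/>
    <colspec colname="c2" colwidth="2*"/>
    <thead>
      <row><entry>Standard</entry><entry>Reference</entry></row>
    </thead>
    <tbody>
      <row><entry>SGML</entry><entry>ISO 8879:1986</entry></row>
    </tbody>
  </tgroup>
</table>]]></programlisting>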
	      <para>In XML, it is not necessary to use tables
		to mark up lists as is often done in wordprocessors,
		because the processing facilities of languages
		like XSLT allow you to transform the document to use
		non-tabular methods (like HTML's
		<sgmltag>div</sgmltag>s). Table markup should
		therefore be confined to <wordasword>real</wordasword>
		tables (data arranged in rows and columns) and not
		abused simply because you want something displayed on
		a level with something else: it is better to pick
		markup which is designed to do the job properly rather
		than to distort existing facilities.</para>
	      <para>Wordprocessor users are usually unaware that many
		structures that they currently use wordprocessor
		tables for are in fact segmented lists, which
		wordprocessors are incapable of handling correctly.
		One of the major reasons for doing it properly is that
		the data can then be reprocessed to make sense when
		read in the natural order.</para>
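	      <para>DocBook, for example, provides a
		<sgmltag>segmentedlist</sgmltag> element for this kind
		of structure. The following is a minimal sketch with
		illustrative data; how it is displayed (in columns, or
		as labelled items) is left to the stylesheet.</para>
	      <programlisting><![CDATA[<segmentedlist>
  <title>Lost software</title>
  <segtitle>Product</segtitle>
  <segtitle>Maker</segtitle>
  <seglistitem><seg>Near&amp;Far</seg><seg>MicroStar</seg></seglistitem>
  <seglistitem><seg>Panorama</seg><seg>SoftQuad</seg></seglistitem>
</segmentedlist>]]></programlisting>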
	    </glossdef>
	  </glossentry>
	  <glossentry id="bom">
	    <glossterm>Byte Order Mark</glossterm>
	    <glossdef>
	      <para>A two-byte signature (<literal>0xFEFF</literal>,
		defined in Unicode and ISO 10646) which must be
		prepended to the XML document when using the UCS-2
		encoding, in order to allow processors to
		differentiate between the UCS-2 and UTF-8
		encodings.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="ip" >
	    <glossterm remap="who owns xml copyright trademark
	      symbol">Patents, Copyright, and Intellectual
	      Property</glossterm>
	    <glossdef>
	      <para>I'm not a lawyer, and this is not legal advice. If
		you're worried, see a psychiatrist first ©.</para>
	      <para>Since the USA (and, increasingly, elsewhere)
		stopped sanity-checking patent applications, pretty
		much anyone can patent anything in these countries,
		regardless of whether or not it already exists. If you
		are sufficiently intellectually bankrupt, you can then
		start sending invoices to companies and even
		individuals demanding payment of license fees for
		continued use.</para>
	      <para>XML was drafted during 1995 and first published in
		1996, so anyone claiming they invented pointy-bracket
		self-defining hierarchically-nested structured markup
		after that is probably a few elements short of a DTD.
		XML is based on SGML, which is an international
		standard codified as ISO 8879:1986, and it was
		preceded by numerous other closely-related markup
		systems, so anyone claiming they invented it after
		that date is equally wide of the markup.</para>
	      <para>Lots of subsequent derivative technologies which
		owe their existence to the SGML and XML groundwork
		quite possibly <emphasis>are</emphasis> valid patents,
		in the same way that fire was not originally patented
		but matches and lighters were.</para>
	      <para>Patents were originally designed for new physical
		inventions. Their use for methodologies and algorithms
		extended the concept into the realm of ideas, which
		many people regard as deeply suspect. The patenting of
		natural phenomena like genes (which are pre-existing
		parts of Nature like politicians or pond scum), is
		meaningless and intellectually void, although legally
		enforceable in the USA and elsewhere.</para>
	      <para>Copyright subsists automatically in anything you
		create, but in some countries (notably the USA and
		France) you cannot enforce this unless you register
		your interest. Copyright persists for a number of
		years after your death (EU: 70, different elsewhere)
		in order to let your descendants benefit from sales of
		your work.</para>
	      <para>Copyright is for the physical form of intellectual
		expression like books, newspapers, works of art, web
		sites, or computer programs. It exists to prevent
		others stealing your work and selling it.  You can
		quote snippets of other people's work without
		permission, such as a line of a poem, or a bar of
		music, or a sentence from a novel, provided you say
		whose it is and where to find it: otherwise you need
		to ask permission beforehand. Copyright already
		provides more than adequate protection for computer
		programs, making the use of patents for them
		unnecessary overkill.</para>
	      <para>Intellectual Property identifies you as the owner
		of the thoughts and ideas which may find their
		physical manifestation in patentable inventions or
		copyrightable publications. Even if you sell off your
		patents, and for long after your copyrights have
		expired, you can still be seen as the person who
		dreamed up the idea, and some countries (eg the UK)
		allow you formally to assert your right to be so
		identified, regardless of what happens to the book or
		the gizmo.</para>
	      <para>You should <emphasis>always</emphasis> acknowledge
		the intellectual property of others, especially when
		you use it in furtherance of your own aims. Pretending
		that someone else's smart ideas are your own is
		probably a worse offence than trying to patent fire,
		water, the wheel, or XML.</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="escape">
	    <glossterm remap="escape characters sequences">Escaping</glossterm>
	    <glossdef>
	      <para>Escaping means temporarily switching the way a
		program works to do something different with the data.
		In SGML, it was conventional to use only ASCII
		characters in your documents because keyboards,
		screens, and fonts for other characters were often
		unavailable. To escape from the limitations of this
		format for non-ASCII characters like accents and
		symbols a set of mnemonic names was available,
		prefixed by an ampersand (&amp;) to turn the
		escapement on, and followed by a semicolon (;) to turn
		it off, so an á was given as <sgmltag
		  class="genentity">aacute</sgmltag>.</para>
	      <para>XML allows you to use Unicode, so any character or
		symbol in any language can be entered as itself. If
		you are using UTF-8 encoding in your documents, there
		is no need to use escaping except for the two markup
		symbols (&lt; and &amp;). However, not everyone has a
		Unicode editor, and complete Unicode fonts are very
		large, so it is conventional in alphabetic languages
		to pick an encoding which allows you to use the
		majority of the characters you need, and to use
		escaping for the occasional other characters.</para>
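	      <para>A small sketch of what this looks like in
		practice, assuming a document saved in ISO 8859-1 and
		using invented element names: the two markup
		characters are escaped with the predefined entities,
		and a character which is not in the chosen encoding
		(here the euro sign, U+20AC) is given as a numeric
		character reference.</para>
	      <programlisting><![CDATA[<?xml version="1.0" encoding="iso-8859-1"?>
<!-- element names are hypothetical -->
<offer>
  <note>Fish &amp; chips, while stocks &lt; 10</note>
  <price>5&#x20AC;</price>
</offer>]]></programlisting>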
	    </glossdef>
	  </glossentry>
	  <glossentry>
	    <glossterm>XML security standards</glossterm>
	    <glossdef>
	      <para>Eve</para>
	    </glossdef>
	  </glossentry>
	  <glossentry id="csv">
	    <glossterm remap="csv export convert">Data export</glossterm>
	    <glossdef>
	      <para>A common requirement in the flat data model used
		in many e-commerce systems is to export XML data to
		the CSV (Comma-Separated Values) data format used as
		input to spreadsheets. There is a simple example of a
		short script to do this <ulink
		  url="http://silmaril.ie/downloads/software/xml2csv.zip">here</ulink>. 
		More complex and sophisticated routines could easily
		be written using XSLT or other XML processing
		software. Users should note that while conversion to
		CSV is adequate for simple data formats, it is an
		inappropriate format for normal XML text documents
		which use Mixed Content models.</para>
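	      <para>As a rough idea of the XSLT approach (this is not
		the script linked above), the following XSLT 1.0
		sketch assumes a flat input in which a hypothetical
		<sgmltag>orders</sgmltag> element contains
		<sgmltag>order</sgmltag> elements whose children are
		simple fields containing no commas or quotation marks.
		Real data would need proper CSV quoting added.</para>
	      <programlisting><![CDATA[<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>
  <!-- discard the white-space used to indent the input -->
  <xsl:strip-space elements="*"/>

  <!-- one CSV line per order element (hypothetical name) -->
  <xsl:template match="order">
    <xsl:for-each select="*">
      <xsl:value-of select="normalize-space(.)"/>
      <!-- comma after every field except the last -->
      <xsl:if test="position() != last()">,</xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>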
	    </glossdef>
	  </glossentry>
	  <glossentry id="imp">
	    <glossterm remap="import load convert conversion">Data
	      import</glossterm>
	    <glossdef>
	      <para>Many XML projects require the import of existing
		documents in non-XML formats. The import of existing
		HTML documents is explained in <xref
	      linkend="conversion"/>, and if you can convert your
		documents to XHTML this is probably the simplest
		method. OpenOffice saves Open Document Format (ODF)
		files, which are the international standard for office
		XML documents. Word files can be saved as WordML
		(2003) or Office Open XML (2007: Microsoft's
		alternative to ODF). In both cases an XSLT
		transformation can be written to create a suitable XML
		import format. For complex documents in other formats,
		however, specialist conversion software is needed.
		Some XML editors are beginning to offer inbuilt
		conversion of other formats, and there are many
		standalone conversion systems available (some at high
		cost) for formats which are otherwise not easily
		machine-accessible via markup, like PDF, PostScript,
		<TeX/>, Quark XPress, and most proprietary document
		formats. The critical point is that almost all non-XML
		(non-SGML) formats are formatted to make them
		human-readable and even pretty, not to make them
		machine-readable. It is therefore often the case that
		the information required to make the document
		meaningful in XML simply doesn't exist in these
		formats. The only alternative for this class of
		documents is to have them rekeyed or scanned into XML
		by one of the many companies in the Indian
		subcontinent or the Pacific Rim.</para>
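	      <para>For the ODF case, a transformation might start
		along the following lines. This is a minimal sketch
		only: it assumes the input is the
		<filename>content.xml</filename> file from inside an
		ODF text document, it handles only top-level headings
		and paragraphs, and the output element names
		(<sgmltag>doc</sgmltag>, <sgmltag>heading</sgmltag>,
		<sgmltag>para</sgmltag>) are invented for the
		example.</para>
	      <programlisting><![CDATA[<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  exclude-result-prefixes="office text">

  <xsl:output method="xml" indent="yes"/>

  <!-- process headings and paragraphs in document order -->
  <xsl:template match="/">
    <doc>
      <xsl:apply-templates
        select="//office:text/text:h | //office:text/text:p"/>
    </doc>
  </xsl:template>

  <!-- ODF headings carry their depth in text:outline-level -->
  <xsl:template match="text:h">
    <heading level="{@text:outline-level}">
      <xsl:value-of select="."/>
    </heading>
  </xsl:template>

  <!-- inline formatting is flattened to plain text here -->
  <xsl:template match="text:p">
    <para><xsl:value-of select="."/></para>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>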
	    </glossdef>
	  </glossentry>
	  <glossentry id="htmlfunc">
	    <glossterm remap="checkboxes radiobuttons textareas">Text
	    document formatting functions</glossterm> 
	    <glossdef>
	      <para>Because XML is a metalanguage to let you define
		and name your <emphasis>own</emphasis> information
		structures, it has no built-in knowledge of anything
		to start with. It therefore has no inherent
		understanding of any document specifics like bulleted
		lists, sections, footnotes, or any of the common
		online features like drop-down menus, forms (inputs,
		check boxes, radio buttons, and text areas), scripts,
		mouseovers, or other bells and whistles&mdash;these
		are things which <emphasis>you</emphasis> have to use
		XML to define, in a DTD or Schema for your specific
		application. Contrary to the impression given by some
		manufacturers these things are
		<emphasis>not</emphasis> built into XML itself. You
		first choose or design a document type (Schema or DTD)
		to represent your information accurately, then you can
		generate effects like the above by using CSS styling,
		or writing an XSL[T] transformation of your XML to
		HTML, Word, <LaTeX/>, PDF, or whatever other format is
		capable of instantiating them.</para>
	      <para>There <emphasis>are</emphasis> additional native-XML
		proposals and recommendations at the W3C for XML Forms
		handling, XML Linking, XML Security, and a lot of
		other features, but these are architectural enabling
		mechanisms, not drop-in replacements for HTML.</para>
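	      <para>To make the approach described in this entry
		concrete, here is a minimal sketch which assumes an
		invented vocabulary (<sgmltag>question</sgmltag>,
		<sgmltag>prompt</sgmltag>, and
		<sgmltag>option</sgmltag>) and an XSLT 1.0 template
		which renders it as HTML radio buttons. Nothing in XML
		itself knows that these elements describe a form: the
		behaviour comes entirely from the
		transformation.</para>
	      <programlisting><![CDATA[<!-- Assumed input, in a vocabulary of your own design:
     <question id="q1">
       <prompt>Preferred format?</prompt>
       <option value="xml">XML</option>
       <option value="csv">CSV</option>
     </question>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="question">
    <fieldset>
      <legend><xsl:value-of select="prompt"/></legend>
      <!-- one radio button per option, grouped by the question's id -->
      <xsl:for-each select="option">
        <label>
          <input type="radio" name="{../@id}" value="{@value}"/>
          <xsl:value-of select="."/>
        </label>
      </xsl:for-each>
    </fieldset>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>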
	    </glossdef>
	  </glossentry>
	</glosslist>
      </answer>
    </qandaentry>
    <qandaentry id="oldsoft" revisionflag="added">
      <question>
	<formalpara>
	  <title>Lost XML software</title>
	  <para>Some of the best software that has disappeared</para>
	</formalpara>
      </question>
      <answer remap="lost old good software obsolete former early">
	<para>The most common reason for good software being lost
	seems to be that the company making it was taken over,
	through no fault of its own, by a corporate shark who didn't
	know what it was
	buying, or who simply didn't care. In these cases it wasn't
	the product that was at fault&mdash;often it was popular and
	selling well; it just fell foul of corporate stupidity.</para>
	<variablelist>
	  <varlistentry>
	    <term><productname>Near&amp;Far</productname>
	      (MicroStar)</term>
	    <listitem>
	      <para>A standalone visual (graphical) SGML DTD design
		tool, originally for Microsoft <productname>Windows
		  95</productname>. N&amp;F made it very easy to
		prototype a new document type, although later stages
		of development were usually hand-tuned. It was also an
		excellent tool for displaying the structure of a
		newly-encountered DTD. When XML arrived, they
		kept the internal SGML model but provided a
		<wordasword>save-as</wordasword> in XML syntax.</para>
	      <para>Many current design tools have similar embedded
		functionality (eg <productname>XML Spy</productname>),
		but there is no equivalent standalone tool of the same
		quality. A development to use
		<productname>RelaxNG</productname> to generate
		different syntaxes would be a major advance.</para>
	      <para>MicroStar was bought by OpenText Corp and the
		product was dropped on the floor just at the point
		when it would have been most useful. If you have a
		copy (one was embedded in the WordPerfect SGML/XML
		editor), it still executes under XP, and in Codeweavers'
		<productname>Wine</productname> under Linux.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry>
	    <term><productname>DynaWeb</productname> (EBT)</term>
	    <listitem>
	      <para>A family of products:
		<productname>DynaBase</productname>, the underlying
		SGML database; <productname>DynaWeb</productname>, a
		Windows server with a graphically-managed stylesheet
		system for serving XML or SGML converted to HTML, and
		an excellent markup search facility; and
		<productname>DynaTag</productname>, a GUI system for
		converting <productname>Word</productname> and
		<productname>Frame</productname> documents to SGML or
		XML, based on the original
		<productname>RainbowMaker</productname> commandline
		converter.</para>
	      <para>EBT was bought up by Inso Corp, and the product
		was ignored for several years. A page on
		Inso's server now claims to provide details, but it is
		not known if the product is still available. It
		appears that they inherited some users, so for a while
		they still had a <productname>DynaWeb</productname>
		training page.</para>
	      <para>The good news is that Red Bridge Software now
		occupies the old EBT factory (under the Red Bridge in
		Providence, RI), selling a content management system
		that includes <productname>DynaTag</productname> and
		some other elements of the original range.</para>
	    </listitem>
	  </varlistentry>
	  <varlistentry id="panorama">
	    <term><productname>Panorama</productname>
	      (SoftQuad)</term>
	    <listitem>
	      <para>An SGML browser from <ulink
		  url="http://www.users.cloud9.net/~bradmcc/panorama-1.html">SoftQuad</ulink> 
		with an SGML-syntax stylesheet which worked both
		standalone and as a Netscape plugin, based on Synex
		<productname>Viewport</productname>. This let users
		open direct links to SGML documents:
		<productname>Panorama</productname> would download
		both instance and DTD via an entity resolver, perform
		a tokenised parse, and apply the specified
		stylesheet.</para>
	      <para>Its unique features included switching between
		multiple stylesheets, a search result density
		indicator, and the ability to implement double-ended
		HyTime links, which let anyone publish their own set
		of links, even multi-ended links, and even between
		documents that they didn't own. The browser plugin was
		free, and the full version included the stylesheet
		editor.</para>
	      <para>SoftQuad faltered after Yuri Rubinsky passed away,
		and was taken over by Corel
		(<productname>WordPerfect</productname>), where the product was
		ignored.</para>
	      <note>
		<para>SoftQuad's
		  <productname>Author/Editor</productname> SGML editor
		  product transmuted into
		  <productname>XMeTaL</productname>, which is still
		  available from <ulink
		  url="http://na.justsystems.com/">JustSystems</ulink>.</para>
	      </note>
	    </listitem>
	  </varlistentry>
	</variablelist>
	<para>If you have more information about useful products that
	have disappeared, please email the editor.</para>
      </answer>
    </qandaentry>
  </qandadiv>
</qandaset>
