E.3 Not the XML FAQ
This is a list of topics that people have asked about or searched for in relation to the XML FAQ, which are not necessarily directly connected to XML and its technology, nor frequently asked questions. It also includes some fall-back definitions for the benefit of users who have come to XML by different routes and may not have been exposed to any document publishing background.
Readers may also want to look at Joe English's ‘Not the SGML FAQ’ at http://www.flightlab.com/~joe/sgml/faq-not.txt.
AJaX
Attributes
<part id="B22" catnum="51N1573R" level="App">Left-handed Screwdriver</part>
Attribute names must follow the XML rules for Names
(see the spec). If your application does
not use a DTD or Schema, the attribute values are treated
as plain text (CDATA) and cannot have any special meaning
to XML (with the exception of xml:id and xml:lang, see below). In a DTD
or Schema, attributes can be assigned datatypes, the most
common being (using DTD terminology for
simplicity):
ID attribute values must be XML Names (no spaces; must begin with a letter or underscore) and they must be unique in a document. An IDREF attribute value can occur any number of times, but it must be the value of an ID attribute in the same document. ID and IDREF are most frequently used for cross-referencing within documents.
Note that an ID attribute can have any name: it doesn't have to be called ID, although it frequently is. Conversely—as a matter of best practice—you should never use the name ID (id) for an attribute which is not of type ID, simply because it's confusing. If your application has unique identity values that the community calls IDs, and which are not XML Names, either name the attribute something different (eg Product-ID) or document heavily that the value is not an XML ID.
There is a W3C
Recommendation that document type designers should
use the attribute name xml:id, and this can be
interpreted by parsers as being a unique ID without the
need for the document to use a DTD or Schema.
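The uniqueness and cross-reference rules for IDs can be illustrated outside a validating parser. The following is only a minimal sketch in Python, assuming the document uses xml:id for identifiers and a hypothetical ref attribute for references; the see element and the second part are invented for illustration. A validating parser with a DTD or Schema would normally do this checking for you.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the reserved XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_ids(xml_text, ref_attr="ref"):
    """Return (duplicate_ids, dangling_refs) for a document using xml:id.

    ref_attr is a hypothetical attribute name standing in for an IDREF.
    """
    root = ET.fromstring(xml_text)
    seen, duplicates = set(), []
    for elem in root.iter():
        value = elem.get(XML_ID)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)  # an ID must be unique in the document
        seen.add(value)
    # every reference must match some ID in the same document
    dangling = [e.get(ref_attr) for e in root.iter()
                if e.get(ref_attr) is not None and e.get(ref_attr) not in seen]
    return duplicates, dangling


sample = """<doc>
  <part xml:id="B22">Left-handed Screwdriver</part>
  <part xml:id="B23">Right-handed Screwdriver</part>
  <see ref="B22"/>
  <see ref="B99"/>
</doc>"""

print(check_ids(sample))
```

Here B99 is reported as a dangling reference because no element carries that ID.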
CDATA: Just text.
Enumerated: The attribute must have one of a restricted number of values (specified in parentheses in the declaration, separated by vertical bars), eg
<!ATTLIST part level (App|Jny|Mst) #REQUIRED>
<!ATTLIST Q.27 resp (Yes|No) "Yes">
In the first example there is no default, and a value is compulsory. In the second, Yes is the default value (if the attribute is omitted, the parser will take the default value from the declaration).
ENTITY: The attribute value must be a declared Entity.
NMTOKEN: An XML Name Token is like an ID value (no spaces) but it can begin with a non-letter (eg a digit or punctuation).
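To see the default value from the Q.27 declaration above being supplied by a parser, here is a small sketch using Python's standard expat binding. The survey wrapper element and the internal DTD subset are assumptions made for the example; a non-validating parser reads defaults from the internal subset but does not check that values are among the permitted ones.

```python
from xml.parsers import expat


def attributes_seen(xml_text):
    """Return a list of (element name, attributes) as reported by expat."""
    seen = []
    parser = expat.ParserCreate()

    def start(name, attrs):
        seen.append((name, dict(attrs)))

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return seen


# 'survey' is a hypothetical wrapper; the ATTLIST is the one from the text
doc = """<!DOCTYPE survey [
  <!ATTLIST Q.27 resp (Yes|No) "Yes">
]>
<survey><Q.27/><Q.27 resp="No"/></survey>"""

for name, attrs in attributes_seen(doc):
    print(name, attrs)
```

The first Q.27 element has no resp attribute in the document, so the parser reports the declared default; the second keeps its explicit value.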
In addition to xml:id (mentioned above),
there are two others allowed by the XML
Specification:
xml:space to signal an intention that in that element, white space should be preserved by applications;
xml:lang to specify the language used in the contents and attribute values of any element.
See sections 2.10 and 2.12 of the Spec for more detail.
In Schemas a much greater range of datatypes is available than in DTDs, and complex validation criteria can be attached to each.
Attributes in a DTD can be declared as #REQUIRED (compulsory),
#IMPLIED (optional),
or #FIXED (predefined
and invariable).
There is not intended to be any limit on the length of an attribute value, but you should check that your processing software can handle unusual data volumes if you intend to use very large lengths.
BPEL
Byte Order Mark
Colour
Data export
Data import
Disadvantages
It can be verbose unless element and attribute names are chosen with care. In large documents the markup overhead need not be large, but in short messages it can be significantly more than the actual data, especially when the element or attribute names are concocted by machine.
Overlapping markup is not permitted (an element cannot start inside one element and end inside another): element markup must nest hierarchically.
Some of the software is truly mediocre.
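The rule against overlapping markup is easy to demonstrate with any XML parser. This is a minimal sketch using Python's standard library; the helper name is_well_formed and the sample fragments are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


nested = "<p><b><i>text</i></b></p>"       # i closes before b: properly nested
overlapping = "<p><b><i>text</b></i></p>"  # b closes while i is still open

print(is_well_formed(nested))
print(is_well_formed(overlapping))
```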
Editing
Entities
General entities, which can be like string-replacement macros:
<!ENTITY IBM "International Business Machines">
These can be used for shorthand data entry or to guarantee uniform spelling like &IBM; and they get replaced when the file is parsed.
They can also represent external files:
<!ENTITY chap5 SYSTEM "chapter5.xml">
which can be used as a file-inclusion mechanism at the point where you insert &chap5;. External general file entities must not contain the XML Declaration or any Document Type Declaration.
Unparsed entities are like external general file entities except that they specify the type of data they contain, using a declared Notation, so that the parser and application can decide how to handle them (eg include them or hand them to another program specific to their type of medium):
<!ELEMENT link (#PCDATA)>
<!ATTLIST link to ENTITY #REQUIRED>
...
<!NOTATION PDF PUBLIC
"-//Adobe//NOTATION Portable Document Format//EN//PDF"
"http://partners.adobe.com/public/developer/pdf/index_reference.html">
<!ENTITY pricelist SYSTEM "/sales/pricelist.pdf" NDATA PDF>
...
<para>Please refer to our <link to="pricelist">current
price list</link>.</para>
This provides an extremely robust method of defining an external entity once and allowing it to be referenced multiple times (if the external filename changes, you only have to update the entity declaration).
Character entities like &aacute; to represent characters that users without the required keyboard features may want to enter, like á;
Parameter entities are like General Entities but can only be referenced within a DTD. They are used for control of content models, inclusion or exclusion of declarations, and modification of modular constructs:
<!ENTITY % local.qandaset.mix "|bibliodiv">
(to use an example from the DTD for this FAQ)
where the mix of element types in the content model for
qandaset is specified by the entities
qandaset.mix (defined by DocBook)
and by
local.qandaset.mix (definable by the
user [me]) so that the DTD can be tweaked without having
to be edited.
General entity names, including XML document entities and character entities, always start with an ampersand (&) and end with a semicolon (;), and can be used anywhere in your document. Parameter entities can only be used in a DTD: they start with a percent sign (%) and end with a semicolon.
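The string-replacement behaviour of a general entity, as in the IBM declaration earlier in this entry, can be observed with a short Python sketch. The memo element and its text are invented for the example, and the entity is declared in the internal DTD subset so that no external files are needed.

```python
import xml.etree.ElementTree as ET

# 'memo' is a hypothetical document type; the entity is the one from the text
doc = """<!DOCTYPE memo [
  <!ENTITY IBM "International Business Machines">
]>
<memo>Contact &IBM; today.</memo>"""

root = ET.fromstring(doc)
# The parser has already replaced &IBM; by the time the text is available
print(root.text)
```

By the time an application sees the element content, the reference has been expanded; the application never sees &IBM; itself.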
Enumeration
<xsl:value-of select="count(//chapter)"/>
To apply a counter to a repetitive element type, use the xsl:number element, eg
<xsl:number count="appendix" level="any" format="A"/>
For more on XSLT, see question C.24, ‘How do I control formatting and appearance?’.
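For readers who process XML from a general-purpose language rather than XSLT, a rough Python equivalent of counting chapters and lettering appendices might look like the sketch below. The book structure and titles are invented, and the lettering only covers the first 26 appendices.

```python
import string
import xml.etree.ElementTree as ET

# Hypothetical document: two chapters and two appendices
doc = """<book>
  <chapter/><chapter/>
  <appendix><title>Tools</title></appendix>
  <appendix><title>Glossary</title></appendix>
</book>"""

root = ET.fromstring(doc)

# Roughly what count(//chapter) computes
chapter_count = len(root.findall(".//chapter"))

# Roughly what numbering appendices with format="A" produces (A, B, C, ...)
labels = [f"{string.ascii_uppercase[i]}. {a.findtext('title')}"
          for i, a in enumerate(root.iter("appendix"))]

print(chapter_count)
print(labels)
```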
Environment variables
Escaping
XML allows you to use Unicode, so any character or symbol in any language can be entered as itself. If you are using UTF-8 encoding in your documents, there is no need to use escaping except for the two markup symbols (< and &). However, not everyone has a Unicode editor, and complete Unicode fonts are very large, so it is conventional in alphabetic languages to pick an encoding which allows you to use the majority of the characters you need, and to use escaping for the occasional other characters.
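As a small illustration of the two approaches described above, the following Python sketch escapes only the markup symbols and, separately, turns a non-ASCII character into a numeric character reference for an ASCII-only file. The sample strings are made up; note that the library's escape function also converts the greater-than sign, which is harmless but not required.

```python
from xml.sax.saxutils import escape

# In a UTF-8 document only the markup symbols need escaping;
# the accented letter is left as itself
escaped = escape("R&D: a < b, á")
print(escaped)

# For an ASCII-only file, other characters can become numeric references
ascii_safe = "á".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_safe)
```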
Floating-point
GTT
Games
Idempotency
Javascript
Line breaks
Line-breaking in your output is governed by your rendering engine (eg a browser, a typesetter, etc). Your DTD or Schema may define special elements or entities to be used on rare occasions when a forced linebreak is required, but this is not normally something done in XML (exception: reconstruction of historical documents using the TEI).
Loops
<xsl:for-each select="//chapter">
<li>
<xsl:value-of select="title"/>
</li>
</xsl:for-each>
Multimedia
Patents, Copyright, and Intellectual Property
Since the USA (and, increasingly, elsewhere) stopped sanity-checking patent applications, pretty much anyone can patent anything in these countries, regardless of whether or not it already exists. If you are sufficiently intellectually bankrupt, you can then start sending invoices to companies and even individuals demanding payment of license fees for continued use.
XML was drafted during 1995 and first published in 1996, so anyone claiming they invented pointy-bracket self-defining hierarchically-nested structured markup after that is probably a few elements short of a DTD. XML is based on SGML, which is an international standard codified as ISO 8879:1986, and it was preceded by numerous other closely-related markup systems, so anyone claiming they invented it after that date is equally wide of the markup.
Lots of subsequent derivative technologies which owe their existence to the SGML and XML groundwork quite possibly are valid patents, in the same way that fire was not originally patented but matches and lighters were.
Patents were originally designed for new physical inventions. Their use for methodologies and algorithms extended the concept into the realm of ideas, which many people regard as deeply suspect. The patenting of natural phenomena like genes (which are pre-existing parts of Nature like politicians or pond scum) is meaningless and intellectually void, although legally enforceable in the USA and elsewhere.
Copyright subsists automatically in anything you create, but in some countries (notably the USA and France) you cannot enforce this unless you register your interest. Copyright persists for a number of years after your death (EU: 70, different elsewhere) in order to let your descendants benefit from sales of your work.
Copyright is for the physical form of intellectual expression like books, newspapers, works of art, web sites, or computer programs. It exists to prevent others stealing your work and selling it. You can quote snippets of other people's work without permission, such as a line of a poem, or a bar of music, or a sentence from a novel, provided you say whose it is and where to find it: otherwise you need to ask permission beforehand. Copyright already provides more than adequate protection for computer programs, making the use of patents for them unnecessary overkill.
Intellectual Property identifies you as the owner of the thoughts and ideas which may find their physical manifestation in patentable inventions or copyrightable publications. Even if you sell off your patents, and for long after your copyrights have expired, you can still be seen as the person who dreamed up the idea, and some countries (eg the UK) allow you formally to assert your right to be so identified, regardless of what happens to the book or the gizmo.
You should always acknowledge the intellectual property of others, especially when you use it in furtherance of your own aims. Pretending that someone else's smart ideas are your own is probably a worse offence than trying to patent fire, water, the wheel, or XML.
Pipelining
The W3C has a Note pending submission on an XML Pipeline Definition Language which could be used to define a pipeline in a portable, vendor-independent manner.
RSS
Newsreaders (RSS readers) are available for all platforms, both standalone and as browser plugins. Do not confuse these with programs of the same description designed to provide access to the Usenet News service, which is a different thing entirely (and which you will need to read comp.text.xml).
Rendering
SML
The Standard ML programming language is not.
Did you mean SGML?
SOAP
Originally the Simple Object Access Protocol, the acronym is now undefined, or expressed as the Service-Oriented Access Protocol.
Searching
XSLT allows a limited search facility simply by using functions like contains and starts-with (ends-with is not available in XSLT 1.0). XSLT2 adds Regular Expressions, but this is not yet (2005) a Recommendation. XQuery is a fully-fledged search language for XML.
The Saxon XSLT processor comes with an implementation of XQuery (see also the XQL FAQ), which can accept queries either from the command line or from a file. Saxon can also use a control file to specify groups of XML files to be searched together.
For indexed searching (for speed) you need an XQuery search tool that implements an indexing engine which reads and understands markup. These are usually implemented as part of a native XML database system such as eXist (and many others), which run either stand-alone or in parallel with an XML server like Cocoon.
Traditional relational databases (MySQL, Oracle, etc) tend to store XML as undistinguished strings or BLOBs, using bolt-on XML-like backends to disambiguate the markup. Native XML databases can be configured for granularity, to store at a specific element level, making markup-sensitive searching much easier.
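The simple substring style of searching mentioned at the start of this entry can also be done from a scripting language without any index. This is only a sketch in Python: the function name titles_containing, the title element, and the sample book are assumptions, and an unindexed scan like this will not scale the way an indexing XML database does.

```python
import xml.etree.ElementTree as ET


def titles_containing(xml_text, needle):
    """Return the text of every title element whose content includes needle."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.iter("title") if needle in (t.text or "")]


# Hypothetical document used only for illustration
book = """<book>
  <chapter><title>Basics of XML</title></chapter>
  <chapter><title>Stylesheets</title></chapter>
  <chapter><title>XML databases</title></chapter>
</book>"""

print(titles_containing(book, "XML"))
```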
Serving XML
Sorting
<xsl:for-each select="//acronym">
<xsl:sort select="@abbrev"/>
<xsl:value-of select="@abbrev"/>
<xsl:text>: </xsl:text>
<xsl:apply-templates/>
</xsl:for-each>
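The same sort-by-attribute idea can be sketched in Python for comparison. The glossary wrapper and the three sample entries are assumptions made for the example; only the acronym element and its abbrev attribute come from the stylesheet fragment above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input using the acronym/abbrev names from the XSLT fragment
doc = """<glossary>
  <acronym abbrev="XSL">Extensible Stylesheet Language</acronym>
  <acronym abbrev="DOM">Document Object Model</acronym>
  <acronym abbrev="SAX">Simple API for XML</acronym>
</glossary>"""

root = ET.fromstring(doc)

# Sort the elements on their abbrev attribute, then format "abbrev: text"
ordered = sorted(root.iter("acronym"), key=lambda a: a.get("abbrev"))
lines = [f"{a.get('abbrev')}: {a.text}" for a in ordered]

for line in lines:
    print(line)
```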
Special characters
The open angle bracket or less-than sign (<) which begins a start-tag or end-tag like <report> or </table>;
The ampersand character (&) which starts an entity reference like &aacute; for á or &sect; for §.
Contrary to popular opinion, the closing angle bracket or greater-than (>) and the semicolon (;) are not special characters in normal text: they only acquire their temporary special meaning once one of the two markup characters has been encountered.
In DTDs, the percent sign (%) has a special meaning in entity declarations: it defines the entity as a parameter entity, meaning that it can only be used inside the DTD, not in a document text, and only for data substitution (a kind of simple macro).
The exclamation mark (!) acquires a special meaning immediately after a less-than sign: when followed by one of the declaration keywords in a DTD it signals the start of a Declaration; when followed by two dashes it signals the start of a comment (ended by another two dashes and a greater-than sign).
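A quick way to check which characters really are special in ordinary text is to hand a few fragments to a parser. The sketch below uses Python's standard library; the helper name parses and the sample fragments are invented for the example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is accepted by the parser, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses("<p>x > y; done</p>"))   # > and ; are ordinary characters here
print(parses("<p>x < y</p>"))         # a bare < is rejected
print(parses("<p>AT&T</p>"))          # a bare & is rejected
print(parses("<p>x &lt; y, AT&amp;T</p>"))  # the escaped forms are accepted
```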
TMX
Tables
HTML tables were invented by Mosaic (now Netscape) and first appeared in the HTML2 DTD. In all versions of HTML and XHTML they define a very simple but practical model, with very few refinements, suitable for web use and for rudimentary printing. Their chief advantage is that in a browser the cell heights and widths (and thus the column widths) expand or contract automatically to accommodate the amount of text contained in them. Most other table models assume the widths of the columns and the height of the cells will be specified in advance (which you can do in HTML but this is rarely used).
CALS (Computer-Aided Logistics and Support, and several other expansions of the acronym over the years) was (is) part of the US military project to ensure a consistent markup for all documentation, originally in SGML, now in XML. As part of this activity the CALS table model has become the most widely-used in technical documentation, especially for Interactive Electronic Technical Manuals (IETMs), with extensive support in all the major editors, and it is the default table model in the DocBook DTD and Schema. The CALS definitions are very powerful but quite complex, and can handle virtually all requirements for spanning, ruling, and aligning.
This model has been used extensively in the social sciences and elsewhere for defining tables based on the semantics of the data, rather than the appearance. At one time they were an alternative in DocBook (enabled by a simple parameter entity switch).
The TEI model is designed to allow the encoder to represent existing tables being transcribed from historical, literary, or archive material, rather than for the generation of new data. The markup is at the same level of simplicity as the HTML model, but it is designed to allow the inclusion of the much denser markup and metadata needed in research texts.
The LaTeX model is not of direct concern to the XML user except insofar as LaTeX is a common target for transformations from XML using XSLT. Like CALS, LaTeX tables can handle almost any formatting, but the default alignments assume that each column format is defined beforehand, and that each cell will occupy one line of data: an additional package (array) is needed to handle multi-line cells in the way the HTML model does.
In XML, it is not necessary to use tables to mark up
lists as is often done in wordprocessors, because the
processing facilities of languages like XSLT allow you to
transform the document to use non-tabular methods (like
HTML's divs). Table markup should
therefore be confined to real
tables (data arranged in rows and columns) and not abused
simply because you want something displayed on a level
with something else: it is better to pick markup which is
designed to do the job properly rather than to distort
existing facilities.
Wordprocessor users are usually unaware that many structures that they currently use wordprocessor tables for are in fact segmented lists, which wordprocessors are incapable of handling correctly. One of the major reasons for doing it properly is that the data can then be reprocessed to make sense when read in the natural order.
Text document formatting functions
There are additional native-XML proposals and recommendations at the W3C for XML Forms handling, XML Linking, XML Security, and a lot of other features, but these are architectural enabling mechanisms, not drop-in replacements for HTML.
UML
URI parsing errors
Variables
XML identifies your information with elements and attributes.
WAP
Well-formed
White-space
XLL
XLS
Do not confuse XLS with XSL (see question C.24, ‘How do I control formatting and appearance?’).
XML
XML Protocol
XMLHTTP
XUL
asp.net
.NET itself is an application platform and methodology for web services development on Microsoft servers. Most web services are predicated on XML as the common carrier of inter-business messaging, so .NET has a significant XML component.
There are many alternatives to ASP, most of which use a similar page based approach. Java based alternatives include Java Server Pages (JSP), Java Server Faces (JSF) and Cocoon (which includes eXtensible Server Pages—XSP). Popular scripting language alternatives include AxKit (Perl, also supporting XSP), Zope (Python) and Rails (Ruby) [all of which have extensive XML support.—Ed.]