C.16 I keep hearing about alternatives to DTDs. What's a Schema?
The W3C XML Schema Recommendation provides a means of specifying formal data types for element content and attribute values, so that document type designers can provide criteria for checking the data content of elements as well as the markup itself. Schemas are written in XML Document Syntax, just as XML documents are, so processing software does not need to be able to read XML Declaration Syntax (used for DTDs).
There is a separate Schema FAQ at http://www.schemavalid.com.
The term ‘vocabulary’ is sometimes used to refer to DTDs and Schemas together. Schemas are aimed at e-commerce, data control, and database-style applications where character data content requires validation and where stricter data control is needed than is possible with DTDs; or where strong data typing is required. They are usually unnecessary for traditional text document publishing applications.
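Unlike DTDs, Schemas cannot be specified in an XML Document Type Declaration. They can be referenced from the document instance with the xsi:schemaLocation attribute, whose value is a pair: the Namespace URI, then a space, then the location of a Schema for that Namespace. Schema-aware software should pick this up, but it is only a hint and processors are free to ignore it:
<invoice id="abc123"
  xmlns="http://example.org/ns/books/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://example.org/ns/books/
                      http://acme.wilycoyote.org/xsd/invoice.xsd">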
...
</invoice>
More commonly, you specify the Schema in your processing software, which should record separately which Schema is used by which XML document instance.
In contrast to the complexity of the W3C Schema model, RELAX NG is a lightweight, easy-to-use XML schema language devised by James Clark and Murata Makoto (see http://relaxng.org/), with development hosted by OASIS. It allows similar richness of expression and the use of XML as its syntax, but it also provides an additional compact syntax which is easier to use for those accustomed to DTDs.
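As a rough illustration of the two RELAX NG syntaxes, here is a minimal sketch of a grammar for a cut-down invoice. The element names (quantity, currency) and the currency codes are invented for this example, and the namespace is simply borrowed from the invoice example above; the compact-syntax equivalent is shown inside the comment for comparison.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical RELAX NG grammar in the XML syntax (illustration only;
     the element names and currency codes are assumptions).
     Roughly the same grammar in the compact syntax would be:

       default namespace = "http://example.org/ns/books/"
       element invoice {
         attribute id { text },
         element quantity { xsd:positiveInteger },
         element currency { "EUR" | "GBP" | "USD" }
       }
-->
<element name="invoice"
         xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://example.org/ns/books/"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- any text is accepted as the id attribute value -->
  <attribute name="id"><text/></attribute>
  <!-- data types are borrowed from the W3C Schema datatype library -->
  <element name="quantity"><data type="positiveInteger"/></element>
  <!-- content must be exactly one of the listed values -->
  <element name="currency">
    <choice>
      <value>EUR</value>
      <value>GBP</value>
      <value>USD</value>
    </choice>
  </element>
</element>
```

The compact form reads much like a DTD content model, which is why it tends to suit people coming from DTDs.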
Authors and publishers should note that the English plural of Schema is Schemas: the use of the singular to do duty for the plural is a foible dear to the semi-literate; the use of the old (Greek) plural schemata is unnecessary didacticism.
Writers should also note that the plural of DTD is DTDs: there is no apostrophe—see Eats, Shoots & Leaves: The Zero-Tolerance Approach to Punctuation (Truss, 2003).
Many XML developers were dissatisfied with the syntax of the markup declarations described in the XML spec, for two reasons. First, they felt that if XML documents were so good at describing structured information, then the description of a document type's own structure (its schema) should be in an XML document instead of written with its own special syntax. In addition to being more consistent, this would make it easier to edit and manipulate the schema with regular document manipulation tools. Secondly, they felt that traditional DTD notation didn't give document type designers the power to impose enough constraints on the data: for example, the ability to say that a certain element type must always have a positive integer value, that it may not be empty, or that it must be one of a list of possible choices. Constraints like these ease the development of software using that data, because the developer has less error-checking code to write.
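The three kinds of constraint just mentioned could be expressed in a W3C Schema along the following lines. This is only a sketch: the element names (quantity, customer, currency), the currency codes, and the reuse of the http://example.org/ns/books/ namespace are assumptions made for the example, not part of any real application.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical Schema for illustration; element names and the
     target namespace are assumptions. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org/ns/books/"
           targetNamespace="http://example.org/ns/books/"
           elementFormDefault="qualified">

  <xs:element name="invoice">
    <xs:complexType>
      <xs:sequence>
        <!-- must always be a positive integer (1, 2, 3, ...) -->
        <xs:element name="quantity" type="xs:positiveInteger"/>
        <!-- may not be empty: at least one character is required -->
        <xs:element name="customer">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:minLength value="1"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <!-- must be one of a fixed list of possible choices -->
        <xs:element name="currency">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="EUR"/>
              <xs:enumeration value="GBP"/>
              <xs:enumeration value="USD"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A validating Schema processor given this file would reject, say, a quantity of -3 or a currency of XYZ, so the application code does not have to check for them itself.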
A DTD is only for specifying the element structure of an XML file, with a very limited amount of control over attribute values. It gives the names of the elements, attributes, and entities that can be used, and how they fit together. DTDs are designed for use with traditional text documents, not rectangular or tabular data, so the concept of data types is not relevant: text is just text. If you need to specify numeric ranges or to define limitations or checks on the character data (text) content, a DTD is the wrong tool.
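To make the limitation concrete, here is a small sketch of a document with an internal DTD subset (the element and attribute names are invented for illustration). The DTD can fix the element structure and restrict an attribute to a list of tokens, but the quantity element can only be declared as character data, so the nonsensical value below is still valid.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical document type for illustration only. -->
<!DOCTYPE invoice [
  <!-- element structure: what may occur, and in what order -->
  <!ELEMENT invoice  (customer, quantity, currency)>
  <!ATTLIST invoice  id ID #REQUIRED>
  <!-- character data content is just text: no data types -->
  <!ELEMENT customer (#PCDATA)>
  <!ELEMENT quantity (#PCDATA)>
  <!ELEMENT currency (#PCDATA)>
  <!-- the limited control over attribute values: a token list -->
  <!ATTLIST currency code (EUR | GBP | USD) #REQUIRED>
]>
<invoice id="abc123">
  <customer>Example Customer</customer>
  <quantity>minus three and a half</quantity>
  <currency code="EUR">euro</currency>
</invoice>
```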