The XML FAQ: I keep hearing about alternatives to DTDs. What's a Schema?


The XML FAQ — Frequently-Asked Questions about the Extensible Markup Language

Section 3: Authors

Q 3.14: I keep hearing about alternatives to DTDs. What's a Schema?

A Schema is like a DTD, but it can validate content as well as structure.

The W3C XML Schema recommendation provides a means of specifying formal data types for element content and of validating content against them, so that document type designers can provide criteria for checking the data content of elements as well as the markup itself. Schemas are written in XML Document Syntax, as XML documents are, so processing software does not need to be able to read XML Declaration Syntax (used for DTDs).
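As a rough illustration of data typing, here is a minimal sketch of a W3C XML Schema. The item element, its children, the currency codes, and the http://example.org/ns/orders/ namespace are all hypothetical names invented for this example; they do not come from any real application.

```xml
<?xml version="1.0"?>
<!-- Hypothetical schema: all element names and the namespace are illustrative -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/ns/orders/"
           xmlns="http://example.org/ns/orders/"
           elementFormDefault="qualified">

  <xs:element name="item">
    <xs:complexType>
      <xs:sequence>
        <!-- Content must be text at least one character long -->
        <xs:element name="description">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:minLength value="1"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <!-- Content must be a whole number greater than zero -->
        <xs:element name="quantity" type="xs:positiveInteger"/>
        <!-- Content must be one of a fixed list of values -->
        <xs:element name="currency">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="EUR"/>
              <xs:enumeration value="USD"/>
              <xs:enumeration value="GBP"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With a Schema along these lines, a validating processor would be expected to reject a quantity of "several" or a currency of "XYZ", checks that would otherwise have to be written into the application.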

There is a separate Schema FAQ at http://schema.org/docs/faq.html. The term ‘vocabulary’ is sometimes used to refer to DTDs and Schemas together.

Schemas are aimed at e-commerce, data control, and database-style applications where character data content requires validation and where stricter data control is needed than is possible with DTDs; or where strong data typing is required. They are usually unnecessary for traditional text document publishing applications, where DTDs continue to be used.

Unlike DTDs, Schemas cannot be specified in an XML Document Type Declaration. They can be referenced from the root element with the xsi:schemaLocation attribute, whose value pairs a Namespace name with the location of the Schema for that Namespace. Schema-aware software should pick this up, but doing so is optional:

<?xml version="1.0"?>
<invoice xml:id="abc123"
         xmlns="http://example.org/ns/books/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://example.org/ns/books/ http://acme.wilycoyote.org/xsd/invoice.xsd">
...
</invoice>

More commonly, you specify the Schema in your processing software, which should record separately which Schema is used by which XML document instance.

In contrast to the complexity of the W3C Schema model, RELAX NG is a lightweight, easy-to-use XML schema language devised by James Clark (see http://relaxng.org/) with development hosted by OASIS. It offers similar richness of expression and also uses XML as its syntax, but it additionally provides a compact, non-XML syntax which is easier to use for those accustomed to DTDs.
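For a feel of the XML syntax, here is a minimal RELAX NG sketch. It describes a hypothetical item element containing a non-empty description, a positive-integer quantity, and a currency drawn from a short list; these names, the currency codes, and the http://example.org/ns/orders/ namespace are invented for illustration. The data types are borrowed from the W3C XML Schema datatype library, which RELAX NG can use.

```xml
<?xml version="1.0"?>
<!-- Hypothetical RELAX NG schema (XML syntax); all names are illustrative -->
<element name="item"
         xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://example.org/ns/orders/"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Content must be text at least one character long -->
  <element name="description">
    <data type="string">
      <param name="minLength">1</param>
    </data>
  </element>
  <!-- Content must be a whole number greater than zero -->
  <element name="quantity">
    <data type="positiveInteger"/>
  </element>
  <!-- Content must be one of the listed values -->
  <element name="currency">
    <choice>
      <value>EUR</value>
      <value>USD</value>
      <value>GBP</value>
    </choice>
  </element>
</element>
```

The pattern reads much like the document it describes: an item is a description followed by a quantity followed by a currency.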

Authors and publishers should note that the English plural of Schema is Schemas: the use of the singular to do duty for the plural is a foible dear to the semi-literate; the use of the old (Greek) plural schemata is unnecessary didacticism.
Writers should also note that the plural of DTD is DTDs: there is no apostrophe — see Truss (2003).

Bob DuCharme writes:

Many XML developers were dissatisfied with the syntax of the markup declarations described in the XML spec for two reasons. First, they felt that if XML documents were so good at describing structured information, then the description of a document type's own structure (its schema) should be in an XML document instead of written with its own special syntax. In addition to being more consistent, this would make it easier to edit and manipulate the schema with regular document manipulation tools. Secondly, they felt that traditional DTD notation didn't allow document type designers the power to impose enough constraints on the data — for example, the ability to say that a certain element type must always have a positive integer value, that it may not be empty, or that it must be one of a list of possible choices. This eases the development of software using that data because the developer has less error-checking code to write.

Peter Flynn writes:

A DTD is only for specifying the element structure of an XML file, with a very limited amount of control over attribute values. It gives the names of the elements, attributes, and entities that can be used, and how they fit together. DTDs are designed for use with traditional text documents, not rectangular or tabular data, so the concept of data types is not as relevant: text is just text. If you need to specify numeric ranges or to define limitations or checks on the character data (text) content, a DTD is the wrong tool.
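A small sketch may make the limitation concrete. The document below carries its DTD in the internal subset; the item element, its children, and the status attribute are hypothetical names chosen for illustration. The DTD controls which elements appear and in what order, and can restrict an attribute to a list of values, but it has no way to say that the quantity must be a number.

```xml
<?xml version="1.0"?>
<!DOCTYPE item [
  <!-- Structure: an item contains these three elements, in this order -->
  <!ELEMENT item (description, quantity, currency)>
  <!ELEMENT description (#PCDATA)>
  <!-- A DTD cannot require a positive integer here: it is just text -->
  <!ELEMENT quantity (#PCDATA)>
  <!ELEMENT currency (#PCDATA)>
  <!-- Attribute values can be limited to an enumerated list -->
  <!ATTLIST item status (draft | final) "draft">
]>
<item status="final">
  <description>Anvil</description>
  <quantity>not a number, but still valid</quantity>
  <currency>EUR</currency>
</item>
```

This document is valid against its DTD even though the quantity is plainly nonsense, because as far as the DTD is concerned, text is just text.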

G Ken Holman writes:

Schemas as constraint languages
My perspective is that any schema language is a constraint language. When you say ‘Determining if an instance is valid is a matter of checking that the instance adheres to the specification,’ I would say ‘Determining if an instance is valid is a matter of checking that the instance does not violate any of the schema’s constraints.’ And one can say ‘An authoring tool directs the creation of content that does not violate any of the schema’s constraints.’
Different uses put different spins on the activity, but at the core the activity is guided by sets of constraints. That way you can have many sets of constraints using different technologies (W3C XML Schema, Schematron, etc.), and then selectively decide if an instance is valid against any or all sets of constraints. RELAX NG and DTDs define constraints as pattern grammars, W3C XML Schema defines constraints as type hierarchies, and Schematron defines constraints as content assertions.
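To illustrate the last of these styles, here is a minimal Schematron sketch. It assumes a hypothetical item element with quantity and description children in an invented http://example.org/ns/orders/ namespace; none of these names come from a real application. Each constraint is an XPath test paired with the message to report when the test fails.

```xml
<?xml version="1.0"?>
<!-- Hypothetical Schematron schema; element names and namespace are illustrative -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <!-- Bind a prefix so the XPath tests can name namespaced elements -->
  <ns prefix="o" uri="http://example.org/ns/orders/"/>
  <pattern>
    <rule context="o:item">
      <!-- Assertion on the value of the content -->
      <assert test="number(o:quantity) &gt; 0">
        An item must have a quantity greater than zero.
      </assert>
      <!-- Assertion that the content is not empty or whitespace only -->
      <assert test="normalize-space(o:description) != ''">
        An item must have a non-empty description.
      </assert>
    </rule>
  </pattern>
</schema>
```

Because the rules are independent assertions rather than a grammar, a set like this could be applied alongside a DTD or another schema, in keeping with the idea of validating an instance against several sets of constraints.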