The XML FAQ: What has changed between SGML and XML?


The XML FAQ — Frequently-Asked Questions about the Extensible Markup Language

Section 4: Developers

Q 4.6: What has changed between SGML and XML?

Stricter syntax and no options.

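The main syntactic change is that end-tags can never be omitted. In SGML a DTD can declare an element type EMPTY so that it has no end-tag, but without a DTD or Schema there is no way for the parser to know not to expect one. In XML an empty element must therefore be closed explicitly, normally with the Null End-Tag trick, which the XML Specification calls the empty-element tag (eg <img src="pic"/>). An element type declared as EMPTY in the DTD/Schema can use either the NET or the full end-tag syntax, but nothing at all, not even white space, may then appear between the start-tag and the end-tag (eg <img src="pic"></img>).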
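A minimal sketch of these rules, using Python's standard xml.etree.ElementTree module as the parser (our choice for illustration; the FAQ does not prescribe any particular software). It shows that the two closed forms produce the same empty element, that a space between the tags counts as content, and that an HTML-style unclosed <img> is rejected.

```python
import xml.etree.ElementTree as ET

# The Null End-Tag (empty-element tag) form closes <img> in a single tag...
net = ET.fromstring('<img src="pic"/>')
# ...and a start-tag immediately followed by its end-tag is equivalent.
full = ET.fromstring('<img src="pic"></img>')
for img in (net, full):
    print(img.tag, img.get("src"), len(img), img.text)  # img pic 0 None

# A space between the tags is character content, which a validating
# parser would reject for an element type declared EMPTY.
spaced = ET.fromstring('<img src="pic"> </img>')
print(repr(spaced.text))  # ' '

# An HTML-style <img> with no end-tag is not well-formed XML: the parser
# cannot know it was meant to be empty, so </p> is a mismatched end-tag.
try:
    ET.fromstring('<p><img src="pic"></p>')
except ET.ParseError as err:
    print("not well-formed:", err)
```

Note that ElementTree is a non-validating parser, so it accepts the spaced form as well-formed; only validation against an EMPTY declaration would flag it.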

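Other syntactic changes are that all attribute values must be quoted (with single or double quotes); there is no minimisation of attributes (eg <option selected>) or of elements (no omitted or abbreviated tags); and everything, including element type and attribute names, is case-sensitive. One important addition is that multiple ATTLIST declarations are allowed for the same element type and are merged, so an internal subset can add attributes to those declared in the external DTD subset. If the same attribute is declared more than once, the first declaration is binding and later ones are ignored; because the internal subset is read before the external subset, its declarations take precedence.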
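The rules above can be checked with Python's standard library. This is a sketch under our own assumptions: is_well_formed and root_attributes are hypothetical helper names, and the ATTLIST part relies on the bundled expat parser supplying default attribute values from the internal subset (its documented behaviour when specified_attributes is left false).

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Attribute values must be quoted; single or double quotes both work.
print(is_well_formed('<td align="left"/>'))  # True
print(is_well_formed("<td align='left'/>"))  # True
print(is_well_formed("<td align=left/>"))  # False

# No attribute minimisation: every attribute needs an explicit value.
print(is_well_formed("<option selected/>"))  # False
print(is_well_formed('<option selected="selected"/>'))  # True

# Names are case-sensitive, so <P> cannot be closed by </p>.
print(is_well_formed("<P>text</p>"))  # False


def root_attributes(text: str) -> dict:
    """Return the root element's attributes, including any defaults
    supplied by ATTLIST declarations in the internal subset."""
    seen = []
    parser = expat.ParserCreate()
    # specified_attributes is false by default, so expat also reports
    # attributes whose values come from DTD defaults.
    parser.StartElementHandler = lambda name, attrs: seen.append(attrs)
    parser.Parse(text, True)
    return seen[0]


# Two ATTLIST declarations for <note> are merged; the later redeclaration
# of "lang" is ignored because the first declaration is binding.
DOC = """<!DOCTYPE note [
  <!ATTLIST note lang CDATA "en">
  <!ATTLIST note status CDATA "draft">
  <!ATTLIST note lang CDATA "fr">
]>
<note/>"""
print(root_attributes(DOC))
```

The <note> element in the instance carries no attributes of its own, so every value reported for it comes from the merged declarations.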

The principal changes in Document Type Definitions (DTDs) are in what you can specify. To simplify the language and make processing software easier to write, a large number of SGML markup declaration options have been removed (see the list of omitted features). The biggest change in vocabulary management is the introduction of W3C Schemas, which allow datatype validation of element and attribute content (for example, that a value must be an integer or a date) that DTDs cannot express, and which are themselves written in XML document syntax.
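Because a W3C Schema is itself an XML document, an ordinary XML parser can read it. The sketch below uses a small made-up schema (the element names order, quantity and shipped are our own illustration) and Python's xml.etree.ElementTree simply to list the datatypes it declares. It does not validate anything: checking an instance against the schema needs a schema processor, such as the XMLSchema class in the third-party lxml library, which the Python standard library does not include.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# A tiny, hypothetical W3C XML Schema. A DTD could only say that <quantity>
# holds character data (#PCDATA); the schema constrains it to a positive
# integer and <shipped> to a date.
schema_text = f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="quantity" type="xs:positiveInteger"/>
        <xs:element name="shipped" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# The schema is parsed like any other XML document.
schema = ET.fromstring(schema_text)
declared = {
    el.get("name"): el.get("type")
    for el in schema.iter(f"{{{XS}}}element")
}
print(declared)
# {'order': None, 'quantity': 'xs:positiveInteger', 'shipped': 'xs:date'}
```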

The main addition is namespaces, which enable Schemas and documents to identify the source (ownership, origin, or application) of element types and attributes. This lets you have element types with the same name but different meanings in the same document, eg DocBook:table and TEI:table. The colon, which XML 1.0 already classes as a Name Start Character, serves as the separator between a namespace prefix and the local name. Despite that classification, Namespaces in XML allows a colon only once and only in mid-name, never at the start or the end, and the prefix xml: is reserved (it is permanently bound to the namespace http://www.w3.org/XML/1998/namespace).
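A minimal sketch of how a namespace-aware parser keeps two table element types apart, using Python's xml.etree.ElementTree. The namespace URIs are those published for DocBook 5 and TEI P5; the surrounding report and p elements are our own illustration. ElementTree expands each prefixed name to {namespace-URI}local-name, which shows that the prefix itself is only a local shorthand.

```python
import xml.etree.ElementTree as ET

DOCBOOK = "http://docbook.org/ns/docbook"
TEI = "http://www.tei-c.org/ns/1.0"

doc = ET.fromstring(f"""<report xmlns:db="{DOCBOOK}" xmlns:tei="{TEI}">
  <db:table/>
  <tei:table/>
  <p xml:lang="en">Two tables, two vocabularies.</p>
</report>""")

# Same local name, different namespaces: the element types stay distinct.
tags = [child.tag for child in doc]
print(tags)

# Any prefix mapped to the same URI finds the same element.
print(doc.find("x:table", {"x": TEI}) is not None)  # True

# The xml: prefix needs no declaration; it is bound to a fixed URI.
print(doc.find("p").attrib)

# A prefix that has not been declared is a namespace error.
try:
    ET.fromstring("<db:table/>")
except ET.ParseError as err:
    print("rejected:", err)
```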