
The XML FAQ — Frequently-Asked Questions about the Extensible Markup Language

Section 4: Developers

Q 4.20: I've already got SGML DTDs: how do I convert them for use with XML?

Edit them by hand, or use software such as Near+Far Designer.

Many common or popular SGML DTDs have already been converted to XML format. For example, the TEI DTD (both Lite and full versions) and the DocBook DTD are available in both SGML and XML, in Schema and DTD formats.

Seán McGrath writes:

To convert SGML DTDs to XML:

  1. No equivalent of the SGML Declaration. So keywords, character set etc are essentially fixed;

  2. Tag minimisation is not allowed, so <!ELEMENT x - O (A,B)> becomes <!ELEMENT x (A,B)> and <!ELEMENT x - O EMPTY> becomes <!ELEMENT x EMPTY>;

  3. #PCDATA must only occur at the extreme left (ie first) in an OR model, eg <!ELEMENT x - - (A|B|#PCDATA|C)> (in SGML) becomes <!ELEMENT x (#PCDATA|A|B|C)*>, and <!ELEMENT x (A,#PCDATA)> is illegal;

  4. No CDATA, RCDATA elements [declared content];

  5. Some SGML attribute types are not allowed in XML eg NUTOKEN;

  6. Some SGML attribute defaults are not allowed in XML eg CONREF and CURRENT;

  7. Comments cannot appear inline within declarations, as in

    <!ELEMENT x - - (A,B) -- an SGML comment in a declaration -->
    		  
  8. A whole bunch of SGML optional features are not present in XML: all forms of tag minimisation (OMITTAG, DATATAG, SHORTREF, etc); Link Process Definitions; Multiple DTDs per document; and many more: see http://www.w3.org/TR/NOTE-sgml-xml-971215 for the list of bits of SGML that were removed for XML;

  9. And [nearly] last but not least, no CONCUR!

  10. There are some important differences between the internal and external subset portion of a DTD in XML: Marked Sections can only occur in the external subset; and Parameter Entities must be used to replace entire declarations in the internal subset portion of a DTD, eg the following is invalid XML:

     
    <!DOCTYPE x [ 
    <!ENTITY % modelx "(A|B)*"> 
    <!ELEMENT x %modelx;> 
    ]> 
    <x></x>
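
The restriction in item 10 is a well-formedness rule, so even a non-validating parser should reject the document above. The following is a minimal sketch using Python's standard xml.parsers.expat module. The names INVALID, VALID, declx, and parse_error are illustrative and do not come from the original text, and the VALID form is only one assumed way of rewriting the example so that the parameter entity replaces the entire declaration.

```python
import xml.parsers.expat

# The example from item 10: a parameter entity reference *inside* a declaration
# in the internal subset.
INVALID = """<!DOCTYPE x [
<!ENTITY % modelx "(A|B)*">
<!ELEMENT x %modelx;>
]>
<x></x>"""

# Assumed rewrite: the parameter entity now stands for the whole declaration.
VALID = """<!DOCTYPE x [
<!ENTITY % declx "<!ELEMENT x (A|B)*>">
%declx;
]>
<x></x>"""


def parse_error(document):
    """Return the parser's error message, or None if the document parses."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print("item 10 example:", parse_error(INVALID))
    print("rewritten form: ", parse_error(VALID))
```

Run as a script, this prints the parser's complaint for the first document and None for the second. The check covers well-formedness only; it does not validate the instance against the DTD.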
    		

For more information, see McGrath (1998).
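
Some of the rules above are mechanical enough to script. The following Python sketch is a rough illustration only, not a complete converter: it handles rules 2, 3 and 7 for a single element declaration with a flat (non-nested) content model. The function name convert_element_decl and the regular expressions are assumptions made for this example. Attribute declarations, nested model groups, declared content, and everything in the SGML Declaration would still need editing by hand or a proper tool.

```python
import re

# Rule 7: an inline SGML comment inside a declaration, written -- ... --
_INLINE_COMMENT = re.compile(r"--.*?--", re.DOTALL)
# Rule 2: the two tag-minimisation indicators ("-" or "O") after the element name
_MINIMISATION = re.compile(r"^(<!ELEMENT\s+\S+)\s+[-Oo]\s+[-Oo]\s+")
# Rule 3: a flat group containing #PCDATA, plus any occurrence indicator
_PCDATA_GROUP = re.compile(r"\(([^()]*#PCDATA[^()]*)\)[*+?]?")


def _fix_mixed_content(match):
    """Move #PCDATA to the front of an OR group and make the group repeatable."""
    inner = match.group(1)
    if "," in inner or "&" in inner:
        # For example (A,#PCDATA): there is no mechanical XML equivalent.
        raise ValueError("#PCDATA outside an OR group: " + match.group(0))
    names = [t.strip() for t in inner.split("|") if t.strip() != "#PCDATA"]
    if not names:
        return "(#PCDATA)"
    return "(#PCDATA|" + "|".join(names) + ")*"


def convert_element_decl(decl):
    """Apply rules 2, 3 and 7 to one SGML element declaration."""
    decl = decl.strip()
    if not decl.startswith("<!ELEMENT"):
        raise ValueError("not an element declaration: " + decl)
    decl = _INLINE_COMMENT.sub("", decl)                # rule 7
    decl = _MINIMISATION.sub(r"\1 ", decl)              # rule 2
    decl = _PCDATA_GROUP.sub(_fix_mixed_content, decl)  # rule 3
    return re.sub(r"\s+>$", ">", decl)                  # tidy space left before ">"


if __name__ == "__main__":
    print(convert_element_decl("<!ELEMENT x - O (A,B)>"))
    # <!ELEMENT x (A,B)>
    print(convert_element_decl("<!ELEMENT x - - (A|B|#PCDATA|C)>"))
    # <!ELEMENT x (#PCDATA|A|B|C)*>
```

Declarations the sketch cannot convert safely, such as <!ELEMENT x (A,#PCDATA)>, raise ValueError rather than being silently rewritten, since the content model has to be redesigned by a person.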