
The XML FAQ — Frequently-Asked Questions about the Extensible Markup Language

Section 4: Developers

Q 4.24: How do I flatten a complex DTD?

Try one of the tools listed here, or edit the DTD by hand.

DTDs can be constructed from multiple files; they can use parameter entities to expand (or not) sets of declarations as and when required; and they can use marked sections to include or ignore portions depending on circumstances. This is most common in very large and flexible DTDs such as the TEI, but it is also true of a great many standard industrial DTDs which are made from pluggable components.

Some software won’t accept a complex DTD construction, so the only solution is to ‘flatten’ the structure, expanding any entities as needed, including any external files required, and turning various portions on or off according to needs. The resulting monolithic DTD then lacks the flexibility of the componentised original, but it can be read by even relatively simple XML (or SGML) applications.
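To make the three operations concrete (expanding parameter entities, pulling in external files, and resolving INCLUDE/IGNORE sections), here is a minimal Python sketch. It is not how any of the tools below are implemented; the names `flatten`, `section_end` and the `load` callback are hypothetical, invented for this illustration. It assumes XML DTD syntax, handles only internal and SYSTEM parameter entities (not PUBLIC identifiers), and does not protect comments or attribute defaults that happen to contain a `%name;` sequence.

```python
import re

# One combined pattern; the leftmost match decides what is handled next.
TOKEN = re.compile(
    r'''<!ENTITY\s+%\s+(?P<name>[\w.:-]+)\s+
            (?:(?P<q>["'])(?P<value>.*?)(?P=q)              # internal entity
              |SYSTEM\s+(?P<sq>["'])(?P<sysid>.*?)(?P=sq))  # external entity
            \s*>
      | %(?P<ref>[\w.:-]+);                                 # parameter entity reference
      | <!\[\s*(?P<key>%[\w.:-]+;|INCLUDE|IGNORE)\s*\[      # conditional section start
    ''',
    re.S | re.X,
)
PE_REF = re.compile(r'%([\w.:-]+);')
SECTION_EDGE = re.compile(r'<!\[|\]\]>')


def section_end(text, start):
    """Index of the `]]>` closing the section whose body begins at `start`."""
    depth = 1
    for m in SECTION_EDGE.finditer(text, start):
        depth += 1 if m.group() == '<![' else -1
        if depth == 0:
            return m.start()
    raise ValueError('unterminated conditional section')


def flatten(dtd, load=None):
    """Return `dtd` with parameter entities and conditional sections resolved.

    `load` (hypothetical) maps a system identifier to the text of that file.
    An undeclared entity reference raises KeyError.
    """
    entities = {}
    pos = 0
    while True:
        m = TOKEN.search(dtd, pos)
        if m is None:
            return dtd
        if m.group('name'):
            name = m.group('name')
            if name not in entities:  # the first declaration of a name is binding
                if m.group('sysid') is not None:
                    if load is None:
                        raise ValueError('no loader for external entity ' + name)
                    entities[name] = load(m.group('sysid'))
                else:
                    # references inside a literal value are expanded at declaration
                    entities[name] = PE_REF.sub(
                        lambda r: entities[r.group(1)], m.group('value'))
            replacement, end = '', m.end()  # drop the declaration itself
        elif m.group('ref'):
            replacement, end = entities[m.group('ref')], m.end()
        else:
            key = m.group('key')
            if key.startswith('%'):  # keyword supplied through an entity
                key = entities[key[1:-1]].strip()
            close = section_end(dtd, m.end())
            replacement = dtd[m.end():close] if key == 'INCLUDE' else ''
            end = close + 3
        # Splice in the replacement and rescan it: it may hold more markup.
        dtd = dtd[:m.start()] + replacement + dtd[end:]
        pos = m.start()


SAMPLE = '''<!ENTITY % draft "IGNORE">
<!ENTITY % inline "em | code">
<!ELEMENT para (#PCDATA | %inline;)*>
<![%draft;[
<!ELEMENT note (#PCDATA)>
]]>
<!ELEMENT em (#PCDATA)>'''

print(flatten(SAMPLE))
```

Run on the sample, this should leave only the `para` and `em` declarations, with `%inline;` expanded in the content model and the `note` declaration dropped because `%draft;` is set to IGNORE. For a real DTD, one of the tools below is a better choice: they deal with catalogs, public identifiers, and the many corner cases this sketch ignores.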

dtdflatten

This is a Java command-line tool at https://github.com/ncbi/DtdAnalyzer/blob/master/src/gov/ncbi/pmc/dtdanalyzer/DtdFlatten.java which is part of the DtdAnalyzer suite at https://github.com/ncbi/DtdAnalyzer.

Another version, with more recent commits, is at https://github.com/Klortho/DtdAnalyzer (thank you to Liam Quin).

Carthage

The Carthage software is at https://github.com/TEIC/Carthage. It consists of a yacc/lex-based parser for SGML DTDs which can delete references to undeclared elements. It can also do a few other things, depending on the run-time flags you give it, basically keeping or dropping certain classes of component.

Carthage was a product of the TEI Consortium, originally written by Michael Sperberg-McQueen but now unmaintained. The name appears to derive from a pun on Cato’s exhortation in ancient times that ‘Carthage must be destroyed’ (thank you to Syd Bauman).

Carthage is unsupported software; it may be used freely without further permission or royalty, but users who improve it or fix errors are requested to notify the author so he can also fix them.

NormDTD

This was a public domain DOS program written by Richard Light to handle those occasions where an SGML system cannot accept the complexity of large DTDs with deeply nested marked sections and parameter entity references to external files.

It flattens the DTD to a single file, duplicating where necessary all the references that were previously handled by parameter entities. The element content models in this normalized DTD will not contain any references to elements that are not declared, and so it can be used by highly-strung packages that refuse to process such applications (the TEI in particular) for this reason.
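To see whether an already-flattened DTD still has this problem, a quick check along the following lines may help. This is only an illustrative Python sketch, not NormDTD's or Carthage's method: the function name `undeclared_references` is made up for the example, and it assumes XML-style element declarations (no SGML tag-minimisation flags, inclusions, or exclusions) with all parameter entities already expanded.

```python
import re

# <!ELEMENT name content-model> ; the content model cannot contain '>' in XML
ELEMENT_DECL = re.compile(r'<!ELEMENT\s+([\w.:-]+)\s+(.*?)>', re.S)
# Names inside a content model, keeping a leading '#' so #PCDATA is recognisable
NAME = re.compile(r'#?[\w.:-]+')


def undeclared_references(dtd):
    """Map each referenced-but-undeclared element to the elements that use it."""
    declared = set()
    referenced = {}
    for name, model in ELEMENT_DECL.findall(dtd):
        declared.add(name)
        if model.strip() in ('EMPTY', 'ANY'):
            continue  # keywords, not child element names
        for child in NAME.findall(model):
            if child != '#PCDATA':
                referenced.setdefault(child, set()).add(name)
    return {child: sorted(parents)
            for child, parents in referenced.items()
            if child not in declared}


SAMPLE = '''<!ELEMENT doc (title, para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA | em | xref)*>
<!ELEMENT em (#PCDATA)>'''

print(undeclared_references(SAMPLE))
```

In the sample, `xref` is used in the content model of `para` but never declared, so it is the only name reported. Actually removing such references from the content models, as the tools here do, is harder, because the surrounding connectors and groups have to be repaired as well.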

The editor retains a copy of the DOS binary which was on the CD accompanying Understanding SGML and XML Tools (Flynn, 1998).

spam

(Nothing whatever to do with the more recent use of the word to describe unwanted advertising.)

spam (SP Add Markup) is a command-line stream editor for DTDs by James Clark. There are options to obey IGNORE and INCLUDE Marked Sections, to output the Prolog (SGML Declaration and DTD), and to expand any entity references between declarations.

spam is included in the SP package. The ospam binary is still shipped in the (now) OpenSP package, part of the OpenJade distribution from https://openjade.sourceforge.net.

Near&Far

Famous for its world-beating graphical interface to DTD design, N&F can flatten the complexity when importing a DTD, but cannot retain that structure on export. DOS only, and no longer available, although it is still in use in some places. For more details, see the topic ‘Near&Far (MicroStar)’ in question E.4 on ‘Lost XML software’.

dtd2xml

Andrew Sales added that the dtd2xml utility can be used in conjunction with XSLT (provided) to write code to flatten an XML instance conforming to that DTD.

This question was triggered by a request on the XSL-List.