
The XML FAQ — Frequently-Asked Questions about the Extensible Markup Language

Section 3: Authors

Q 3.6: How do I convert XML to other file formats?

Write a conversion in a language that understands XML

While it is possible to write conversion routines by inventing your own XML parser, it is not recommended except as an exercise for students of computing science. All major languages have XML libraries that do all the heavy lifting of parsing (and validating, if needed).

You do need to know what's in the XML document before you start: there is no magic wand that will automatically deduce what things mean and where they are located in the file. If you have been handed some XML files out of the blue, you will need to go and find the creator or some documentation about them. The first 2–3 lines of the file may hold a clue as to what type of XML they are. You will almost certainly need a copy of the DTD or Schema against which the files were created.
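As a rough illustration of looking for that clue, the following Python sketch reports the Document Type Declaration (if there is one) and the root element of a document, using the expat parser from the standard library. The function name describe_xml and the sample document are made up for this example; a real file may carry no DOCTYPE at all, in which case the root element name and any namespace declarations on it are the only hints.

```python
import xml.parsers.expat


def describe_xml(data: bytes) -> dict:
    """Collect the DOCTYPE details (if present) and the root element."""
    info = {"doctype": None, "public_id": None, "system_id": None,
            "root": None, "root_attributes": {}}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once if the document has a <!DOCTYPE ...> declaration.
        info["doctype"] = name
        info["system_id"] = system_id
        info["public_id"] = public_id

    def on_start(name, attrs):
        # Only the first start-tag (the root element) is recorded.
        if info["root"] is None:
            info["root"] = name
            info["root_attributes"] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(data, True)  # True marks this as the final chunk
    return info


# Hypothetical sample document, used only for illustration.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<article lang="en"><title>Example</title></article>"""

if __name__ == "__main__":
    for key, value in describe_xml(sample).items():
        print(f"{key}: {value}")
```

This only identifies the document type; it does not fetch the DTD or validate anything, and it still cannot tell you what the elements mean.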

The options for programming are:

  • Use a language designed for the task. XSLT2 has all the facilities for handling XML built in from the start, and standalone processors are available for all platforms. Many XML editors have an XSLT processor (XSLT2, hopefully) built in, so they offer an integrated development environment for editing and conversion. XSLT2 conversion can also run inside server packages like Apache Cocoon.

  • Use an XML processing or pipelining package. These are (usually) commercial products which provide extensive document management, document database, and document conversion and editing functions, often as part of a much larger enterprise information solution, using XSLT2 or their own in-house systems. Two popular ones are MarkLogic and OmniMark.

  • For data, use a conversion system that does not require writing code. Flexter is an example of one with a graphical interface for mapping source elements (XML) to target fields (several formats). While this approach is not appropriate for ‘document’ XML (books, articles, etc.), it provides a useful method for tabular ‘data’-type XML of arbitrary complexity.

  • Use a conventional compilable language. Java or C (or one of its many ++/♯ variants) would be common; Pascal, FORTRAN, or COBOL are rare these days, but XML libraries do exist for them. BASIC, anyone?

  • Use a scripting language. Perl, Python, Tcl, VBScript, or even PowerShell are all popular, and XML libraries exist for them; the Python ones have an excellent reputation.

  • Combine XML utilities with standard shell command utilities. An early example is an XML-to-CSV routine which uses onsgmls to expose the ESIS (Element Structure Information Set) and awk to reformat it. Similar processes can be developed using the LTXML2 toolkit.

  • There are downloadable (sometimes free) programs claiming to be ‘easy’ XML converters. The editor would like to hear recommendations or warnings ☺.
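To give a flavour of the scripting-language option, here is a minimal Python sketch of an XML-to-CSV conversion using only the standard library (xml.etree.ElementTree and csv). The element names (catalog, book, title, author, price), the sample data, and the function name xml_to_csv are all invented for this example; in practice you would take the record and field names from the DTD, Schema, or documentation for your own files.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text, record_tag, fields):
    """Write one CSV row per <record_tag> element, one column per child in fields."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)  # header row
    for record in root.iter(record_tag):
        # A missing child element becomes an empty cell.
        writer.writerow(
            [record.findtext(name, default="").strip() for name in fields]
        )
    return out.getvalue()


# Hypothetical sample data, used only for illustration.
SAMPLE = """<catalog>
  <book id="b1"><title>XML in Practice</title><author>A. Writer</author><price>29.50</price></book>
  <book id="b2"><title>Markup, Briefly</title><author>B. Editor</author></book>
</catalog>"""

if __name__ == "__main__":
    print(xml_to_csv(SAMPLE, "book", ["title", "author", "price"]), end="")
    # title,author,price
    # XML in Practice,A. Writer,29.50
    # "Markup, Briefly",B. Editor,
```

This sketch suits flat, tabular ‘data’-type XML. It silently ignores attributes (the id values here) and any nested structure, so it is a starting point to adapt, not a general converter.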

The process of converting XML to other formats is sometimes referred to as ‘down-converting’, as it may involve the unavoidable loss of information (usually metadata) when the target format simply doesn't have a way to represent it.
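A small Python sketch shows the kind of loss involved when the target is plain text. The element and attribute names (emph, role, xref, target) are hypothetical, chosen only to make the point.

```python
import xml.etree.ElementTree as ET


def to_plain_text(xml_text):
    """Flatten an XML fragment to its character data only."""
    root = ET.fromstring(xml_text)
    # itertext() yields the text content in document order;
    # element names and attributes are discarded.
    return "".join(root.itertext())


# Hypothetical fragment, used only for illustration.
SOURCE = '<p>See <emph role="warning">caution</emph> in <xref target="s2">section 2</xref>.</p>'

if __name__ == "__main__":
    print(to_plain_text(SOURCE))
    # See caution in section 2.
```

The words survive, but the fact that ‘caution’ was a warning, and that ‘section 2’ was a link to a particular target, cannot be recovered from the output.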