The XML FAQ — Frequently-Asked Questions about the Extensible Markup Language

Section 2: Existing users

Q 2.5: How do I control the formatting of XML?

Use CSS or an XSLT2 stylesheet.

In HTML, default styling was built into the browsers because the tagset of HTML was predefined and hardwired into browsers. This is still true for XHTML and HTML5 to some extent. In other XML, where you can define your own tagset, browsers cannot possibly be expected to guess or know in advance what names you are going to use and what they will mean, so you need a stylesheet if you want to display formatted text.

Browsers which read XML will accept and use a CSS stylesheet at a minimum, but you can also use the more powerful XSLT stylesheet language to transform your XML into HTML — which browsers, of course, already know how to display (and that HTML can still use a CSS stylesheet). This way you get all the document management benefits of using XML, but you don't have to worry about your readers needing XML smarts in their browsers.

This transformation is usually done by the document owner, on their server, so you just get the HTML anyway, possibly unaware that it was XML originally. But it is also possible to use the (rather limited) built-in XSLT 1.0 transformer in some browsers, and server operators can now also use Saxon CE, a downloadable XSLT2 processor that runs inside the browser.

Mike Brown writes:

XSLT is an XML document processing language that uses source code that happens to be written in XML. An XSLT document declares a set of rules for an XSLT processor to use when interpreting the contents of an XML document. These rules tell the XSLT processor how to generate a new XML-like data structure and how that data should be emitted — as an XML document, as an HTML document, as plain text, or perhaps in some other format.

This transformation can be done either inside the browser, or by the server before the file is sent. Transformation in the browser offloads the processing from the server, but may introduce browser dependencies, leading to some of your readers being excluded. Transformation in the server makes the process browser-independent, but places a heavier processing load on the server.
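As a rough illustration of the kind of rules described above, here is a minimal XSLT 1.0 stylesheet that turns a small document into HTML. The element names memo, title, and para are hypothetical, invented for this sketch; substitute the names from your own tagset.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emit the result as HTML rather than XML -->
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Rule for the (hypothetical) root element: build the HTML shell -->
  <xsl:template match="/memo">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <!-- Hand the child elements to whichever rules match them -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Rule: a (hypothetical) title becomes a top-level HTML heading -->
  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <!-- Rule: a (hypothetical) para becomes an HTML paragraph -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Because it uses only XSLT 1.0 features, the same stylesheet should work whether the transformation is run on the server or by a browser's built-in processor.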

As with any system where files can be viewed at random by arbitrary users, the author cannot know what resources (such as fonts) are on the user's system, so fonts need the same care as they do in HTML.

To invoke a stylesheet from an XML file for standalone processing in the browser, include one of the stylesheet declarations:

 
    <?xml-stylesheet href="foo.xsl" type="text/xsl"?>
    <?xml-stylesheet href="foo.css" type="text/css"?>

(substituting the URI of your stylesheet, of course). See http://www.w3.org/TR/xml-stylesheet/ for the full details. Cascading Style Sheets (CSS) provides a simple syntax for assigning styles to elements, and has been implemented in most browsers.
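As a minimal sketch of where the declaration goes: it belongs in the prolog, after the XML Declaration (if there is one) and before the root element. The file name memo.xsl and the element names memo, title, and para are made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="memo.xsl" type="text/xsl"?>
<!-- memo, title, and para are hypothetical element names -->
<memo>
  <title>Stylesheet test</title>
  <para>This text is displayed according to the rules in memo.xsl.</para>
</memo>
```

To use a CSS stylesheet instead, the declaration would point at a .css file and give type="text/css".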

Dave Pawson maintains a comprehensive XSL FAQ at http://www.dpawson.co.uk/xsl/, and his book (Pawson, 2002) [the Fox book] is available from O'Reilly. XSL uses XML syntax (an XSL stylesheet is just an XML file) and has widespread support from several major browser vendors (see the questions on browsers and other software). XSL comes in two flavours:

  • XSL itself, which is a pure formatting language, outputting a Formatting Objects (FO) file, which needs a text formatter like FOP, XEP, or others to create printable (PDF) output (but see Alternatives to XSL:FO). Currently I am not aware of any Web browsers which support direct XSL rendering to PDF;

  • XSLT (T for Transformation), which is a language to specify transformations of XML into HTML either inside the browser or at the server before transmission. It can also specify transformations from one vocabulary of XML to another, and from XML to plaintext (which can be any format, including RTF and LaTeX).
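In practice the two flavours are often used together: an XSLT stylesheet generates the FO file, which a formatter then turns into PDF. The following is only a sketch of that arrangement, assuming a hypothetical document with memo, title, and para elements; the page size, margins, and font sizes are arbitrary choices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Discard whitespace-only text between the children of memo,
       so that no stray text lands directly inside fo:flow -->
  <xsl:strip-space elements="memo"/>

  <!-- Root rule: define one A4 page layout and a single page sequence -->
  <xsl:template match="/memo">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
                               page-width="210mm" page-height="297mm"
                               margin-top="25mm" margin-bottom="25mm"
                               margin-left="25mm" margin-right="25mm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- A (hypothetical) title becomes a bold display block -->
  <xsl:template match="title">
    <fo:block font-size="18pt" font-weight="bold" space-after="12pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- A (hypothetical) para becomes an ordinary text block -->
  <xsl:template match="para">
    <fo:block space-after="6pt"><xsl:apply-templates/></fo:block>
  </xsl:template>

</xsl:stylesheet>
```

The output of this transformation is the FO file; it still has to be passed to a formatter such as FOP or XEP to produce the PDF.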

All current versions of Microsoft Internet Explorer, Firefox, Chrome, Mozilla, Safari, and Opera handle XSLT 1.0 inside the browser. Beware obsolete browsers like MSIE5.5, which need some post-installation surgery to remove the long-obsolete WD-xsl and replace it with the current XSL-Transform processor.

WYSIWYG for XSL

There have been attempts to produce pseudo-WYSIWYG editors for creating XSL[T] stylesheets, but they have mostly been restricted to simple mapping between input elements and output elements (eg a DocBook para to an HTML p). Anything beyond this seems likely to fail because of the infinite complexity of what people want to do with their information. If you have access to the ACM database, see the paper by Pietriga, Vion-Dury, and Quint on VXT, from the ACM DocEng'01 (Atlanta) Proceedings.

Generating HTML on the server

There is a growing use of server-side processors like Cocoon and others, which let you create, store, and manage your information in XML but serve it auto-converted to HTML or some other format, thus allowing the output to be used by any browser. XSLT is also widely used to transform XML into non-SGML formats for input to other systems (for example to transform XML into LaTeX for typesetting).

Alternatives to XSL:FO

Instead of generating PDF via an FO processor, it is possible to use XSLT2 to transform XML to LaTeX for typesetting PDF (as is done for the print versions of this FAQ, from DocBook to LaTeX). This has the advantage of being able to make use of LaTeX's extensive library of prewritten formatting modules (‘packages’), which avoids much of the wheel-reinventing currently required with XSL:FO.
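A minimal sketch of such a conversion is shown here. It is not the stylesheet used for this FAQ: the memo, title, and para element names are hypothetical, and it assumes an XSLT2 processor such as Saxon. It also makes no attempt to escape characters that are special to LaTeX (such as &amp;, %, $, #, and _), which a real stylesheet would need to handle.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output: no XML declaration, no markup escaping -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Ignore whitespace-only text between the children of memo -->
  <xsl:strip-space elements="memo"/>

  <!-- Root rule: wrap the content in a LaTeX document skeleton -->
  <xsl:template match="/memo">
    <xsl:text>\documentclass{article}&#10;</xsl:text>
    <xsl:text>\begin{document}&#10;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>\end{document}&#10;</xsl:text>
  </xsl:template>

  <!-- A (hypothetical) title becomes an unnumbered section heading -->
  <xsl:template match="title">
    <xsl:text>\section*{</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>}&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- A (hypothetical) para becomes a paragraph ended by a blank line -->
  <xsl:template match="para">
    <xsl:apply-templates/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The resulting file can then be typeset with any LaTeX engine, and the document class or packages changed to suit the output required.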

Alternatively, David Carlisle's xmltex reads XML directly, offering another practical if experimental solution to typesetting XML. One use of a TeX system that can typeset XML files is as a backend processor for XSL:FO, serialised as XML. Sebastian Rahtz's PassiveTeX uses xmltex to achieve this end.

The TeX FAQ is at http://www.tex.ac.uk/faq. Silmaril maintains the online version of Peter Flynn's book on LaTeX, Formatting Information, which has some examples of XSLT2 conversion (Flynn, 2014).

SGML systems used a similar stylesheet mechanism: some of the common ones were the FOSI (Formatted Output Specification Instance), which was standard in defence and industrial engineering applications, especially when using the Arbortext editor (Adept, then Epic, probably something else next week); the DynaText/DynaWeb stylesheet used in SGML publishing to the web; and the Synex stylesheet used in browsers based on the Synex engine (eg Panorama, whose styling interface was partly adopted in XMetaL), the expertise of whose designers persists in the DocZilla browser.