Copyright © 2010 Silmaril Consultants
Rev: 2010-03-01T00:19:00+0000

C.24  How do I control formatting and appearance?

Use a CSS or XSLT stylesheet.

In HTML, default styling was built into the browsers because the tagset of HTML was predefined and hardwired into browsers. In XML, where you can define your own tagset, browsers cannot possibly be expected to guess or know in advance what names you are going to use and what they will mean, so you need a stylesheet if you want to display formatted text.

Browsers which read XML will accept and use a CSS stylesheet at a minimum, but you can also use the more powerful XSLT stylesheet language to transform your XML into HTML—which browsers, of course, already know how to display (and that HTML can still use a CSS stylesheet). This way you get all the document management benefits of using XML, but you don't have to worry about your readers needing XML smarts in their browsers.

Mike Brown writes:

XSLT is an XML document processing language that uses source code that happens to be written in XML. An XSLT document declares a set of rules for an XSLT processor to use when interpreting the contents of an XML document. These rules tell the XSLT processor how to generate a new XML-like data structure and how that data should be emitted—as an XML document, as an HTML document, as plain text, or perhaps in some other format.

This transformation can be done either inside the browser, or by the server before the file is sent. Transformation in the browser offloads the processing from the server, but may introduce browser dependencies, leading to some of your readers being excluded. Transformation in the server makes the process browser-independent, but places a heavier processing load on the server.
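As an illustration of the kind of rules described above, here is a minimal sketch of an XSLT stylesheet that turns a small XML document into HTML. The input vocabulary (a memo element containing a title and some para elements) is hypothetical and chosen only for the example; substitute the element names of your own document type.

```xml
<?xml version="1.0"?>
<!-- Assumed input: <memo><title>...</title><para>...</para>...</memo> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the processor to emit the result as HTML -->
  <xsl:output method="html"/>

  <!-- Rule for the root element: build the HTML skeleton -->
  <xsl:template match="/memo">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Rule: the memo title becomes a top-level heading -->
  <xsl:template match="memo/title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <!-- Rule: each para becomes an HTML paragraph -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

The same stylesheet should work unchanged whether it is applied in the browser or on the server, since the rules themselves do not depend on where the processor runs.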

As with any system where files can be viewed at random by arbitrary users, the author cannot know what resources (such as fonts) are on the user's system, so the same care is needed when specifying fonts as with HTML. To invoke a stylesheet from an XML file for standalone processing in the browser, include one of the stylesheet declarations:

 
<?xml-stylesheet href="foo.xsl" type="text/xsl"?> 
<?xml-stylesheet href="foo.css" type="text/css"?> 
	  

(substituting the URI of your stylesheet, of course). See http://www.w3.org/TR/xml-stylesheet/ for the full details. The Cascading Style Sheets (CSS) specification provides a simple syntax for assigning styles to elements, and has been implemented in most browsers.
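To show where the stylesheet declaration goes, here is a sketch of a complete document. The declaration belongs in the prolog: after the XML Declaration (if there is one) and before the start-tag of the root element. The memo, title, and para element names are hypothetical, and foo.css is a placeholder for your own stylesheet.

```xml
<?xml version="1.0"?>
<!-- The stylesheet declaration comes before the root element -->
<?xml-stylesheet href="foo.css" type="text/css"?>
<memo>
  <title>Stylesheet test</title>
  <para>This paragraph is styled by the rules in foo.css.</para>
</memo>
```

Because a browser knows nothing about these element names, every element is displayed inline unless the stylesheet says otherwise, so foo.css would need a rule such as para { display: block; margin-bottom: 1em } to make each paragraph start on a new line.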

Dave Pawson maintains a comprehensive XSL FAQ at http://www.dpawson.co.uk/xsl/FAQ, and his book XSL-FO: Making XML Look Good in Print (Pawson, 2002) [the Fox book] is available from O'Reilly. XSL uses XML syntax (an XSL stylesheet is just an XML file) and has widespread support from several major browser vendors (see the questions on browsers and other software). XSL comes in two flavours: XSLT (XSL Transformations), the transformation language described above, and XSL:FO (XSL Formatting Objects), a vocabulary for specifying page layout, typically for print or PDF output.

Currently only Microsoft Internet Explorer 5.5 and above, and Firefox 0.9.6 and above handle XSLT inside the browser (MSIE5.5 needs some post-installation surgery to remove the obsolete WD-xsl and replace it with the current XSL-Transform processor; MSIE6 and Firefox work as installed).

WYSIWYG for XSL

There have been attempts to produce pseudo-WYSIWYG editors for creating XSL[T] stylesheets, but they have mostly been restricted to simple mapping between input elements and output elements (eg a DocBook para to a HTML p). Anything beyond this seems likely to fail because of the infinite complexity of what people want to do with their information. If you have access to the ACM database, see the paper by Pietriga, Vion-Dury, and Quint on VXT, from the ACM DocEng'01 (Atlanta) Proceedings.

 

Generating HTML on the server

There is a growing use of server-side processors like Cocoon, AxKit, PropelX, and others, which let you create, store, and manage your information in XML but serve it auto-converted to HTML or some other format, thus allowing the output to be used by any browser. XSLT is also widely used to transform XML into non-SGML formats for input to other systems (for example to transform XML into LaTeX for typesetting).

 

Alternatives to XSL:FO

Instead of generating PDF via an FO processor, it is possible to use XSLT to transform XML to LaTeX for typesetting PDF (as is done for the print versions of this FAQ, from DocBook to LaTeX). This has the advantage of being able to make use of LaTeX's extensive library of prewritten formatting modules (‘packages’), which avoids much of the wheel-reinventing currently required with XSL:FO.
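A minimal sketch of this approach follows. It is not the stylesheet used for this FAQ: it assumes a hypothetical memo document (a title followed by para elements) rather than DocBook, and simply shows how the text output method lets XSLT write LaTeX source.

```xml
<?xml version="1.0"?>
<!-- Assumed input: <memo><title>...</title><para>...</para>...</memo> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text output: no XML declaration, no escaping of markup characters -->
  <xsl:output method="text"/>

  <!-- Discard whitespace-only text nodes in the input -->
  <xsl:strip-space elements="*"/>

  <!-- Root element: wrap everything in a LaTeX document -->
  <xsl:template match="/memo">
    <xsl:text>\documentclass{article}&#10;</xsl:text>
    <xsl:text>\begin{document}&#10;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>\end{document}&#10;</xsl:text>
  </xsl:template>

  <!-- The memo title becomes an unnumbered section heading -->
  <xsl:template match="memo/title">
    <xsl:text>\section*{</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>}&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- Each para becomes a paragraph, ended by a blank line -->
  <xsl:template match="para">
    <xsl:apply-templates/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

A production stylesheet would also have to escape the characters that are special to LaTeX (such as &, %, $, #, _, and the braces) wherever they occur in the text; this sketch omits that step.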

Alternatively, David Carlisle's xmltex reads XML directly, offering another practical if experimental solution to typesetting XML. One use of a TeX system that can typeset XML files is as a backend processor for XSL:FO, serialized as XML. Sebastian Rahtz's PassiveTeX uses xmltex to achieve this end.

The TeX FAQ is at http://www.tex.ac.uk/faq.

SGML systems used a similar stylesheet mechanism: some of the common ones were the FOSI (Formatted Output Specification Instance), which was standard in defence and industrial engineering applications, especially when using the Arbortext editor (Adept, now Epic); the DynaText/DynaWeb stylesheet used in SGML publishing to the web; and the Synex stylesheet used in browsers based on the Synex engine (eg Panorama, whose styling interface was partly adopted in XMetaL), the expertise of whose designers persists in the DocZilla browser.