C.24 How do I control formatting and appearance?
In HTML, default styling was built into the browsers because the tagset of HTML was predefined and hardwired into browsers. In XML, where you can define your own tagset, browsers cannot possibly be expected to guess or know in advance what names you are going to use and what they will mean, so you need a stylesheet if you want to display formatted text.
Browsers which read XML will accept and use a CSS stylesheet at a minimum, but you can also use the more powerful XSLT stylesheet language to transform your XML into HTML—which browsers, of course, already know how to display (and that HTML can still use a CSS stylesheet). This way you get all the document management benefits of using XML, but you don't have to worry about your readers needing XML smarts in their browsers.
XSLT is an XML document processing language that uses source code that happens to be written in XML. An XSLT document declares a set of rules for an XSLT processor to use when interpreting the contents of an XML document. These rules tell the XSLT processor how to generate a new XML-like data structure and how that data should be emitted—as an XML document, as an HTML document, as plain text, or perhaps in some other format.
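As a minimal sketch of what such a set of rules can look like, the stylesheet below turns a small document into HTML. The input vocabulary (an article element containing title and para elements) is invented for illustration and is not taken from any particular DTD or schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input vocabulary: <article> containing <title> and <para> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the processor to serialize the result tree as HTML -->
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Ignore whitespace-only text between the children of article -->
  <xsl:strip-space elements="article"/>

  <!-- Rule for the root element: build the HTML skeleton -->
  <xsl:template match="/article">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Rule: the article title becomes a top-level heading -->
  <xsl:template match="article/title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <!-- Rule: each para becomes an HTML paragraph -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Each xsl:template is one rule: the match attribute says which nodes it applies to, and the body says what to emit for them. Changing the method attribute of xsl:output to xml or text selects one of the other output forms mentioned above.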
This transformation can be done either inside the browser, or by the server before the file is sent. Transformation in the browser offloads the processing from the server, but may introduce browser dependencies, leading to some of your readers being excluded. Transformation in the server makes the process browser-independent, but places a heavier processing load on the server.
As with any system where files can be viewed at random by arbitrary users, the author cannot know what resources (such as fonts) are on the user's system, so fonts need the same care in stylesheets for XML as they do in HTML.
To invoke a stylesheet from an XML file for standalone processing in the browser, include one of the stylesheet declarations:
<?xml-stylesheet href="foo.xsl" type="text/xsl"?>
<?xml-stylesheet href="foo.css" type="text/css"?>
(substituting the URI of your stylesheet, of course). See http://www.w3.org/TR/xml-stylesheet/ for the full details. The Cascading Stylesheet Specification (CSS) provides a simple syntax for assigning styles to elements, and has been implemented in most browsers.
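As a rough illustration of where the declaration goes: it belongs in the prolog, after the XML declaration (if there is one) and before the root element. The element names (article, title, para) are hypothetical, and foo.xsl stands in for the URI of your own stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The stylesheet declaration sits in the prolog, before the root element -->
<?xml-stylesheet href="foo.xsl" type="text/xsl"?>
<!-- Hypothetical document; substitute your own vocabulary -->
<article>
  <title>Controlling appearance</title>
  <para>This text is displayed according to the rules in foo.xsl.</para>
</article>
```

To style the same document with CSS instead, change the declaration to refer to foo.css with type="text/css".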
Dave Pawson maintains a comprehensive XSL FAQ at http://www.dpawson.co.uk/xsl/, and his book XSL-FO: Making XML Look Good in Print (Pawson, 2002) [the Fox book] is available from O'Reilly.
XSL uses XML syntax (an XSL stylesheet is just an XML file) and has widespread support from several major browser vendors (see the questions on browsers and other software). XSL comes in two flavours:
XSL itself, which is a pure formatting language, outputting a Formatting Objects (FO) file, which needs a text formatter like FOP, XEP, or others to create printable (PDF) output (but see the tip ‘Alternatives to XSL:FO’ below). Currently I am not aware of any Web browsers which support direct XSL rendering to PDF;
XSLT (T for Transformation), which is a language to specify transformations of XML into HTML either inside the browser or at the server before transmission. It can also specify transformations from one vocabulary of XML to another, and from XML to plain text (which can be any format, including RTF and LaTeX).
Currently only Microsoft Internet Explorer 5.5 and above, and Firefox 0.9.6 and above handle XSLT inside the browser (MSIE5.5 needs some post-installation surgery to remove the obsolete WD-xsl and replace it with the current XSL-Transform processor; MSIE6 and Firefox work as installed).
There have been attempts to produce pseudo-WYSIWYG editors for creating XSL[T] stylesheets, but they have mostly been restricted to simple mapping between input elements and output elements (eg a DocBook para to an HTML p). Anything beyond this seems likely to fail because of the infinite complexity of what people want to do with their information. If you have access to the ACM database, see the paper by Pietriga, Vion-Dury, and Quint on VXT, from the ACM DocEng'01 (Atlanta) Proceedings.
There is a growing use of server-side processors like Cocoon, AxKit, PropelX, and others, which let you create, store, and manage your information in XML but serve it auto-converted to HTML or some other format, thus allowing the output to be used by any browser. XSLT is also widely used to transform XML into non-SGML formats for input to other systems (for example to transform XML into LaTeX for typesetting).
Instead of generating PDF via an FO processor, it is possible to use XSLT to transform XML to LaTeX for typesetting PDF (as is done for the print versions of this FAQ, from DocBook to LaTeX). This has the advantage of being able to make use of LaTeX's extensive library of prewritten formatting modules (‘packages’), which avoids much of the wheel-reinventing currently required with XSL:FO.
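The general shape of such a transformation can be sketched as follows. This is an illustration only, not the stylesheet used for this FAQ: the element names (article, title, para, emphasis) are a small hypothetical subset, and no attempt is made to escape characters that are special to LaTeX.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: a hypothetical subset of elements, not a full DocBook mapping -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text output: no markup is emitted, only the characters we write -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Ignore whitespace-only text between the children of article -->
  <xsl:strip-space elements="article"/>

  <!-- Root rule: wrap the content in a LaTeX document skeleton -->
  <xsl:template match="/article">
    <xsl:text>\documentclass{article}&#10;</xsl:text>
    <xsl:text>\begin{document}&#10;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>\end{document}&#10;</xsl:text>
  </xsl:template>

  <!-- The article title becomes an unnumbered section heading -->
  <xsl:template match="article/title">
    <xsl:text>\section*{</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>}&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- Paragraphs are separated by a blank line -->
  <xsl:template match="para">
    <xsl:apply-templates/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- Inline emphasis maps to \emph{...} -->
  <xsl:template match="emphasis">
    <xsl:text>\emph{</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>}</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

A real stylesheet would also need a rule for text nodes that escapes characters such as &, %, $, # and _, which LaTeX treats specially; that step is omitted here to keep the sketch short.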
Alternatively, David Carlisle's xmltex reads XML directly, offering another practical if experimental solution to typesetting XML. One use of a TeX system that can typeset XML files is as a backend processor for XSL:FO, serialized as XML. Sebastian Rahtz's PassiveTeX uses xmltex to achieve this end.
The TeX FAQ is at http://www.tex.ac.uk/faq.
SGML systems used a similar stylesheet mechanism: some of the common ones were the FOSI (Formatted Output Specification Instance), which was standard in defence and industrial engineering applications, especially when using the Arbortext editor (Adept, now Epic); the DynaText/DynaWeb stylesheet used in SGML publishing to the web; and the Synex stylesheet used in browsers based on the Synex engine (eg Panorama, whose styling interface was partly adopted in XMetaL), the expertise of whose designers persists in the DocZilla browser.