Copyright © 2010 Silmaril Consultants
Rev: 2010-03-01T00:19:00+0000

Non-proprietary multi-purpose flexible markup

D.12  I have to do an overview of XML for my manager/client/investor/advisor. What should I mention?

Tad McClellan writes:

  • XML is not a markup language. XML is a ‘metalanguage’, that is, it's a language that lets you define your own markup languages (see definition).

  • XML is a markup language [two (seemingly) contradictory statements one after another is an attention-getting device that I'm fond of], not a programming language. XML is data: it does not ‘do’ anything, it has things done to it.

  • XML is non-proprietary: your data cannot be held hostage by someone else.

  • XML allows multi-purposing of your data.

  • Well-designed XML applications most often separate ‘content’ from ‘presentation’. You should describe what something is rather than what it looks like (the exception being data content which never gets presented to humans).

Saying ‘the data is in XML’ is a relatively useless statement, similar to saying ‘the book is in a natural language’. To be useful, the former needs to specify ‘we have used XML to define our own markup language’ (and say what it is), similar to specifying ‘the book is in French’.

A classic example of multipurposing and separation that I often use is a pharmaceutical company. They have a large base of data on a particular drug that they need to publish as:

Without separation of content and presentation, they need to maintain essentially identical information in 20 places. If they miss a place, people die, lawyers get rich, and the drug company gets poor. With XML (or SGML), they maintain one set of carefully validated information, and write 20 programs to extract and format it for each application. The same 20 programs can now be applied to all the hundreds of drugs that they sell.
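The ‘one source, many programs’ idea can be sketched in a few lines of Python. This is only an illustration: the <drug> record, its element names, the drug name and the two formatting functions are all invented here, and a real system would validate the source against its own document type first.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the element names and values are invented for illustration.
SOURCE = """
<drug>
  <name>Examplamine</name>
  <dose unit="mg">50</dose>
  <warning>Do not take with alcohol.</warning>
</drug>
"""

def to_label(drug):
    """Format the record as a short package label."""
    dose = drug.find("dose")
    return f"{drug.findtext('name')} {dose.text} {dose.get('unit')}"

def to_warning_notice(drug):
    """Format the same record as a warning notice."""
    return f"WARNING ({drug.findtext('name')}): {drug.findtext('warning')}"

# One maintained source, two independent presentations of it.
record = ET.fromstring(SOURCE.strip())
print(to_label(record))
print(to_warning_notice(record))
```

Each output format is a separate small function reading the same record, so correcting the warning text in the source corrects every output at once, and the same functions work unchanged on any other record with the same structure.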

In the Web development area, the biggest thing that XML offers is fixing what is wrong with HTML:

If you let broken HTML work (be presented), then there is no motivation to fix it. Web pages are therefore tag soup that is useless for further processing. XML specifies that processing must not continue if the XML is not well-formed, so you keep working at it until it complies. This is more work up front, but the result is not a dead-end.
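A minimal sketch of this behaviour, using the XML parser in Python's standard library (the helper name is_well_formed and the two sample strings are made up for the example): the parser raises an error on broken markup instead of guessing what was meant.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False if it stops with an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<p>Some <b>bold</b> text</p>"
soup = "<p>Some <b>bold text</p>"  # the <b> element is never closed

print(is_well_formed(good))
print(is_well_formed(soup))
```

A browser would typically display both strings; the XML parser accepts only the first, which is what forces the error to be fixed at the source.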

If you want to mark up the names of things (people, places, companies, etc.) in HTML, you don't have many choices that allow you to distinguish among them. XML allows you to name things as what they are:

 
<person>Charles Goldfarb</person> worked at <company>IBM</company>
		

gives you a flexibility that you don't have with HTML:

 
<B>Charles Goldfarb</B> worked at <B>IBM</B> 

With XML you don't have to shoe-horn your data into markup that restricts your options.
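To see what that flexibility buys you, here is a small Python sketch that pulls people and companies out of the XML version separately. The <sentence> wrapper is an assumption added only because a parser needs a single root element; it is not part of the example above.

```python
import xml.etree.ElementTree as ET

# The <sentence> root element is an assumed wrapper; the content is the example above.
fragment = (
    "<sentence><person>Charles Goldfarb</person> worked at "
    "<company>IBM</company></sentence>"
)
root = ET.fromstring(fragment)

# Because the markup says what each name is, the two kinds can be selected apart.
people = [e.text for e in root.iter("person")]
companies = [e.text for e in root.iter("company")]

print(people)
print(companies)
```

With the <B> version, a program can only find ‘things in bold’; nothing in the markup says which is the person and which is the company.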