The XML FAQ: I have to do an overview of XML for my manager/client/investor/advisor. What should I mention?

The XML FAQ — Frequently-Asked Questions about the Extensible Markup Language

Section 4: Developers

Q 4.18: I have to do an overview of XML for my manager/client/investor/advisor. What should I mention?

Non-proprietary multi-purpose flexible markup

Tad McClellan writes:

XML is not a markup language. XML is a ‘metalanguage’, that is, it's a language that lets you define your own markup languages (see definition).
XML is a markup language [two (seemingly) contradictory statements one after another is an attention-getting device that I'm fond of], not a programming language. XML is data: it does not ‘do’ anything, it has things done to it.
XML is non-proprietary: your data cannot be held hostage by someone else.
XML allows multi-purposing of your data.
Well-designed XML applications most often separate ‘content’ from ‘presentation’. You should describe what something is rather than what something looks like (the exception being numerical or categorical data content which never gets presented to humans).

Saying ‘the data is in XML’ is a relatively useless statement, similar to saying ‘the book is in a natural language’. To be useful, the former needs to specify ‘we have used XML to define our own markup language’ (and say what it is), similar to specifying ‘the book is in French’.

A classic example of multi-purposing and separation that I often use is a pharmaceutical company. It has a large base of data on a particular drug that it needs to publish as:

reports to the FDA;
drug information for publishers of drug directories/catalogs;
‘prescribe me!’ brochures to send to doctors;
little pieces of paper to tuck into the boxes;
labels on the bottles;
two pages of fine print to follow their ad in Reader's Digest;
instructions to the patient that the local pharmacist prints out;
etc.

Without separation of content and presentation, they need to maintain essentially identical information in 20 places. If they miss a place, people die, lawyers get rich, and the drug company gets poor. With XML (or SGML), they maintain one set of carefully validated information, and write 20 programs [or one program with 20 outputs (Ed)] to extract and format it for each application. The same 20 programs can now be applied to all the hundreds of drugs that they sell.
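As a rough illustration of the single-source idea, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names (drug, name, dose, warning), the drug name, and the two output functions are all invented for this example; a real pharmaceutical vocabulary would be far richer and would be validated against a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-source record; element names and content are
# invented for illustration only.
DRUG_XML = """
<drug>
  <name>Examplitol</name>
  <dose unit="mg">50</dose>
  <warning>Do not take with alcohol.</warning>
</drug>
""".strip()


def bottle_label(drug):
    """Terse one-line rendering, as might go on a bottle label."""
    dose = drug.find("dose")
    return f"{drug.findtext('name').upper()} {dose.text} {dose.get('unit')}"


def patient_leaflet(drug):
    """Longer plain-text rendering, as might be printed for the patient."""
    dose = drug.find("dose")
    return (
        f"Your medicine: {drug.findtext('name')}\n"
        f"Dose: {dose.text} {dose.get('unit')}\n"
        f"Warning: {drug.findtext('warning')}"
    )


# One parsed source, two presentations.
drug = ET.fromstring(DRUG_XML)
print(bottle_label(drug))
print(patient_leaflet(drug))
```

Both outputs are derived from the same record, so a correction to the warning text is made once and appears everywhere. The same two functions would work unchanged on any other drug marked up the same way.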

In the Web development area, the biggest thing that XML offers is fixing what is wrong with HTML:

browsers allow non-compliant HTML to be presented;
HTML is restricted to a single set of markup (‘tagset’).

If you let broken HTML work (be presented), then there is no motivation to fix it. Web pages are therefore often tag soup that is useless for further processing. XML specifies that a processor must stop normal processing when it meets XML that is non-compliant (not well-formed), so you keep working at it until it complies. This is more work up front, but the result is not a dead end.
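The strictness described above can be seen with any conforming XML parser. The following is a small sketch, assuming Python's standard xml.etree.ElementTree; the helper name is_well_formed and the two sample strings are made up for the demonstration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it refuses it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser stops at the first well-formedness error
        # instead of guessing what the author meant.
        return False


good = "<p>Some <b>bold</b> text</p>"
bad = "<p>Some <b>bold</p></b>"  # overlapping tags: typical tag soup

print(is_well_formed(good))
print(is_well_formed(bad))
```

A browser would probably display the second string without complaint; an XML parser rejects it, which is what forces the markup to be fixed at the source.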

If you wanted to mark up the names of things (people, places, companies, etc.) in HTML, you would not have many choices that allow you to distinguish among them. XML allows you to name things as what they are:

<person>Charles Goldfarb</person> worked at <company>IBM</company>

gives you a flexibility that you don't have with HTML:

<B>Charles Goldfarb</B> worked at <B>IBM</B>

With XML you don't have to shoe-horn your data into markup that restricts your options.
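To make the difference concrete, here is a short Python sketch (standard library only) that processes both fragments. The wrapping s element is an assumption added so that each fragment has a single root element, which XML requires; it is not part of the original examples.

```python
import xml.etree.ElementTree as ET

# The <s> wrapper is hypothetical: it only supplies the single root
# element that a well-formed XML document needs.
NAMED = "<s><person>Charles Goldfarb</person> worked at <company>IBM</company></s>"
BOLD = "<s><B>Charles Goldfarb</B> worked at <B>IBM</B></s>"

named = ET.fromstring(NAMED)
people = [e.text for e in named.iter("person")]
companies = [e.text for e in named.iter("company")]

bold = ET.fromstring(BOLD)
bold_things = [e.text for e in bold.iter("B")]

print(people)       # only the person
print(companies)    # only the company
print(bold_things)  # person and company, indistinguishable
```

With the descriptive markup a program can ask for people or for companies separately. With the presentational markup all it can learn is that two pieces of text are bold.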

Liam Quin writes:

Make sure your criteria are right
XML lives in the boundary between human knowledge and computer-accessible information, so to give computer behaviour priority over human understanding is often a mistake.