There is no single answer to this: a lot depends on what you are designing the document type for.
Traditional editorial practice for normal text documents is to put the real text (what would be printed) in character data content, and to keep the metadata (information about the text) in attributes, where it can more easily be isolated for analysis or for special treatment such as display in the margin or in a mouseover:
<l n="184" speaker="Portia">The quality of mercy is not strain'd,</l>
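Because the line number sits in an attribute, a program can pull it out separately from the printed text, for example to set it in a margin column. The sketch below uses Python's standard `xml.etree.ElementTree`; the `<speech>` wrapper, the second line, and the `margin_numbers` helper are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the printed text is element content,
# the line number (metadata) is held in the n attribute.
SAMPLE = """<speech>
<l n="184">The quality of mercy is not strain'd,</l>
<l n="185">It droppeth as the gentle rain from heaven</l>
</speech>"""

def margin_numbers(xml_text):
    """Return (line number, text) pairs, keeping metadata and text apart."""
    root = ET.fromstring(xml_text)
    return [(line.get("n"), line.text) for line in root.iter("l")]

for number, text in margin_numbers(SAMPLE):
    # Right-align the number in a narrow "margin", then print the text.
    print(f"{number:>5}  {text}")
```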
But from the systems point of view, there is nothing wrong with storing the data the other way round, especially where the volume of text data on each occasion is relatively small:
<line speaker="Portia" text="The quality of mercy is not strain'd,">184</line>
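Both arrangements carry the same information; they differ only in which parts are content and which are attributes. As a rough sketch (assuming a text-first variant with the speaker and line number as attributes, and two hypothetical reader functions), each form can be read into the same record:

```python
import xml.etree.ElementTree as ET

# Hypothetical text-first variant: text as content, metadata as attributes.
TEXT_FIRST = '<l n="184" speaker="Portia">The quality of mercy is not strain\'d,</l>'
# Data-first form from the example above: text in an attribute, number as content.
DATA_FIRST = '<line speaker="Portia" text="The quality of mercy is not strain\'d,">184</line>'

def read_text_first(xml_text):
    el = ET.fromstring(xml_text)
    return {"n": el.get("n"), "speaker": el.get("speaker"), "text": el.text}

def read_data_first(xml_text):
    el = ET.fromstring(xml_text)
    return {"n": el.text, "speaker": el.get("speaker"), "text": el.get("text")}

print(read_text_first(TEXT_FIRST) == read_data_first(DATA_FIRST))  # True
```

The choice is therefore about convenience for the processing you expect, not about what the markup can hold.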
A lot depends on what you want to do with the information and on which parts of it are most easily accessed with each approach. A rule of thumb for conventional text documents is that if all the markup were stripped away, the bare text should still be correct, readable, and usable, even if unformatted and inconvenient. For database output, however, or other machine-generated documents such as e-commerce transactions, human reading may not be meaningful, so it is perfectly possible to have documents where all the data is in attributes and the document contains no character data content at all. See http://xml.coverpages.org/elementsAndAttrs.html for more information.
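The "strip the markup" rule of thumb is easy to try mechanically: keep only the character data and discard every tag and attribute. In this sketch (the `strip_markup` helper and the `<order>` record are hypothetical), the text-first line survives stripping, the data-first line reduces to a bare number, and an all-attribute transaction record leaves nothing at all:

```python
import xml.etree.ElementTree as ET

def strip_markup(xml_text):
    """Concatenate all character data, discarding tags and attributes."""
    return "".join(ET.fromstring(xml_text).itertext())

TEXT_FIRST = '<l n="184" speaker="Portia">The quality of mercy is not strain\'d,</l>'
DATA_FIRST = '<line speaker="Portia" text="The quality of mercy is not strain\'d,">184</line>'
# Hypothetical machine-generated record with all its data in attributes.
ORDER = '<order id="A-1001" sku="X42" qty="3"/>'

print(strip_markup(TEXT_FIRST))   # The quality of mercy is not strain'd,
print(strip_markup(DATA_FIRST))   # 184
print(repr(strip_markup(ORDER)))  # ''
```

For a text document the first result is what you want; for the order record an empty result is harmless, because nobody was meant to read it as prose.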
Mike Kay writes:
From a user: ‘[…] do most of you out there use element-based or attribute-based xml? why?’
Beginners always ask this question. Those with a little experience express their opinions passionately. Experts tell you there is no right answer. (http://lists.xml.org/archives/xml-dev/200006/msg00293.html)