22. What are the special characters in XML?
For normal text (not markup), there are no special characters except < and &: just make sure your XML Declaration refers to the correct encoding scheme for the language and/or writing system you want to use, and that your computer correctly stores the file using that encoding scheme. See the question on non-Latin characters for a longer explanation.
Apart from the invisible ASCII control characters (the ones you can't type), all other characters are just normal text. Currency signs (€, £, $, ƒ, and others), all the punctuation (except < and &), and all other letters, signs, and symbols in any language or writing system are just text.
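As a small illustration of the point about < and &, the sketch below uses `escape` from Python's standard `xml.sax.saxutils` module to prepare ordinary text for use as character data, then parses it back. The sample string and variable names are invented for the example. This function also escapes >, which is harmless but not strictly required in normal text.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical sample text: only & and < need attention,
# the currency signs are just ordinary characters.
raw = "Fish & chips cost < £5 or €6"

# escape() replaces & with &amp; and < with &lt; (and > with &gt;).
escaped = escape(raw)

# Wrap the escaped text in an element and parse it back again.
doc = "<p>" + escaped + "</p>"
parsed = ET.fromstring(doc).text
```

After a round trip through the parser, `parsed` holds the same characters as `raw`, currency signs included.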
If your keyboard will not allow you to type the characters you want, or if you want to use characters outside the limits of the encoding scheme you have chosen, you can use a symbolic notation called ‘entity referencing’. Entity references can either be numeric, using the decimal or hexadecimal Unicode code point for the character (eg if your keyboard has no Euro symbol (€) you can type &#8364; or &#x20AC;); or they can be character entity references, using an established set of names which you can declare in your DTD (eg <!ENTITY euro "&#8364;">) which then lets you use the name &euro; in your document. If you are using a Schema, you must use the numeric form for all except the five below because Schemas have no way to make character entity declarations.
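The three ways of writing the Euro sign can be compared with a short sketch using Python's standard `xml.etree.ElementTree`. The element name `price` and the variable names are assumptions made for the illustration. The named form relies on an entity declared in an internal DTD subset, which the underlying parser is expected to expand.

```python
import xml.etree.ElementTree as ET

# Numeric references: decimal 8364 and hexadecimal 20AC are the same code point.
decimal_ref = ET.fromstring("<price>&#8364;10</price>").text
hex_ref = ET.fromstring("<price>&#x20AC;10</price>").text

# Named reference: the name "euro" must be declared before it can be used.
with_dtd = """<!DOCTYPE price [
  <!ENTITY euro "&#8364;">
]>
<price>&euro;10</price>"""
named_ref = ET.fromstring(with_dtd).text
```

All three variables should end up holding the same string, with the reference replaced by the actual character.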
If you use XML with no DTD, then these five character entities are assumed to be predeclared, and you can use them without declaring them:
&lt;: The less-than character (<) starts element markup (the first character of a start-tag or an end-tag).
&amp;: The ampersand character (&) starts entity markup (the first character of a character entity reference).
&gt;: The greater-than character (>) ends a start-tag or an end-tag.
&quot;: The double-quote character (") can be symbolised with this character entity reference when you need to embed a double-quote inside a string which is already double-quoted.
&apos;: The apostrophe or single-quote character (') can be symbolised with this character entity reference when you need to embed a single-quote or apostrophe inside a string which is already single-quoted.
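A minimal sketch of the five predeclared references (&lt;, &amp;, &gt;, &quot; and &apos;) in use with no DTD at all, again with Python's standard `xml.etree.ElementTree`. The element name `note` and attribute name `say` are made up for the example.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE here: the five references below are predeclared.
# &quot; lets a double-quote appear inside a double-quoted attribute value.
doc = '<note say="&quot;hi&quot; &amp; &apos;bye&apos;">1 &lt; 2 &gt; 0</note>'

note = ET.fromstring(doc)
attribute_value = note.get("say")   # references replaced by the real characters
text_value = note.text
```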
If you are using a DTD then you must declare all other character entities you need to use, so it would be good practice also to declare any of the five above that you plan on using. If you are using a Schema, you must use the numeric form for all except the five above because Schemas have no way to make character entity declarations.
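The need to declare anything beyond the five can be seen with a short check in Python's standard `xml.etree.ElementTree`. The helper `parses` is a hypothetical name introduced only for this sketch.

```python
import xml.etree.ElementTree as ET

def parses(doc: str) -> bool:
    """Return True if the parser accepts doc, False if it reports an error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

undeclared = "<p>&euro;</p>"    # named reference with no declaration
predeclared = "<p>&amp;</p>"    # one of the five predeclared names
numeric = "<p>&#8364;</p>"      # numeric form needs no declaration
```

Only the undeclared name is rejected; the predeclared and numeric forms are accepted without any DTD.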
There are also no reserved words as such in the user namespace of XML: you can call an element element and an attribute attribute and so on as in the following (perverse) example:
<?xml version="1.0"?>
<!DOCTYPE DOCTYPE SYSTEM "SYSTEM" [
  <!ELEMENT DOCTYPE (ELEMENT+)>
  <!ATTLIST ELEMENT ATTLIST ENTITY #IMPLIED>
  <!NOTATION DOCTYPE SYSTEM "ENTITY">
  <!ENTITY NOTATION SYSTEM "ENTITY" NDATA DOCTYPE>
]>
<DOCTYPE>
  <ELEMENT ATTLIST="NOTATION">foo</ELEMENT>
</DOCTYPE>
where the file SYSTEM contains the declaration: <!ELEMENT ELEMENT (#PCDATA)> and the file ENTITY does not even exist.
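As a rough check that such names cause no confusion, the perverse document can be fed to a non-validating parser. The sketch below uses Python's standard `xml.etree.ElementTree`, on the assumption that it checks well-formedness only and does not try to fetch the files SYSTEM or ENTITY. The variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

# The perverse example, reproduced as a string.
perverse = """<?xml version="1.0"?>
<!DOCTYPE DOCTYPE SYSTEM "SYSTEM" [
  <!ELEMENT DOCTYPE (ELEMENT+)>
  <!ATTLIST ELEMENT ATTLIST ENTITY #IMPLIED>
  <!NOTATION DOCTYPE SYSTEM "ENTITY">
  <!ENTITY NOTATION SYSTEM "ENTITY" NDATA DOCTYPE>
]>
<DOCTYPE>
  <ELEMENT ATTLIST="NOTATION">foo</ELEMENT>
</DOCTYPE>"""

root = ET.fromstring(perverse)   # root element is named DOCTYPE
child = root[0]                  # its first child is named ELEMENT
```

A validating parser would behave differently, since it would need to read the external files.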
There are keywords like DOCTYPE and IMPLIED which are reserved Names, but they are prefixed by a flag character (the Markup Declaration Open delimiter, <!, as in <!DOCTYPE, or the Reserved Name Indicator, #, as in #IMPLIED) so that they cannot be confused with user-specified Names.