The XML FAQ — Frequently-Asked Questions about the Extensible Markup Language

Section 3: Authors

Q 3.22: How can I handle embedded HTML in my XML?

Provide for it in the output, use a deep copy, or try disable-output-escaping.

Apart from using CDATA Sections, there are two common occasions when people want to handle embedded HTML inside an XML element:

  1. when they have received (possibly poorly-designed) XML or HTML from somewhere else which they must find a way to handle;

  2. when they have an application which has been explicitly designed to store a string of characters containing &lt; and &amp; character entity references with the objective of turning them back into markup in a later process (eg FreeMind, Atom).

Generally, you want to avoid this kind of trick, as it usually indicates that the document structure and design have been insufficiently thought out. However, there are occasions when it becomes unavoidable, so if you really need or want to use embedded HTML markup inside XML, and have it processable later as markup, there are a few techniques you may be able to use:

  • Provide templates in your XSLT transformation (or whatever software you use) which simply reproduce that markup untouched, eg if you have to preserve <b>some text</b> as-is, supply a template to do it:

    <xsl:template match="h:b">
      <b>
        <xsl:apply-templates/>
      </b>
    </xsl:template>

    (If you are handling elements from several different DTDs or Schemas, you will probably need Namespaces to keep them distinct, hence the h: prefix.)

  • Use XSLT's ‘deep copy’ instruction, which outputs nested well-formed markup verbatim, eg

    <xsl:template match="h:b">
      <xsl:copy-of select="."/>
    </xsl:template>
    	      
  • As a last resort, use the disable-output-escaping attribute on the xsl:text element of XSL[T] which is available in some processors, eg

    <xsl:text disable-output-escaping="yes"><![CDATA[<b>Now!</b>]]></xsl:text>

    This falls into the ‘dirty tricks’ department, and is generally discouraged. Not all processors support it.

  • Some processors (eg JX) are now providing their own equivalents for disabling output escaping. Their proponents claim it is ‘highly desirable’ or ‘what most people want’, but it still needs to be treated with care to prevent unwanted (possibly dangerous) arbitrary code from being passed untouched through your system. It also adds another dependency to your software.
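To show how these techniques might sit together, here is a minimal sketch of a complete XSLT 1.0 stylesheet. It is illustrative only and rests on several assumptions that are not part of the question above: that the embedded HTML elements are in the XHTML namespace (bound here to the h: prefix), that they occur inside a hypothetical note element, and that escaped markup (as in case 2) is held as text in a hypothetical escaped element. Adjust the names and the namespace URI to match your own documents.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the element names "note" and "escaped" are hypothetical,
     and the XHTML namespace for the h: prefix is an assumption. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml"/>

  <!-- Hypothetical wrapper: process its text and child elements in order -->
  <xsl:template match="note">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Deep copy: any embedded element in the h: namespace is written out
       as-is, together with its attributes and descendants -->
  <xsl:template match="h:*">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Last resort: text that holds escaped markup is written without
       re-escaping; this only has an effect in processors that support
       disable-output-escaping and that serialize the result themselves -->
  <xsl:template match="escaped">
    <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

</xsl:stylesheet>
```

The deep-copy template matches h:* rather than a single element such as h:b, so one rule covers all the embedded markup; if you need to alter some elements on the way through, give those their own more specific templates instead.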

For more details of using these techniques in XSL[T], see the relevant question in the XSL FAQ.

Read ‘When should I use a CDATA Marked Section?’ as well, which is very closely related.