xmlhack: A new encoding to replace entities?

Announced on XML-DEV, this Internet Draft proposes the definition of a new encoding called "UTF-8+names," to perform the expansion of HTML and MathML entities.

Although this is not explained very clearly, the initial motivation seems to be to serve those who would like to use entities such as "&eacute;" or "&nbsp;" in XML, as is possible in HTML, without having to declare them in a DTD.

The principle is simple: when this encoding is used, HTML and MathML entity references are replaced by their values before the XML parser sees them. Entity references that do not belong to the list of HTML and MathML entities are processed by the parser as usual.
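As a rough illustration of this principle, the following Python sketch expands named references in a pre-processing step and leaves everything else for the parser. It is not taken from the Internet Draft: the function name `expand_named_entities` is hypothetical, Python's `html.entities.name2codepoint` table (HTML 4 names only, no MathML) stands in for the draft's entity list, and leaving the five XML predefined entities untouched is an assumption made here so that markup is not corrupted. The regular expression is naive and would also rewrite references inside comments or CDATA sections.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Assumption: the XML predefined entities stay as references, since
# expanding "&lt;" or "&amp;" before parsing would change the markup.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# Matches a named reference such as "&eacute;" (not numeric references).
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_named_entities(text: str) -> str:
    """Replace known HTML entity references with their characters."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Unknown or predefined: leave it for the XML parser.
            return match.group(0)
        return chr(name2codepoint[name])

    return _NAMED_REF.sub(replace, text)


# "&custom;" is not an HTML entity, so it passes through unchanged.
expanded = expand_named_entities("<p>caf&eacute;&nbsp;ol&eacute; &amp; &custom;</p>")

# After expansion, an ordinary XML parser accepts the text without a DTD.
paragraph = ET.fromstring(expand_named_entities("<p>caf&eacute;</p>"))
```

Here `paragraph.text` holds the word with a real "é" character, while the `&custom;` reference in `expanded` is still present and would have to be declared in a DTD, as it is today.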

Like any attempt to modify the core mechanisms of XML, this proposal has raised some controversy, and eighty messages were exchanged on the subject over the weekend.

Though some seem to love the idea, like Mike Champion, who calls the proposal a "great idea", others clearly hate it, like Simon St.Laurent, who sees the proposal as "fundamentally corrupt, effectively cheating on the separation between character encodings and the representation of characters in those encodings."

As a matter of fact, UTF-8+names introduces a new layer between "character decoding" and XML parsers. For Bill de hÓra, this new layer is similar to macro-processing, and several people strongly suggest using a syntax different from XML entity references to signal that the new feature belongs to a different layer from entity references.

Finally, for James Clark, this is a solution to a non-issue, and one that is "extremely confusing to users": XHTML users should instead use tools that correctly support existing encodings, and MathML symbols would be better represented by XML elements than by entity references.