Announced on XML-DEV, this Internet Draft proposes defining a new encoding, "UTF-8+names", that expands HTML and MathML entity references.
Although this was not explained very clearly, the original motivation seems to be to let people use in XML, as they already can in HTML, entity references such as "&eacute;" or "&nbsp;" without having to declare them in a DTD.
The principle is simple: when this encoding is used, HTML and MathML entity references are replaced by their values before the XML parser sees them. Entity references that are not in the list of HTML and MathML entities are processed by the parser as usual.
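The mechanism can be sketched in Python as a decoding step placed in front of an ordinary XML parser. This is a minimal sketch, not the draft's definition: it uses Python's `html.entities.name2codepoint` table (the HTML 4 entity names) as a stand-in for the draft's list, omits MathML names, and, as our own assumption, leaves XML's five predefined entities to the parser so that `&lt;` or `&amp;` never turn into markup. The function names `expand_names` and `decode_utf8_names` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Assumption: XML's five predefined entities are left for the XML parser itself,
# so expanding them can never inject "<" or "&" into the markup.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &eacute; (not numeric ones like &#233;).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_names(text: str) -> str:
    """Replace known HTML entity references with the characters they stand for.

    Unknown names (and the XML predefined ones) are returned untouched,
    so the XML parser later processes them as usual.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY_REF.sub(substitute, text)


def decode_utf8_names(data: bytes) -> str:
    """Hypothetical "UTF-8+names" decoder: plain UTF-8 decoding, then expansion."""
    return expand_names(data.decode("utf-8"))


doc = b"<p>caf&eacute;&nbsp;&amp; cr&egrave;me</p>"
root = ET.fromstring(decode_utf8_names(doc))
print(repr(root.text))
```

Unknown names such as `&foo;` pass through unchanged, so the parser handles them as it normally would (ElementTree, for instance, rejects them as undefined entities). Because the substitution runs before parsing, it also applies inside comments and CDATA sections, where an XML parser would leave "&eacute;" as literal text.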
Like any attempt to modify the core mechanisms of XML, this proposal has stirred controversy: some eighty messages were exchanged on the subject over the weekend.
Some seem to love the idea, like Mike Champion, who calls the proposal a "great idea"; others clearly hate it, like Simon St.Laurent, who sees it as "fundamentally corrupt, effectively cheating on the separation between character encodings and the representation of characters in those encodings."
Indeed, UTF-8+names introduces a new layer between character decoding and the XML parser. Bill de hÓra likens this layer to macro processing, and several people strongly suggest using a syntax distinct from XML entity references to make clear that the feature belongs to a different layer than entity references.
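The layering concern can be made concrete: a decoder runs before, and blind to, XML syntax, so it cannot tell an entity reference from text that merely looks like one. The minimal sketch below assumes a hypothetical `utf8_names_decode` function that expands names from Python's `html.entities.name2codepoint` (HTML 4 names only, not the draft's actual list) and skips XML's predefined entities; it compares its result with plain UTF-8 decoding on a CDATA section.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Assumption: the five XML predefined entities stay with the parser.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def utf8_names_decode(data: bytes) -> str:
    """Hypothetical UTF-8+names decoder: pure text substitution, like a macro pass."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return ENTITY_REF.sub(substitute, data.decode("utf-8"))


# Inside a CDATA section, "&eacute;" is literal text for an XML parser.
doc = b"<code><![CDATA[caf&eacute;]]></code>"

plain = ET.fromstring(doc.decode("utf-8")).text    # parser sees the CDATA as-is
with_names = ET.fromstring(utf8_names_decode(doc)).text  # expanded before parsing
print(plain, with_names)
```

With plain UTF-8 the CDATA content stays "caf&eacute;", while the UTF-8+names decoder has already rewritten it before the parser runs, which is the kind of behavior that a distinct syntax would make visible.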
Finally, James Clark sees it as a solution, "extremely confusing to users", to a non-issue: XHTML users should instead use tools that correctly support existing encodings, and MathML symbols would be better represented by XML elements than by entity references.