Encodings
A new encoding to replace entities?
12:42, 20 Oct 2003 UTC | Eric van der Vlist

Announced on XML-DEV, this Internet Draft proposes a new encoding called "UTF-8+names," which would expand HTML and MathML entities as part of decoding.

Even though this wasn't explained very clearly, the initial motivation seems to be to serve all those who would like to use in XML -- as is possible in HTML -- entities such as "&eacute;" (é) or "&nbsp;" (a non-breaking space) without having to declare them in a DTD.

The principle is simple: when this encoding is used, HTML and MathML entity references are replaced by their values before the XML parser sees them. Entity references that do not belong to the list of HTML and MathML entities are processed by the parser as usual.
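
The principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the draft's actual algorithm: the function names are hypothetical, the entity table is Python's `html.entities.name2codepoint` (HTML 4 names only, so MathML names are not covered), the expansion runs on already-decoded text rather than inside a real codec, and leaving XML's five predefined entities to the parser is an assumption made here so the result stays well-formed.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint  # HTML 4 entity names only

# Assumption: the entities predefined by XML itself are left to the parser.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_named_entities(text: str) -> str:
    """Replace known HTML entity references with the characters they name."""
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Unknown or XML-predefined: pass through for the parser to handle.
            return match.group(0)
        return chr(name2codepoint[name])

    return _ENTITY_REF.sub(replace, text)


def decode_utf8_plus_names(data: bytes) -> str:
    """Hypothetical decoder: plain UTF-8 decoding, then entity expansion."""
    return expand_named_entities(data.decode("utf-8"))


# No DTD is needed: the parser never sees &eacute; or &nbsp;.
source = b"<p>caf&eacute;&nbsp;&amp;&nbsp;th&eacute;</p>"
root = ET.fromstring(decode_utf8_plus_names(source))
print(root.text)
```

A reference the table does not know, such as a hypothetical `&foo;`, passes through unchanged, so the parser handles it as usual and would still reject it unless a DTD declares it.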

Like any attempt to modify the core mechanisms of XML, this proposal has stirred controversy: eighty messages were exchanged on the subject over the weekend.

Though some seem to love the idea, like Mike Champion, who calls the proposal a "great idea", others clearly hate it, like Simon St.Laurent, who sees the proposal as "fundamentally corrupt, effectively cheating on the separation between character encodings and the representation of characters in those encodings."

As a matter of fact, UTF-8+names introduces a new layer between character decoding and the XML parser. For Bill de hÓra, this new layer is similar to macro-processing, and several people strongly suggest using a syntax different from XML entity references, to make clear that the new feature belongs to a different layer from entity references.

Finally, for James Clark, this is a solution -- "extremely confusing to users" -- to a non-issue: XHTML users should instead use tools that correctly support existing encodings, and MathML symbols would be better represented by XML elements than by entity references.

Ummm...... (Guy - 23:38, 16 Dec 2003)

That's silly. You can already do any character encoding in xml, you just have to declare the... character encoding.

Re: A new encoding to replace entities? (Leigh Klotz - 22:53, 22 Oct 2003)

Was it a subtle joke, or did you mean entities such as "é" or "&"

Re: A new encoding to replace entities? (Robin La Fontaine - 08:28, 21 Oct 2003)

I am very much with James Clark on this one. His simple suggestion of using XML elements also solves another big problem with entities: they disappear when you process the file, e.g. the output from XSLT does not preserve them! Often they need to be preserved through processing. Using XML elements solves this one also.

  
xmlhack: developer news from the XML community
