xmlhack: Why XML 1.1?

I'd better say it right now. Although XML 1.1 and Namespaces in XML 1.1 do not include that many changes compared to XML 1.0 and Namespaces in XML 1.0, these changes are enough to break compatibility: a well-formed XML 1.1 document isn't necessarily a well-formed XML 1.0 document.

These changes, so small and yet so disruptive, were almost unavoidable and have been awaited for more than three and a half years.

The Unicode standard on which XML 1.0 is built has been evolving. XML 1.0 was specified against Unicode 2.0, while the Unicode Consortium has now published its 4.0 release, with several thousand new characters. It was time to take these updates into account in the XML recommendation.

XML 1.0 had taken some precautions to avoid having to be updated for each new edition of Unicode. XML 1.0 says:

Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions.

This would have saved us from XML 1.1 if XML 1.0 hadn't specified several of its own character classes as explicit lists of characters.

This is the case for the characters that can be used as line ends, where the "NEL" character (U+0085), widely used on IBM mainframes, had been forgotten. This is also the case for the class of characters that are valid in names. Because these lists are explicitly specified, they have not followed the evolution of Unicode, and none of the thousands of new characters can be used in element or attribute names.

XML 1.1 has learned the lesson from this "over-specification" and has significantly softened its policy regarding characters that can be used in names: anything that is not explicitly forbidden is now allowed in XML names, and new characters will automatically be accepted when they are added to the Unicode standard.
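To make the new policy concrete, here is a minimal Python sketch of a name check built on the broad NameStartChar and NameChar ranges of the XML 1.1 recommendation. The function name `is_xml11_name` and the table names are only illustrative, and the sketch checks the character ranges alone; it is not a substitute for a conforming parser.

```python
# Code point ranges allowed at the start of an XML 1.1 name
# (NameStartChar production of the XML 1.1 recommendation).
_NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed after the first character (NameChar production).
_NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    return any(low <= code_point <= high for low, high in ranges)


def is_xml11_name(name):
    """Return True if every character of `name` is allowed by XML 1.1."""
    if not name:
        return False
    if not _in_ranges(ord(name[0]), _NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ord(ch), _NAME_START_RANGES)
        or _in_ranges(ord(ch), _NAME_EXTRA_RANGES)
        for ch in name[1:]
    )
```

Because the last range covers every supplementary plane up to U+EFFFF, a name starting with a character from the Cypriot syllabary (U+10800, added in Unicode 4.0) passes this check without the table ever needing an update.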

Namespaces in XML 1.1 follows the new rules set by XML 1.1 and adds support for "internationalised" URIs, the so-called IRIs, which are not fully specified yet.

Fair enough, but what will the practical consequences of these two publications be?

For people in charge of open systems that receive and emit XML documents, the usual rule of thumb is to be liberal in what they accept and conservative in what they emit:

It's wise to install new versions of XML tools (including parsers) that support XML 1.1 as soon as they are available, to be ready to support incoming documents coded as XML 1.1.
On the other hand, it is wise to wait as long as possible before sending XML 1.1 documents, since we don't know how long it will take before all the receiving partners are ready to accept them.
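On the receiving side, a system may want to know which version an incoming document declares before handing it to a parser. The following Python sketch reads the version from the XML declaration; the name `declared_xml_version` is hypothetical, and the code assumes an ASCII-compatible encoding such as UTF-8 (it does not handle UTF-16). A document without a declaration is treated as XML 1.0, since XML 1.1 documents must carry one.

```python
import re

# Matches an XML declaration at the very start of the document,
# optionally preceded by a UTF-8 byte order mark.
_DECL = re.compile(
    rb"^(?:\xef\xbb\xbf)?<\?xml\s+version\s*=\s*([\"'])(1\.[0-9]+)\1"
)


def declared_xml_version(data):
    """Return the declared XML version of `data` (bytes), or "1.0" if none."""
    match = _DECL.match(data)
    if match is None:
        # No XML declaration: the document can only be XML 1.0.
        return "1.0"
    return match.group(2).decode("ascii")
```

A gateway could use the result to route documents declared as "1.1" to an XML 1.1 capable parser while leaving the existing XML 1.0 path untouched.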

The only exception is for applications that really require XML 1.1, but what are the use cases?

An application may require XML 1.1 because it must accept one of the new characters (for instance an ancient Cypriot character) in an element or attribute name. One should note that the new characters are already accepted in the content of a document per XML 1.0 and that XML 1.1 is required only if they must be used in names.
An application may require XML 1.1 to use a new line-end character, for instance "NEL", because it is feeding XML documents from mainframe data without conversion.
Or an application may require Namespaces in XML 1.1 because it must use an IRI to identify a namespace.
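For the line-end use case, the sketch below shows in Python the normalization that XML 1.1 asks a processor to apply on input: besides the sequences already recognized by XML 1.0, NEL (U+0085), the pair CR followed by NEL, and the Unicode line separator (U+2028) are all translated to a single line feed. The function name `normalize_xml11_line_ends` is only illustrative.

```python
import re

# Two-character sequences are listed first so that they are consumed
# as a unit before a lone CR or NEL can match.
_XML11_LINE_END = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_xml11_line_ends(text):
    """Translate every XML 1.1 line-end sequence in `text` to a line feed."""
    return _XML11_LINE_END.sub("\n", text)
```

An XML 1.0 processor performs only the CR LF and lone CR translations, so a NEL in mainframe data reaches the application unchanged instead of being treated as a line end.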

That's it.

There aren't that many use cases that justify sending XML 1.1 documents, and that's the last paradox of these two recommendations: they don't change that many things, they were unavoidable, they are disruptive and they may well stay ignored and only marginally used.