I'd better say it right away. Although XML 1.1 and Namespaces in XML 1.1
do not include that many changes compared to XML 1.0 and Namespaces in XML
1.0, these changes are enough to break compatibility: a well-formed
XML 1.1 document isn't necessarily a well-formed XML 1.0 document.
These changes, so small and yet so disruptive, were almost unavoidable
and have been awaited for more than three and a half years.
The Unicode standard on which XML 1.0 is built has been evolving.
XML 1.0 was specified based on Unicode 2.0, while the Unicode
Consortium has now published its 4.0 release, with several thousand new
characters. It was time to take these updates into account in the XML
recommendation.
XML 1.0 had taken some precautions to avoid having to be updated for
each new edition of Unicode. XML 1.0 says:
Legal characters are tab, carriage return, line feed, and the legal
characters of Unicode and ISO/IEC 10646. The versions of these standards
cited in A.1 Normative References were current at the time this document
was prepared. New characters may be added to these standards by
amendments or new editions.
This would have saved us from XML 1.1 if XML 1.0 hadn't specified
several of its own character classes as explicit lists of characters.
This is the case for the characters that can be used as line ends, where
the "NEL" character widely used on IBM mainframes had been forgotten. This
is also the case for the class of characters that are valid in names.
Because these lists are explicitly enumerated, they have not followed the
evolution of Unicode, and none of the thousands of new characters can be
used in element or attribute names.
XML 1.1 has learned the lesson from this "over-specification" and has
significantly softened its policy regarding characters that can be used in
names: anything that is not explicitly forbidden is now allowed in XML
names, and new characters will automatically be accepted when they are
added to the Unicode standard.
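
To make the new naming rule concrete, here is a minimal Python sketch that tests a string against the Name production of XML 1.1. The function name is_xml11_name and the two table names are mine and do not come from any library; the ranges are transcribed from the NameStartChar and NameChar productions of the recommendation, which remains the authoritative source.

```python
# Code point ranges allowed as the first character of an XML 1.1 name
# (transcribed from the NameStartChar production; check the recommendation
# before relying on them).
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed after the first character (NameChar):
# "-", ".", digits, middle dot, combining marks, and two tie characters.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    """Return True if code_point falls inside one of the inclusive ranges."""
    return any(low <= code_point <= high for low, high in ranges)


def is_xml11_name(name):
    """Return True if name matches the XML 1.1 Name production."""
    if not name:
        return False
    if not _in_ranges(ord(name[0]), NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ord(ch), NAME_START_RANGES)
        or _in_ranges(ord(ch), NAME_EXTRA_RANGES)
        for ch in name[1:]
    )
```

Note how short the tables are: the rule is expressed as broad ranges minus a few exclusions, so a character such as U+10800 (a Cypriot syllable added after Unicode 2.0) is accepted without the table naming it.
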
Namespaces in XML 1.1 follows the new rules set by XML 1.1 and adds
support for "internationalised" URIs, the so-called IRIs, which are not
fully specified yet.
Fair enough, but what will the practical consequences of these two
publications be?
For people in charge of open systems that receive and emit XML
documents, the usual rule of thumb is to be liberal in what they
accept and conservative in what they emit:
- It's wise to install new versions of XML tools (including parsers)
that support XML 1.1 as soon as they are available, to be ready to
support incoming documents coded as XML 1.1.
- On the other hand, it is wise to wait as long as possible before
sending XML 1.1 documents, since we don't know how long it will take
before all the receiving partners are ready to accept them.
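
As an illustration of this rule, a receiving application could look at the version declared by an incoming document before handing it to its parser, and keep emitting XML 1.0 itself. The Python sketch below is only that, a sketch: declared_xml_version, is_acceptable, SUPPORTED_VERSIONS and OUTGOING_VERSION are hypothetical names, and the regular expression assumes an ASCII-compatible encoding such as UTF-8 (it will not find a declaration in UTF-16 input).

```python
import re

# Liberal in what we accept, conservative in what we emit.
SUPPORTED_VERSIONS = ("1.0", "1.1")   # assumption: the local parser handles both
OUTGOING_VERSION = "1.0"              # what this application writes out

# Matches the version pseudo-attribute at the very start of the document.
_XML_DECL = re.compile(rb"""^<\?xml\s+version\s*=\s*(["'])(1\.[0-9]+)\1""")


def declared_xml_version(document: bytes) -> str:
    """Return the declared XML version, or "1.0" if there is no declaration."""
    if document.startswith(b"\xef\xbb\xbf"):   # skip a UTF-8 byte order mark
        document = document[3:]
    match = _XML_DECL.match(document)
    # The XML declaration is mandatory in XML 1.1, so a document without
    # one can only be XML 1.0.
    return match.group(2).decode("ascii") if match else "1.0"


def is_acceptable(document: bytes) -> bool:
    """Return True if the incoming document declares a supported version."""
    return declared_xml_version(document) in SUPPORTED_VERSIONS
```

A gateway built this way would accept `<?xml version="1.1"?>` from its partners while continuing to write `<?xml version="1.0"?>` on its own output.
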
The only exception is for applications which would really require XML
1.1, but what are the use cases?
- An application may require XML 1.1 because it must accept one of the
new characters (for instance an ancient Cypriot character) in an
element or attribute name. One should note that the new characters are
already accepted in the content of a document per XML 1.0 and that XML
1.1 is required only if they must be used in names.
- An application may require XML 1.1 to use one of the new line-end
characters, for instance "NEL", because it is feeding XML documents from
mainframe data without conversion.
- Or an application may require Namespaces in XML 1.1 because it must
use an IRI to identify a namespace.
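
The "NEL" use case comes down to line-end normalization, which the following Python sketch tries to contrast for the two versions. It assumes the text has already been decoded to a Unicode string, and the function names are illustrative only; the sequences handled are my reading of the end-of-line handling sections of the two recommendations.

```python
import re

# XML 1.0 treats CR LF and a lone CR as line ends.
_EOL_XML10 = re.compile("\r\n|\r")

# XML 1.1 additionally treats CR NEL, a lone NEL (U+0085) and the Unicode
# line separator (U+2028) as line ends. Two-character sequences come first
# in the alternation so they are replaced by a single line feed.
_EOL_XML11 = re.compile("\r\n|\r\x85|\x85|\u2028|\r")


def normalize_eol_xml10(text: str) -> str:
    """Normalize line ends the way an XML 1.0 parser would."""
    return _EOL_XML10.sub("\n", text)


def normalize_eol_xml11(text: str) -> str:
    """Normalize line ends the way an XML 1.1 parser would."""
    return _EOL_XML11.sub("\n", text)
```

An application that cannot emit XML 1.1 could apply normalize_eol_xml11 to its mainframe data before serializing it as XML 1.0, at the cost of exactly the conversion step this use case was trying to avoid.
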
That's it.
There aren't that many use cases that justify sending XML 1.1 documents,
and that's the last paradox of these two recommendations: they don't
change that many things, they were unavoidable, they are disruptive and
they may well stay ignored and only marginally used.