XML
Ever more whitespace issues
23:39, 14 Mar 2001 UTC | Simon St.Laurent

The W3C has published a note, The [NEL] Newline Character, exploring a Unicode 3.0 whitespace character that wasn't addressed by XML 1.0.

The problems arise on OS/390 systems, or on systems interacting with those mainframes:

"The omission of [NEL], the newline character defined in Unicode 3.0, from the End-of-Line Handling section in the XML 1.0 specification causes significant difficulty when processing XML documents and DTDs in IBM mainframe systems.... XML documents that contain [NEL] characters are declared invalid or not well-formed by XML 1.0 compliant parsers."
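The failure the Note describes can be reproduced with any XML 1.0 parser. The following is a minimal sketch, assuming Python's standard-library expat-based parser as a stand-in for "XML 1.0 compliant parsers"; the helper name `is_well_formed` and the sample documents are illustrative, not taken from the Note. It places a line feed, and then NEL, where the grammar expects whitespace inside a tag.

```python
import xml.etree.ElementTree as ET

NEL = "\u0085"  # Unicode NEXT LINE, the character the Note calls [NEL]


def is_well_formed(text: str) -> bool:
    """Return True if the stdlib (expat-based) XML 1.0 parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The same start tag, with the element name and attribute separated
# first by a line feed and then by NEL.
with_lf = '<doc\nid="1"/>'
with_nel = "<doc" + NEL + 'id="1"/>'

print(is_well_formed(with_lf))   # True: LF is XML 1.0 whitespace
print(is_well_formed(with_nel))  # False: NEL is not XML 1.0 whitespace
```

NEL is still a legal character in XML 1.0 content, so the problem is confined to positions where the parser requires whitespace, such as between an element name and an attribute or between declarations in a DTD.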

In Unicode 2.0, on which XML 1.0 was built, the characters from x0080 to x009F were undefined (though marked CTRL in the tables). The NEL character at x0085 first appeared in a 1999 Unicode Technical Report and was then included in Unicode 3.0.

The Note provides suggested update text for XML 1.0, Second Edition. It also acknowledges the need for further work, in that "both the Unicode Consortium and the W3C Internationalization Working Group recommend the inclusion of the line separator (#x2028) and paragraph separator (#x2029) as well as [NEL]."
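To make the difference concrete, here is a rough sketch of end-of-line normalization under the XML 1.0 rule (CR LF pairs and lone CRs become a single LF) next to a hypothetical extended rule that also treats NEL, the line separator and the paragraph separator as line ends. The extended rule is an assumption for illustration only; it is not the Note's proposed wording, and the function names are invented here.

```python
import re

# XML 1.0 end-of-line handling: CR LF and any lone CR become a single LF.
XML10_EOL = re.compile("\r\n|\r")

# Hypothetical extension (an assumption, not the Note's text): also treat
# NEL (U+0085), LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029)
# as line ends. Pairs such as CR followed by NEL are not special-cased.
EXTENDED_EOL = re.compile("\r\n|[\r\u0085\u2028\u2029]")


def normalize_xml10(text: str) -> str:
    """Normalize line ends the way XML 1.0 specifies."""
    return XML10_EOL.sub("\n", text)


def normalize_extended(text: str) -> str:
    """Normalize line ends under the hypothetical extended rule."""
    return EXTENDED_EOL.sub("\n", text)


sample = "a\r\nb\u0085c\u2028d"

print(repr(normalize_xml10(sample)))     # 'a\nb\x85c\u2028d'
print(repr(normalize_extended(sample)))  # 'a\nb\nc\nd'
```

Under the XML 1.0 rule the NEL and line separator characters pass through untouched, which is why text originating on a mainframe can arrive at the parser with line ends it does not recognize.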


Newest comments

Re: Ever more whitespace issues (Wayne Steele - 20:01, 15 Mar 2001)
It has always concerned me that the XML definition of whitespace is different from the Unicode defin ...
  
xmlhack: developer news from the XML community
