James Clark unveils a new XML mode for GNU Emacs
11:41, 10 Sep 2003 UTC | Michael Smith

More magic from James Clark: He's announced the alpha release of nXML, a new mode for editing XML documents from within GNU Emacs. It's a milestone in that it's the first open-source editing application to enable context-sensitive validated editing against Relax NG schemas. It also provides a clever mechanism for real-time, automatic visual identification of validity errors, along with flexible syntax-highlighting capabilities, and many other features are planned for future releases.

To get the current release, go to Clark's Thai Open Source download site, and look for the latest nxml-mode-200nnnnn.tar.gz distribution. To get started using it, follow the installation instructions in the README file in the distribution, and see the TUTORIAL file for instructions on using its context-sensitive completion feature, as well as details about customizing its file-name-based and root-element-based schema auto-assignment mechanism. Once you've got it up and running, type M-x describe-mode or C-h m for more information.

You will find that despite its "alpha" status, nXML is quite stable and usable for real-world editing tasks already. But if you do end up needing help, or find a bug, or want to make a feature suggestion, there's an emacs-nxml-mode mailing list that Clark has set up for nXML discussion and support.

While Emacs/PSGML is limited to checking markup strictly against DTDs, Emacs/nXML checks it against Relax NG schemas specified in the Relax NG compact syntax. The nXML mode distribution includes Relax NG compact-syntax schemas for DocBook, XHTML, XSLT, RDF/XML, and Relax NG itself.

By enabling this kind of context-sensitive Relax NG-aware editing of XML documents, and making it possible to put together a completely DTD-free, open-source XML toolchain (that is, Emacs/nXML, used in combination with Relax NG-aware processing applications such as Daniel Veillard's xmllint and xsltproc, which are provided in the libxml2 and libxslt distributions), nXML eliminates what may have been the last remaining reason many users have had for keeping DTDs around.

Going DTD-less means that you can also go without DOCTYPE declarations in your documents, and the Relax NG specification does not mandate any way of associating a document with a Relax NG schema, so some mechanism needs to be provided at the editing-application level. Emacs/nXML provides two: one for manually specifying a Relax NG schema by browsing for it on your local filesystem, and one customizable mechanism for automatically associating a document with a schema.

The schema auto-association mechanism works by looking at the filename extension of the document (it's configured by default to do it for .html, .xsl, .rdf, and .rng files), or failing that, by looking at the document's root element (for example, it's configured by default to associate the DocBook schema with documents that have book or article root elements, the XSLT schema with documents that have stylesheet or transform root elements, etc.).
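The lookup order described above (extension first, root element as a fallback) can be sketched in a few lines of Python. This is only an illustration of the idea, not nXML's actual implementation, which is written in Emacs Lisp and driven by its own configuration. The table contents, the schema file names, and the function names are all hypothetical, and only some of the default extensions are shown.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical tables: the schema file names are placeholders, not nXML's own.
SCHEMA_BY_EXTENSION = {
    ".html": "xhtml.rnc",
    ".xsl": "xslt.rnc",
    ".rdf": "rdfxml.rnc",
}
SCHEMA_BY_ROOT = {
    "book": "docbook.rnc",
    "article": "docbook.rnc",
    "stylesheet": "xslt.rnc",
    "transform": "xslt.rnc",
}


def root_local_name(xml_text):
    """Return the root element's name with any namespace URI stripped."""
    tag = ET.fromstring(xml_text).tag  # e.g. "{http://...}stylesheet"
    return tag.rsplit("}", 1)[-1]


def pick_schema(filename, xml_text):
    """Try the filename extension first, then fall back to the root element."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in SCHEMA_BY_EXTENSION:
        return SCHEMA_BY_EXTENSION[ext]
    # Returns None when no rule matches, leaving the choice to the user.
    return SCHEMA_BY_ROOT.get(root_local_name(xml_text))
```

For example, `pick_schema("guide.xml", "<book/>")` falls through the extension table and matches on the `book` root element. Unlike a real editor, this sketch assumes the document is already well-formed.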

Another powerful feature that Emacs/nXML provides is a completely automated mechanism for visually identifying validity errors in a document in real time -- one that doesn't require you to take any manual action to initiate validity checking.

The feature is similar to a feature in the Topologi Collaborative Markup Editor (a relatively new commercial application that takes a number of novel approaches to XML editing). The Emacs/nXML implementation of the feature works like this: As you are editing a document, nXML:

  • does background re-parsing and re-validating of the document in the idle periods between the times when you are actually typing in content

  • visually highlights all instances of invalidity it finds in the document (by default, the value of the Emacs "face" it uses to highlight invalidity instances is a red underline -- but the highlighting can be changed by customizing that face)

If you then mouse over one of the invalidity-highlighted points in the document, popup text appears describing the validity error (see Figure 2). Or, if you move the text cursor to the location of the invalidity highlighting, the description of the validity error instead appears in the "minibuffer" echo area at the bottom of the Emacs interface (see Figure 3). You can also use a keyboard combination (C-c C-n) to step through all validity errors in the document.
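The "validate only while the user is idle" pattern can be sketched as follows. This is a simplified illustration under several assumptions, not nXML's code: the class and method names are hypothetical, the idle delay is an arbitrary value, and the checker only tests well-formedness (Python's standard library has no Relax NG validator). It also re-parses the whole text each time, whereas nXML works incrementally.

```python
import xml.parsers.expat


def find_errors(text):
    """Stand-in checker: report the first well-formedness error, if any.

    A real implementation would validate against a Relax NG schema instead.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        return [(err.lineno, str(err))]
    return []


class IdleValidator:
    """Re-checks the text only after typing has paused for idle_delay seconds."""

    def __init__(self, idle_delay=0.5):
        self.idle_delay = idle_delay  # hypothetical delay, in seconds
        self.text = ""
        self.dirty = False
        self.last_edit = 0.0
        self.errors = []

    def on_edit(self, text, now):
        """Record an edit; checking is deferred rather than run immediately."""
        self.text, self.dirty, self.last_edit = text, True, now

    def on_tick(self, now):
        """Called periodically; checks only if the text changed and is idle."""
        if self.dirty and now - self.last_edit >= self.idle_delay:
            self.errors = find_errors(self.text)
            self.dirty = False
        return self.errors
```

Passing the current time in as `now` keeps the sketch deterministic; an editor would instead hook into its own idle timer. The returned list of (line, message) pairs is what a front end would turn into highlighting and popup or echo-area text.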

Clark once described Relax NG as "a conservative, evolutionary refinement of well-proven ideas from SGML and XML DTDs". Emacs/nXML, even in this "alpha" stage, may similarly be seen in part as an evolutionary refinement in XML editing -- with some features (context-sensitive completion) very similar to capabilities in existing editors such as Emacs/PSGML, some features (configurable syntax highlighting) that are incremental improvements over existing capabilities, and at least one feature (automatic real-time highlighting of validity errors) that is a sort of next-generation step beyond capabilities in most current editors.

That said, as usable as nXML may be in its current state, Clark seems to consider it just a start, with significant new development planned; the description for the emacs-nxml-mode mailing list says, in part, that its purpose is to "discuss details of what features the mode should provide and how they should work". And in his initial release announcement, Clark writes:

"This is still very much a work in progress. Most of the work has been on providing the underlying infrastructure to support incremental parsing and validation. There's still much to be done in exploiting this infrastructure in support of XML editing. I hope early users will help figure out the best way to do this."

Also, the TODO file in the distribution contains a long, long list of potential changes. Here's a sample of some of the more intriguing ones:

  • Command to insert an element template, including all required attributes and child elements

  • Use RDDL to locate a schema based on the namespace URI

  • Structure view + Collapse and expand elements (using invisible, intangible and display text properties) [this seems like it might be something like the folded-edited support that PSGML provides]

  • Smart selection command that selects increasingly large syntactically coherent chunks of XML. If point is in an attribute value, first select complete value; then if command is repeated, select value plus delimiters, then select attribute name as well, then complete start-tag, then complete element, then enclosing element, etc.

An idea that I'd personally like to see implemented: Add a mechanism for specifying lists of elements from a particular schema to 'ignore' -- that is, elements that, though they are in the schema, the user wants to omit from context-sensitive completion lists for element names.

The rationale behind that is that many users have problems working with large schemas (like DocBook) because those schemas contain many elements that they have no use for at all and would just like to ignore completely. But it's a challenging and time-consuming process to create a schema customization to remove unwanted elements. It's much more efficient, and easier for users to customize, to provide an 'ignore' capability at the editing-application level.
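To make the suggestion concrete, here is a minimal Python sketch of such an 'ignore' filter. It is purely illustrative: nXML has no such feature as described here, and the function name and the way the ignore list is supplied are assumptions.

```python
def filter_completions(candidates, ignored):
    """Drop ignored element names from a completion list, preserving order."""
    ignored = set(ignored)
    return [name for name in candidates if name not in ignored]


# Element names the schema allows at this point (example DocBook names).
allowed_here = ["para", "programlisting", "msgset", "funcsynopsis"]
# The user's personal ignore list.
print(filter_completions(allowed_here, ["msgset", "funcsynopsis"]))
```

The point of the design is that the schema itself stays untouched: the ignored elements remain valid if they appear in a document, and are simply never offered during completion.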

And XInclude support would be a really nice thing to have as well.

Re: James Clark unveils a new XML mode for GNU Emacs (James Clark - 02:22, 11 Sep 2003)

I think this is potentially useful for W3C XML Schema users as well.

Sun's MSV validator (now released as open source) supports W3C XML Schema by translating it internally into RELAX NG, and this functionality is available in the rngconv utility available from . The output from this can then be translated to compact syntax using trang. I can't say I've tried this yet, though.
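The two-step conversion could be scripted roughly as below. This sketch is untested against real schemas and rests on assumptions: that both tools are run as `java -jar` jar files, that rngconv writes its RELAX NG output to standard output, and that trang picks input and output formats from the file extensions. The jar paths and function names are hypothetical placeholders.

```python
import subprocess


def build_commands(xsd_path, rng_path, rnc_path,
                   rngconv_jar="rngconv.jar", trang_jar="trang.jar"):
    """Return the two command lines: XSD -> RELAX NG XML, then -> compact syntax."""
    rngconv_cmd = ["java", "-jar", rngconv_jar, xsd_path]
    trang_cmd = ["java", "-jar", trang_jar, rng_path, rnc_path]
    return rngconv_cmd, trang_cmd


def xsd_to_rnc(xsd_path, rng_path, rnc_path, **jars):
    """Run both steps; raises CalledProcessError if either tool fails."""
    rngconv_cmd, trang_cmd = build_commands(xsd_path, rng_path, rnc_path, **jars)
    # Step 1: capture rngconv's standard output as the intermediate .rng file.
    with open(rng_path, "wb") as out:
        subprocess.run(rngconv_cmd, stdout=out, check=True)
    # Step 2: translate the .rng file to compact syntax (.rnc).
    subprocess.run(trang_cmd, check=True)
```

The resulting .rnc file could then be chosen as the schema for a document in nXML.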

In the longer term, nxml-mode could support W3C XML Schema on an equal basis to RELAX NG by adding an Emacs Lisp module that does the same trick as MSV and parses W3C XML Schema into nxml-mode's internal representation, which is basically just a Lisp representation of RELAX NG's simplified syntax. This is an order of magnitude less work than the whole of nxml-mode. Any volunteers?

xmlhack: developer news from the XML community
