xmlformat: Consistently format XML files
13:13, 4 Feb 2004 UTC | Michael Smith

xmlformat is a REX-based script (take your pick from Perl or Ruby versions) for consistently reformatting XML files; that is, "canonizing" and normalizing whitespace, indenting, line-wrapping, and placement of line breaks. It works as advertised, handling mixed content and "verbatim" content correctly.

xmlformat (developed by MySQL guru Paul DuBois[1]) is an off-the-shelf tool for "pretty printing" XML files for better readability. More importantly, it lets you put the whitespace, indenting, line-wrapping, and placement of line breaks in your files into a standard, consistent format before you commit them to a revision-control system or run diffs against them.

A tool like xmlformat is especially useful in environments where you have multiple people working on the same set of XML files--people who may be using a variety of editing applications to edit the files. As DuBois puts it in his intro to the xmlformat documentation:

XML editors typically impose their own style conventions on files. The application of different style conventions to successive document revisions can result in large version diffs where most of the bulk is related only to changes in format rather than content. This can be a problem if, for example, the version control system automatically sends the diffs to a committer's mailing list that people read. If documents are rewritten to a common format before they are committed, these diffs become smaller. They better reflect content changes and are easier for people to scan and understand.

The good news about xmlformat specifically (as opposed to some other XML "pretty printing" tools) is that it seems to reliably do what you'd expect. It does need some initial configuration to teach it about your content: for example, which elements in your XML files are inline elements and which are block elements, which need to be handled as "verbatim" elements, and which you want it to whitespace-normalize. After that, I think you'll find that it reformats your content the way you want it, without unexpectedly removing or introducing any whitespace (including handling mixed content correctly).[2]
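To make the block/inline/verbatim distinction concrete, here is a minimal Python sketch of the idea of a per-element formatting policy. It is an illustration only: the `POLICY` table, the `format_text` helper, and the DocBook-style element names are all hypothetical, and this is not xmlformat's configuration syntax or its code.

```python
import re

# Hypothetical per-element policy (illustrative names only; this is NOT
# xmlformat's actual configuration file format).
POLICY = {
    "para": {"format": "block", "normalize": True},
    "emphasis": {"format": "inline"},
    "programlisting": {"format": "verbatim"},
}

# Assumed fallback for elements the table does not mention.
DEFAULT_RULE = {"format": "block", "normalize": False}


def format_text(element, text):
    """Return `text` as treated under the policy for `element`."""
    rule = POLICY.get(element, DEFAULT_RULE)
    if rule["format"] == "verbatim":
        # Verbatim content is passed through character for character.
        return text
    if rule.get("normalize"):
        # Collapse each run of whitespace to one space and trim the ends.
        return re.sub(r"\s+", " ", text).strip()
    # Everything else is left untouched in this sketch.
    return text


print(repr(format_text("para", "  Some\n   wrapped   text. ")))
print(repr(format_text("programlisting", "  keep\n    this ")))
```

The point of the sketch is that the formatter cannot guess which whitespace is significant; it has to be told, element by element, which is why the initial configuration step matters.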

Clear, well-written documentation on how to configure xmlformat is provided both in the xmlformat distribution and online. The documentation also includes details about how it works[3]. The xmlformat Perl and Ruby scripts themselves are also extensively commented and make for an interesting read.

[1] Author of O'Reilly's MySQL Cookbook and a number of other books on MySQL, csh/tcsh, and imake.

[2] It works for me at least; in testing it with a number of files of 15,000+ lines, I never found a single instance of it adding whitespace where it shouldn't have been added or deleting whitespace where it should have been preserved, or wrapping or indenting anything I didn't ask it to.

[3] The parsing method that xmlformat uses is based on Robert D. Cameron's REX (regular-expression-based "shallow parsing") method.
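As a rough illustration of what "shallow parsing" means, the Python sketch below splits a document into markup tokens and text tokens with a single regular expression, without building a tree. The pattern is a deliberately simplified stand-in, not Cameron's REX grammar and not the expression xmlformat uses; the names `TOKEN` and `shallow_parse` are hypothetical. Among other limitations, it assumes attribute values contain no ">" and that there is no DOCTYPE internal subset.

```python
import re

# Simplified shallow-parsing pattern (an assumption for illustration,
# far less complete than the real REX expressions).
TOKEN = re.compile(
    r"<!--.*?-->"              # comment
    r"|<!\[CDATA\[.*?\]\]>"    # CDATA section
    r"|<\?.*?\?>"              # processing instruction
    r"|<[^>]*>"                # any other tag or declaration
    r"|[^<]+",                 # run of character data
    re.DOTALL,
)


def shallow_parse(xml):
    """Split `xml` into a flat list of markup and text tokens."""
    return TOKEN.findall(xml)


doc = "<p>Hi <b>there</b><!-- c > d --></p>"
tokens = shallow_parse(doc)
for tok in tokens:
    print(repr(tok))

# For input like this, joining the tokens reproduces the document exactly,
# so no whitespace is lost by the tokenizing step itself.
print("".join(tokens) == doc)
```

Because every character of the input ends up in exactly one token, a formatter built on this kind of tokenizer can decide token by token which whitespace to keep and which to rewrite.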
