Scrapping DocBook
06:16, 3 Jun 2003 UTC | Michael Smith

Norm Walsh writes, "There comes a point in the life cycle of any system when adding one more patch is the wrong solution to every problem. Eventually, it's time to rethink, refactor, and rewrite. For DocBook, I think that time has come."

In two articles published recently on his personal website, Walsh gives his take on the current state of DocBook and ponders its future.

In the first article, he gives his recollections about how the DocBook SGML/XML vocabulary has developed over the 10+ years since DocBook 1.0 was released (10 November 1992), explicitly pointing out a few problems that have developed as a result (and seeming to imply a few others):

  • the purpose for which DocBook was originally designed -- primarily as an "exchange" or "interchange" vocabulary -- is at odds with what most people use it for today; most current users author documents directly in "standard" DocBook, as opposed to "writing in some private tag set, or deep customization of DocBook, and then converting to the standard to pass documents to other interchange partners" [note1]
  • the scale for which DocBook was originally designed -- roughly 100 elements -- is at odds with what it's grown into: currently, around 400 elements
  • as the scale of DocBook has grown over the years, the development process has simply been a sort of "growth by accretion", without as much regularity and consistency as could be hoped for -- resulting in a state that makes it difficult to add new features incrementally

In the second article, he outlines a few specific changes that could be made:

  • reduce the complexity and inconsistency of content models for inlines
  • replace all of the current multifarious metadata wrapper (*info) elements with a single wrapper: info
  • get rid of "cruft" (e.g., beginpage, contractsponsor, invpartnumber, msgset, segmentedlist) or replace/update it with something more up to date (sgmltag -> xmltag, link/ulink/olink -> ubiquitous linking)
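
To make the second proposal concrete, here is a minimal sketch in Python of what collapsing the *info wrappers into a single info element would mean for an existing document. It is an illustration only, not Walsh's prototype or any official migration tool: the INFO_WRAPPERS set is a partial, hand-picked list of DocBook 4.x wrapper names, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: a partial list of DocBook 4.x metadata wrappers, chosen for
# illustration. It is not exhaustive.
INFO_WRAPPERS = {
    "bookinfo",
    "articleinfo",
    "chapterinfo",
    "sectioninfo",
    "appendixinfo",
}


def unify_info_wrappers(root):
    """Rename every listed *info wrapper to plain 'info', in place.

    Returns the number of elements renamed.
    """
    renamed = 0
    for element in root.iter():
        if element.tag in INFO_WRAPPERS:
            element.tag = "info"
            renamed += 1
    return renamed


# Hypothetical sample document, used only to exercise the function.
SAMPLE = """<book>
  <bookinfo><title>Example</title></bookinfo>
  <chapter>
    <chapterinfo><author><surname>Doe</surname></author></chapterinfo>
    <title>First</title>
    <para>Text.</para>
  </chapter>
</book>"""

if __name__ == "__main__":
    tree = ET.fromstring(SAMPLE)
    count = unify_info_wrappers(tree)
    print(f"renamed {count} wrapper element(s)")
    print(ET.tostring(tree, encoding="unicode"))
```

Only the wrapper's name changes; its children (title, author, and so on) stay where they are, which is why a single info element can serve every context.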

He then ends the article with a link to a prototype schema (expressed in Relax NG) that instantiates some of the changes he has outlined in the articles.

In past mailing-list discussions of simplifying DocBook, some users have suggested fairly extensive refactorings -- along the lines of modularizing DocBook into a core set of general-use elements, with separate pluggable modules holding groups of elements for specific kinds of specialized content (e.g., a "math elements" module, a "computer hardware/software elements" module).

Those users won't find Walsh's prototype to be quite what they had in mind for a "refactored" DocBook. The prototype amounts to more of a "streamlined" or "cleaned-up" version of the current element set; it still weighs in at 300+ elements, with most of the attribute values on those elements being the same as they are currently.

In the article, Walsh also mentions that the TC has talked many times about the need to rework the DocBook parameter-entity structure in order to make it more manageable. But on close inspection, the current prototype doesn't appear to be intended as a complete solution for that need; though it contains named patterns for classes of inlines, it has none yet for classes of container elements (divisions/components/blocks) -- nor does it yet have any definition-replacement hooks that would facilitate customization of the schema.

Note: I've added some new pages to the DocBook Wiki for discussion of current DocBook shortcomings, and for opinions and ideas about possible future directions for DocBook.

[note1] The current purpose is especially apparent in the open-source software community, where DocBook has become a de facto document-authoring standard -- used by, among others, the KDE, GNOME, FreeBSD, Debian, and Linux documentation projects, the Darwin Documentation Project at Apple, and hundreds of individual users; a big move continues to take place toward authoring in XML directly, in standard DocBook, and away from using Texinfo, LaTeX, groff (for authoring at least, not necessarily for publishing), and a number of older and more limited markup vocabularies (Linuxdoc and others).


Newest comments

Re: Scrapping DocBook (Norman Walsh - 17:50, 5 Jun 2003)
I would not have described my ruminations under the rubric "scrapping DocBook", but maybe I'm overly ...
xmlhack: developer news from the XML community
