Norm Walsh writes, "There comes a point in the life cycle of
any system when adding one more patch is the wrong solution to
every problem. Eventually, it's time to rethink, refactor, and
rewrite. For DocBook, I think that time has come."
In two articles published recently on his personal website,
Walsh gives his take on the current state of DocBook and ponders
its future.
In the first article, he recalls how
the DocBook SGML/XML vocabulary has evolved over the 10+ years
since DocBook 1.0 was released (10 November 1992),
explicitly pointing out a few problems that have arisen as a
result (and seeming to imply a few
others):
- the purpose for which DocBook was originally
designed -- primarily as an "exchange" or "interchange"
vocabulary -- is at odds with what most people use it for today;
most current users author documents directly in "standard"
DocBook, as opposed to "writing in some private tag set, or deep
customization of DocBook, and then converting to the standard to
pass documents to other interchange partners" [note1]
- the scale for which DocBook was originally
designed -- roughly 100 elements -- is at odds with
what it's grown into: currently, around 400 elements
- as DocBook has grown over the years, its development
has been a sort of "growth
by accretion", without as much regularity and consistency
as one might hope for -- leaving it in a state that makes
new features difficult to add incrementally
In the second article, he outlines a few specific changes that
could be made:
- reduce the complexity and inconsistency of content models for
inlines
- replace all of the current multifarious
metadata wrapper (*info) elements with a
single wrapper: info
- get rid of "cruft" (e.g., beginpage,
contractsponsor, invpartnumber, msgset, segmentedlist) or
replace/update it with something more up to date (sgmltag ->
xmltag, link/ulink/olink -> ubiquitous linking)
He ends the second article with a link to a prototype schema (expressed in RELAX NG)
that implements
some of the changes he outlines in the articles.
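To make two of the proposed changes concrete (collapsing the *info wrappers into a single info element, and renaming sgmltag to xmltag), here is a minimal Python sketch of how such renamings could be applied mechanically to a legacy, namespace-free DocBook 4.x document. This is not Walsh's tooling: the function name modernize and the wrapper list are assumptions made for illustration, the list is deliberately incomplete, and a real conversion would also have to deal with content-model differences and the move to "ubiquitous linking", which is not a simple rename.

```python
import xml.etree.ElementTree as ET

# Illustrative, incomplete set of DocBook 4.x metadata wrappers that the
# proposal would fold into a single <info> element (assumption: a real
# converter would need the full list from the DTD).
LEGACY_INFO_WRAPPERS = {
    "bookinfo", "articleinfo", "chapterinfo", "sectioninfo",
    "sect1info", "sect2info", "appendixinfo", "prefaceinfo",
    "refentryinfo", "setinfo", "blockinfo", "objectinfo",
}

# One-to-one renamings taken from the article's "cruft" list.
# link/ulink/olink are deliberately left alone: ubiquitous linking
# changes attributes and structure, not just element names.
RENAMES = {"sgmltag": "xmltag"}


def modernize(root: ET.Element) -> int:
    """Rename legacy elements in place and return how many were renamed.

    Only tag names change, so the tree structure is untouched and it is
    safe to modify elements while iterating over them.
    """
    changed = 0
    for elem in root.iter():
        if elem.tag in LEGACY_INFO_WRAPPERS:
            elem.tag = "info"
            changed += 1
        elif elem.tag in RENAMES:
            elem.tag = RENAMES[elem.tag]
            changed += 1
    return changed


if __name__ == "__main__":
    doc = ET.fromstring(
        "<book><bookinfo><title>T</title></bookinfo>"
        "<chapter><chapterinfo/>"
        "<para>Use <sgmltag>para</sgmltag>.</para></chapter></book>"
    )
    print(modernize(doc))  # 3
    print(ET.tostring(doc, encoding="unicode"))
```

Because every change in this sketch is a pure rename, legacy documents that use only these elements stay mechanically translatable, which is the kind of compatibility the cleaned-up approach aims to preserve.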
In past mailing-list discussions of simplifying DocBook, some
users have suggested fairly extensive refactorings -- along the
lines of modularizing DocBook into a core set of general-use
elements, with separate pluggable modules holding groups of
elements for specific kinds of specialized content (e.g., a "math
elements" module, a "computer hardware/software elements"
module).
Those users won't find Walsh's prototype to be quite what they
had in mind for a "refactored" DocBook. The prototype amounts to
more of a "streamlined" or "cleaned-up" version of the current
element set; it still weighs in at 300+ elements,
with most of the attribute values on those elements unchanged
from their current values.
In the article, Walsh also mentions that the DocBook TC has talked many
times about the need to rework the DocBook
parameter-entity structure in order to make it more
manageable. But looking closely at the current prototype, it
doesn't appear to be intended as a complete solution for that
need; though it contains named patterns for classes of inlines, it
has none yet for classes of container elements
(divisions/components/blocks) -- and no
definition-replacement hooks yet that would facilitate customization
of the schema.
Note: I've added some new pages to the DocBook Wiki for
discussion of current DocBook shortcomings and for opinions and ideas about
possible
future directions for DocBook.
[note1]
The current purpose is especially apparent in the
open-source software community, where DocBook has become a de facto
document-authoring standard -- used by, among others, the KDE, GNOME, FreeBSD,
Debian, and Linux
documentation projects, the Darwin
Documentation Project at Apple, and hundreds of
individual users. A large shift continues to take place toward
authoring directly in standard DocBook XML, and away from
Texinfo, LaTeX, groff (for authoring at least, not
necessarily for publishing), and a number of older and more
limited markup vocabularies (Linuxdoc and others).
I would not have described my ruminations under the rubric "scrapping DocBook", but maybe I'm overly sensitive. I want to keep using DocBook, and I want most of my legacy DocBook documents to still be valid (or mechanically translatable to valid documents).
Refactoring would, I think (or will, I hope), make DocBook easier to understand and maintain.