Michael Fuchs has released
version 0.29 of DocBookDoclet, a Java application
for converting HTML files and Java source
documentation to DocBook XML. This release adds
internationalization support.
The release is available for download in several
formats: RPM, tar/gz, tar/bz2, zip. A changelog is also available.
Along with supporting conversion of the most
commonly used Javadoc tags (@param, @throws, etc.),
DocBookDoclet supports conversion of most
structural/logical HTML markup (though, for some
reason, not span or cite, which might be converted
to, say, phrase and emphasis remap="cite",
respectively). And it supports conversion of
some, but not all, presentational HTML markup; for
example, it currently ignores the big, small, and strike elements,
though it seems like these elements could all be
converted to phrase with a
corresponding value for the remap attribute.
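The phrase-plus-remap idea suggested above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name PhraseRemapSketch and its methods are hypothetical and are not part of DocBookDoclet, and the sketch handles a single inline element with plain-text content rather than a full HTML tree.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not DocBookDoclet code: wraps the text of selected
// presentational HTML elements in a DocBook phrase that records the
// original element name in its remap attribute.
class PhraseRemapSketch {

    // The presentational elements the article reports as currently ignored.
    private static final Set<String> REMAPPED =
            new HashSet<>(Arrays.asList("big", "small", "strike"));

    /** Returns phrase markup for a remapped element, or just the escaped text otherwise. */
    static String toDocBook(String htmlName, String text) {
        String name = htmlName.toLowerCase(Locale.ROOT);
        String escaped = escape(text);
        if (REMAPPED.contains(name)) {
            return "<phrase remap=\"" + name + "\">" + escaped + "</phrase>";
        }
        return escaped;
    }

    /** Escapes the characters that are significant in XML character data. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // prints <phrase remap="big">Warning</phrase>
        System.out.println(toDocBook("big", "Warning"));
    }
}
```

Keeping the original element name in remap means a later stylesheet can still tell a former big from a former strike, even though both end up as phrase.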
Also, though it always seems to generate clean,
well-formed XML -- nicely indented even -- it does
sometimes produce DocBook instances that require
manual cleanup in order to be made valid (even if
the HTML source is valid). It seems for the most
part to do a one-to-one conversion of HTML elements
to DocBook elements, so markup instances that are
HTML-valid even though they lack certain HTML
elements (for example, a dl definition list
that lacks a dd
description element) can get converted to DocBook
instances that are invalid because they lack the
corresponding elements (for example, a "missing dd"
definition list gets converted to variablelist that
lacks a required listitem element).
Given the limitations in the structure that HTML
can be used to model, conversion of certain HTML
markup instances may continue to present a
challenge. Still, it would be interesting to see if
some logic could be added to DocBookDoclet to
detect and automatically correct certain validity
errors, so that they don't need to be corrected
manually.
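As a rough sketch of what such a correction pass might look like, the Java below repairs the "missing dd" case described earlier: it walks a converted document and gives every varlistentry that lacks a listitem an empty one. The class and method names are hypothetical and are not taken from DocBookDoclet; the sketch assumes the converter's output is available as a standard org.w3c.dom.Document and covers only this one validity error.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical post-processing pass, not DocBookDoclet code.
class VariableListRepairSketch {

    /** Appends an empty listitem to each varlistentry that has none; returns the number repaired. */
    static int addMissingListItems(Document doc) {
        NodeList entries = doc.getElementsByTagName("varlistentry");
        int repaired = 0;
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            if (!hasChildElement(entry, "listitem")) {
                // A listitem needs at least one block child, so add an empty para.
                Element item = doc.createElement("listitem");
                item.appendChild(doc.createElement("para"));
                entry.appendChild(item);
                repaired++;
            }
        }
        return repaired;
    }

    /** True if parent has a direct child element with the given name. */
    static boolean hasChildElement(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    /** Parses an XML string into a DOM document, wrapping checked exceptions. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<variablelist>"
                + "<varlistentry><term>alpha</term></varlistentry>"
                + "<varlistentry><term>beta</term><listitem><para>ok</para></listitem></varlistentry>"
                + "</variablelist>");
        // Only the first entry lacks a listitem, so this prints 1.
        System.out.println(addMissingListItems(doc));
    }
}
```

An empty placeholder satisfies the content model without inventing a description that the HTML source never had; a real implementation might prefer to log each repair so that an author can fill in the missing text.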
Overall, though, even in its current (alpha)
incarnation, it's a very useful tool, and certainly
further along in terms of development than the only
other open-source alternative (Jeff Beal's Html2DocBook, which, though currently more limited, is a potentially very appealing XSLT-only solution).