Schemas
Expressing mixed content in RELAX NG versus WXS
15:18, 21 Jan 2003 UTC | Uche Ogbuji

This article, drawn from analysis for the OASIS Open Office XML Format TC, uses secondary sources to compare how expressive RELAX NG (RNG) and W3C XML Schema (WXS) are for mixed content.

Among the ZVON tutorials is a side-by-side comparison of WXS and RNG. One page of it compares mixed content consisting of text and a single element.

In brief:

Valid document:

<AAA>   xxx yyy
  <BBB>ZZZ</BBB> aaa
</AAA> 

WXS:

<xsd:element name="AAA">
  <xsd:complexType mixed="true">
    <xsd:sequence minOccurs="1">
      <xsd:element name="BBB" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element> 

RNG:

<element name="AAA">
  <mixed>
    <element name="BBB">
      <text/>
    </element>
  </mixed>
</element> 

Actually, the RNG pattern above is a valid shorthand for the actual ZVON example, which is:

<element name="AAA">
  <interleave>
    <text/>
    <element name="BBB">
      <text/>
    </element>
  </interleave>
</element> 

So far, so similar. The main difference is that, in Eric van der Vlist's words, "[in WXS] you can't have ANY constraint on the text nodes in a mixed content model."
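To make the difference concrete, here is a hypothetical variant of the AAA pattern (my own sketch, not taken from ZVON). Because it relies on RNG's default ordered grouping rather than <mixed> or <interleave>, text other than whitespace may appear only before BBB. WXS has no equivalent: once a complex type is declared mixed="true", character data is permitted between any of its child elements.

```xml
<!-- Hypothetical variant of the AAA pattern. The children of <element>
     form an implicit ordered group, so text may precede BBB but
     non-whitespace text may not follow it. -->
<element name="AAA" xmlns="http://relaxng.org/ns/structure/1.0">
  <text/>
  <element name="BBB">
    <text/>
  </element>
</element>
```

Against this pattern the sample document above would be rejected, since the text " aaa" follows BBB.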

For more information, see Eric's "Relax NG, Compared", which cross-references his "Using W3C XML Schema". The key section is "Content Types", which says:

To define mixed content models you declare both embedded text and elements:

<element name="book">
 <attribute name="isbn">
  <text/>
 </attribute>
 <interleave>
  <element name="title">
   <text/>
  </element>
  <element name="author">
   <text/>
  </element>
  <zeroOrMore>
   <text/>
  </zeroOrMore>
 </interleave>
</element>

Since text nodes are handled like elements and attributes, their individual location and types can be defined and constrained, something which isn't possible with W3C XML Schema. Suppose we have an element p containing lines terminated by empty br elements, and that we want to disallow blank lines. We can write

<element name="p">
 <zeroOrMore>
  <text/>
  <element name="br"><empty/></element>
 </zeroOrMore>
 <optional>
  <text/>
 </optional>
</element>

Eric pointed out to me that "their individual location and types can be defined and constrained" should be amended to something like "their individual location and values can be defined and constrained".

As another example of the added expressivity of RELAX NG mixed content models, see Modularization of XHTML in RELAX NG, in which James Clark points out that RELAX NG is even more expressive here than DTDs:

The object and applet modules take advantage of RELAX NG's absence of restrictions on mixed content to enforce the requirement that params precede other content. The forms module takes advantage of RELAX NG's absence of restrictions on mixed content to enforce the requirement that any legend precedes other content.

Neither feature is available in WXS.
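The following is a much simplified sketch of the params-first idea. It assumes a hypothetical object element whose inline content is represented by a stand-in span element, and it is not taken from James Clark's actual modules. The params are matched by an ordered group placed before a <mixed> pattern, so the first piece of non-whitespace text or the first span ends the run of params. The closest WXS can come is a mixed="true" type whose sequence lists param first, but character data would still be allowed before the first param.

```xml
<!-- Simplified, hypothetical sketch; not the XHTML object module itself.
     "span" stands in for arbitrary inline content. -->
<element name="object" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Ordered group: all param children must come first... -->
  <zeroOrMore>
    <element name="param">
      <attribute name="name"/>
      <optional>
        <attribute name="value"/>
      </optional>
      <empty/>
    </element>
  </zeroOrMore>
  <!-- ...followed by text freely mixed with inline elements. -->
  <mixed>
    <zeroOrMore>
      <element name="span">
        <text/>
      </element>
    </zeroOrMore>
  </mixed>
</element>
```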

But RNG does not give unfettered control over every aspect of mixed content. It forbids patterns that group a datatype match (data, value or list) with a child element or with text in the same content. See section 7.2, "String sequences", of the RELAX NG specification for details. Eric van der Vlist has posted objections to this restriction; see the discussion throughout this thread. Note that this restriction is lighter than WXS's, which rules out such patterns anyway, because WXS doesn't allow any constraints involving text nodes in mixed content.
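For illustration, here is a hypothetical pattern (the element names are my own) of the kind this restriction rules out: it tries to require that an element's text be a decimal followed by a child element. A conforming RELAX NG processor should reject it as an incorrect schema, because data has a "simple" content type and element a "complex" one, and section 7.2 does not allow the two to be grouped.

```xml
<!-- Incorrect schema: <data> (simple content type) is grouped with
     <element> (complex content type), which section 7.2 forbids. -->
<element name="price" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="decimal"/>
  <element name="currency">
    <text/>
  </element>
</element>
```

Replacing <data type="decimal"/> with <text/> yields a legal pattern, but the text is then no longer checked against the decimal datatype.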

Eric discusses this RNG restriction further in Chapter 7 of his work-in-progress book, RELAX NG, which is available online; see the heading "Data versus text". For background on RNG content models in general, including mixed content, see Chapter 6.

Some other useful WXS/RNG comparison links:

xmlhack: developer news from the XML community
