XML Schema Clinic -- Enforcing Association Cardinality

XML Schema Clinic — Enforcing Association Cardinality

Will Provost

Originally published at XML.com, June 26, 2002

If you're like me, XML document design brings out your darker side. We like a schema whose heart is black — do we not? — a schema that's just itching to call a candidate document on the carpet for the slightest nonconformity. None of this any-element-will-do namespace validation for us. We enjoy the dirty work: schema, we think, are best built to be aggressive in ferreting out the little mistakes in the information set that could later confuse the more sheltered and constructive-minded XML application.

We'll practice a bit of that merciless science right here: specifically, we'll look at ways to control the cardinality of associations between XML elements. The basic implementation of an association between two types is simple enough — and we'll review it in a moment — but is only sufficient for many-to-one relationships. What if multiple references, or zero references, to an item are unacceptable? What if a referenced item may or may not be present? These variations will require other techniques, and these are essential for the truly Draconian schema author.

This article will use a simple UML notation to illustrate patterns and examples. Knowledge of both XML Schema 1.0 (Part 1 in particular) and UML is assumed, although in developing our notation we'll have a chance to review a little of both.

A Simple UML Notation

The Unified Modeling Language, or UML, provides a basis for a simple notation that will serve our needs in identifying rudimentary design patterns and in illustrating specific examples. (Note that many UML-to-Schema mappings are possible; see the "XMI Production for XML Schema" specification from the OMG for one much more formal option.)

First, let's denote a single complex type as what UML calls a class, with XML attributes and simple-type child elements represented by UML attributes — we're also borrowing the XPath @ shorthand to distinguish an XML attribute from a simple-type child element.

<xs:complexType name="Part">
  <xs:sequence>
    <xs:element name="Name" />
    <xs:element name="Price" />
  </xs:sequence>
  <xs:attribute name="PartID" />
  <xs:attribute name="InStock" />
</xs:complexType>

Expressing Compositions: Content Models

The above example also represents the most basic sort of composition, in which all the components are of simple type. Composition of one complex type in terms of another in UML is most literally denoted as an aggregation by value — note the use of a name, as for an attribute, on the composing type, and the depiction of cardinality:

<xs:complexType name="Dealership">
  <xs:sequence>
    <xs:element name="InventoryItem" type="Part" 
      minOccurs="0" maxOccurs="unbounded" />
    <!-- Other content omitted. -->
  </xs:sequence>
</xs:complexType>

One brief note on composition: it is so basic to the XML information model that it is easy to overuse. It's an excellent habit to challenge the assumption that composition is the best characterization of a given relationship, and with XML Schema, as we're about to see, there's no practical reason to avoid implementing a looser association where it's appropriate to the design.

Expressing Associations: Key Relationships

For types that are related, but not by composition, the UML association is a useful concept. It declares that between two types there is a clear relationship, which may be named on either side, or on both, and usually that there is a clear understanding of cardinality as well. The association also may be navigable in either or both directions — to navigate the association is to find an associated instance based on knowledge of an associating instance.

Although the association is an OO concept, its implementation in XML Schema relies on a classic relational database feature, the foreign-key relationship. A key must be defined to allow unique instances of the associated type (Salesman above) to be identified; and the instances of the associating type (Car) rely on a key reference to point to referenced items. These two constraints are very close in spirit to RDB primary and foreign keys, respectively.

<xs:element name="Dealership">
  <xs:complexType>
    <xs:sequence>
      <!-- Referenced types Car and Salesman not shown. -->
      <xs:element ref="Car" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="Salesman" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
  <xs:key name="SalesmanKey">
    <xs:selector xpath="Salesman" />
    <xs:field xpath="@SellerID" />
  </xs:key>
  <xs:keyref name="CarToSalesman" refer="SalesmanKey">
    <xs:selector xpath="Car" />
    <xs:field xpath="@SoldBy" />
  </xs:keyref>
</xs:element>

The precise validation rules that are realized by these two components are as follows:

Each element of type Salesman must be found to be unique among its direct siblings (not in the whole document), as evaluated by the fields listed in the governing key.
Each element of type Car that has the fields identified in the governing key reference (it might not, depending on composition cardinality!), the values in those fields must be found to be equal to exactly one corresponding set of values in the referenced key.

The basic implementation of an association, shown above, expresses many-to-one cardinality: multiple A to singular B. Let's speak generally now of associations "from A to B". By nature, the xs:keyref simply assures that a B exists wherever an A refers to it; it guarantees that there are no dangling references, so to speak. Validation of a document against a key reference does not make any checks as to the number of elements that might reference the same item. The following sections lay out techniques for deviating from the default association cardinality on each side of the relationship.

Controlling Association Cardinality: Optional or Multiple Associated Type

The cardinality of the associated type is the easier of the two: this is really a matter of controlling the composition of referencing fields. Think of a referencing field (such as @SoldBy in the previous example) as a proxy for the associated object. This proxy can be made a part of the associating element by composition, and thus all the rules thereof apply exactly. So: for singular cardinality on type B ("for each A there must be exactly one B"), the referencing fields will be required attributes or child elements; in fact in this case it may be the value of the associating element itself. For optional cardinality ("for each A there may be one B"), use an optional attribute or child element. For multiple cardinality ("for each A there may be zero, one or many Bs"), use child elements and an "unbounded" maximum occurrence constraint.

For example, consider a model a newspaper might use for printing movie listings:

Here, a Theatre makes reference to various Movies it might be showing. The Movies schema shows how to do this by combining an xs:keyref with occurrence constraints. The key reference is in no way special, and by itself describes many-to-one cardinality, which is also desirable for the Showing type. See also a valid instance document.

But, wait: where's the "multiple" B type? The diagram above maps the XML schema construction exactly, and in doing so lays out the Showing type as what in an RDB would be a link table, and what can be represented in UML as a link attribute. The alternate notation below uses the link attribute to clarify the role of Showing, and also states the cardinality of the fundamental relationship between Theatre and Movie more accurately:

Controlling Association Cardinality: Optional Associating Type

On the associating side, we can implement optional cardinality by insisting on the uniqueness of the referencing fields: in other words, multiple A elements referencing the same B are disallowed. For this purpose a third XML Schema construct can be used: the uniqueness constraint, implemented using the xs:unique component. So we'll assert a key on B, a key reference on A, and an additional uniqueness constraint on A, using the same fields found in the key reference.

Let's say a restaurant keeps its reservation book as an XML document. It lists its tables, organizes each evening into planned seating times, and then captures each reservation as a name and number of people related to table and time:

Clearly, though each reservation references a table and seating time, we don't want double-bookings. The Restaurant schema enforces this constraint by laying an xs:unique assertion over the xs:keyref, so that each combination of table number and time must be unique. A valid instance document is Book.xml.

A more practical version of this design would include dates, rather than repeating all that table and seating information in multiple documents. There are a few approaches to this:

We might make the date another attribute of Reservation, and add it to the uniqueness constraint. See RestaurantWithDates.xsd and BookWithDates.xml.
A more natural content model might call for an Evening element composed of Reservations. XML Schema identity constraints really shine here, because they can leverage the natural partitions created via element composition. The constraint can be placed at the appropriate scope, forbidding unique table-time pairs within a given Evening while allowing them within the broader document. See RestaurantByDates.xsd and BookByDates.xml. Note that the xs:key and xs:keyref components are unchanged, and that the xs:unique has simply moved to the Evening scope and lost its Date field.

Controlling Association Cardinality: Singular Associating Type

To achieve singular cardinality on A is to insist that for every B there is some referencing A. This is the exact inverse of the key reference that's already in place, and the way to implement this cardinality is to create a symmetrical key and key reference. So, for a one-to-one association, there would be two keys and two key references, one each on types A and B.

Notice that this will work even when the cardinality of B is not singular. The idea of "exactly one A optionally referencing a B" may fit into the human brain only with some pushing and twisting, but it is both conceptually sound and technically feasible. This sort of relationship often will make more sense at every level if it is restated as "for every B there is exactly one A", and certainly this is simpler to implement. However, there may be tension between ease of implementation and the correct naming and navigability as designed: "inversion" to the implementer may be "perversion" to the designer. Keep in mind that the effort here is all in the schema; there is no burden on the instance documents, and if anything they may be most readable and intuitive according to the original design.

Summary

The following table summarizes the techniques for various multiplicity requirements for each side of the relationship:

Multiplicity	Technique for A	Technique for B
Multiple (0-n)	`xs:keyref` alone	Multiple child elements
Optional (0-1)	Add `xs:unique`	Optional attribute or child element
Singular (1)	2nd `xs:key` and `xs:keyref` in the other direction	Required attribute or child element, or the value of B itself