XML Schema Clinic -- UML for XML Schema Design

Will Provost

Originally published at XML.com, August 7, 2002

In spite of the many clear advantages it offers over the fast-receding DTD grammar, XML Schema cannot be praised for its brevity. Indeed, in discussions of XML vocabulary design, the DTD notation is often thrown up on a whiteboard solely for its ability to quickly and completely communicate an idea; the corresponding XML Schema notation would be laughably awkward, even where XML Schema will be the implementation language. This makes a graphical design notation such as UML all the more attractive for XML Schema design.

UML is of course meant for greater things than simple description of data structures. Still, the UML metamodel can support Schema design quite well, for wire-serializable types, persistence schema, and many other XML applications. To say the least, UML and XML are likely to come in frequent professional contact; it would be nice if they could get along. The highest possible degree of integration of code-design and XML-design processes should be sought.

Pushing the Envelope

Application of UML to just about any type model will require an extension profile. There are many possible profiles and mappings between UML and XML, not all of which address the same goals. The XML Metadata Interchange and XMI Production for XML Schema specifications, from the OMG, offer a standard mapping from UML/MOF to XML Schema, but specifically for the purpose of exchanging models between UML tools. The model in question may not even be intended for XML production; XML Schema simply serves as a reliable XML expression of metadata for consumption in some other tool or locale.

My purpose here is to discuss issues in mapping between these two metamodels, and to advance a UML profile that will support complete expression of an XML Schema information set. The major distinction is that XMI puts UML first, so to speak, in some cases settling for a mapping that fails to capture some useful XML Schema construct, so long as the UML model is well expressed. My aim is to put XML Schema first, and to develop a UML profile for use specifically in XML Schema design:

The profile should capture every detail of an XML vocabulary that an XML Schema could express.
It should support two-way generation of XML Schema documents.

I suggest a few stereotypes and tags, many of which dovetail with the XMI-Schema mapping. I discuss specific notation issues as the story unfolds, and highlight the necessary stereotypes and tags.

David Carlson has also done some excellent work in this area, and has proposed an extension profile for this purpose. I disagree with him on at least one major point of modeling and one minor point of notation, but much of what is developed here lines up well with Carlson's profile.

Modeling Simple Types

Derived simple types

Problem: simple types can be derived as part of a schema design.

Solution: «simpleType» stereotype of UML classes; constraints define constraining facets.

Traditionally UML models are built with a set of primitive types assumed, per target type model. In XML, things are trickier, since even simple types can be invented in the scope of a single design. To support this, a «simpleType» stereotype of UML classes is used. UML specialization comes in handy here: a derived «simpleType» can identify its base type via a specialization relationship.

Under «simpleType», the desired constraining facets might be modeled by overloading UML attributes and initial values, making for a convenient and readable notation. (Carlson goes this route, adding a stereotype for attributes as facets.) Even under a stereotype, though, attributes imply some actual state elements, and thus would be misleading. UML constraints offer a more correct notation for constraining facets.

Enumerated types are an exception to either notational choice for facets, since UML offers a standard stereotype for enumerations.

<xs:simpleType name="Review">
  <xs:restriction base="xs:decimal">
    <xs:minInclusive value="0" />
    <xs:maxInclusive value="5" />
    <xs:pattern value="[0-5](\.5)?" />
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="Rating">
  <xs:restriction base="xs:string" >
    <xs:enumeration value="G" />
    <xs:enumeration value="PG" />
    <xs:enumeration value="PG-13" />
    <xs:enumeration value="R" />
    <xs:enumeration value="NC-17" />
    <xs:enumeration value="X" />
  </xs:restriction>
</xs:simpleType>
<xs:complexType name="Movie">
  <xs:sequence>
    <xs:element name="Title" type="xs:string" />
    <xs:element name="review" type="Review" />
    <xs:element name="rating" type="Rating" />
  </xs:sequence>
</xs:complexType>
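
To see how these facets constrain a document, here is a minimal instance sketch. It assumes a hypothetical global element named movie, declared with type Movie; that declaration and the title text are illustrative and not part of the listing above.

```xml
<!-- Assumes a hypothetical global element "movie" of type Movie. -->
<movie>
  <Title>Example Film</Title>
  <!-- 3.5 lies between 0 and 5 and fits the whole-or-half-point pattern. -->
  <review>3.5</review>
  <!-- Must be one of the enumerated values of the Rating type. -->
  <rating>PG-13</rating>
</movie>
```

Under these assumptions, a review of 6 would be rejected by the maxInclusive facet, and a review of 3.25 would be rejected by the pattern facet.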

List and union types

Problem: lists and unions of other types can be defined.

Solution: xs:list as a parameterized type; xs:union implied by {xor} constraint on multiple specializations.

List and union types pose a slightly different problem. Unions allow values from any one of several value spaces defined by other types. To model this, UML specialization with an {xor} constraint seems the cleanest expression, although I have found no precedent for this combination. The list is a more obvious mapping: it is a parameterized type, instantiable on any other simple type in the model. (XML Schema requires the item type of a list to be a simple type that is not itself a list, so complex types are excluded.)

<xs:simpleType name="IntegerList">
  <xs:list itemType="xs:integer" />
</xs:simpleType>

<!-- Enumeration Settings not shown. -->

<xs:simpleType name="OvenSetting">
  <xs:union memberTypes="xs:integer Settings" />
</xs:simpleType>
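
The following sketch fills in the omitted pieces so that the list and union types can be tried out. The Settings values and the two element declarations are assumptions made purely for illustration; only IntegerList and OvenSetting come from the listing above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="IntegerList">
    <xs:list itemType="xs:integer" />
  </xs:simpleType>

  <!-- Hypothetical stand-in for the Settings enumeration; values are assumed. -->
  <xs:simpleType name="Settings">
    <xs:restriction base="xs:string">
      <xs:enumeration value="warm" />
      <xs:enumeration value="broil" />
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="OvenSetting">
    <xs:union memberTypes="xs:integer Settings" />
  </xs:simpleType>

  <!-- Hypothetical elements, declared only to show instance values. -->
  <!-- temperatures accepts whitespace-separated integers: 325 350 375 -->
  <xs:element name="temperatures" type="IntegerList" />
  <!-- setting accepts either an integer (350) or a Settings value (broil) -->
  <xs:element name="setting" type="OvenSetting" />

</xs:schema>
```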

Modeling Complex Types

The UML class is well suited to model the XML Schema complex type. UML attributes pose a slight problem: in most UML-supported type models, there is only one way of implementing a state element, but in XML there are two: attributes and child elements. The latter are the only option when the state element is itself of complex type, but for single values either will do. In some schools, either attribute-only or element-only styles are favored, but for our purposes it's important to support a choice between the two.

UML attributes

Problem: necessary to distinguish XML attributes from XML elements.

Solution: «attribute» stereotype of UML attributes, conventionally presented as the single character '@'.

Formally, this specialization of a metamodel element is best interpreted as a UML stereotype: the «attribute» stereotype of UML attributes. XMI allows for exactly this stereotype in "tailoring" schema production. This might seem unwieldy, but UML allows for graphical or textual shorthands for common stereotypes.

With the increasing prevalence of XPath in XML documents, application code, and design discussion, the XPath @ prefix for attribute names has much to recommend it. It is wonderfully brief, already in common parlance, and fits neatly as a shorthand for the «attribute» stereotype, in graphical or textual representations.

The type of a UML attribute can be specified in the usual way, whether simple or complex. Built-in XML Schema types may be represented by their local names, or using one of the common namespace prefixes xs: or xsd:. Also, compositional relationships can be drawn to identify (UML) attribute type, as we'll see in a moment.

<xs:complexType name="Part">
  <xs:sequence>
    <xs:element name="name" type="xs:string" />
    <xs:element name="price" type="xs:decimal" />
  </xs:sequence>
  <xs:attribute name="partID" type="xs:token" />
  <xs:attribute name="inStock" type="xs:positiveInteger" />
</xs:complexType>
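
An instance makes the split between the two kinds of UML attribute visible. This is a sketch only: it assumes a hypothetical global element named part of type Part, and all values are invented.

```xml
<!-- Assumes a hypothetical global element "part" of type Part. -->
<!-- @partID and @inStock in the UML class become XML attributes. -->
<part partID="BRK-2041" inStock="12">
  <!-- name and price, unadorned in the UML class, become child elements. -->
  <name>Brake pad</name>
  <price>39.95</price>
</part>
```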

Modeling Relationships

Relationships between UML classes line up nicely opposite the XML Schema options. In UML, composition (aggregation by value) is not the most basic relationship type, but in XML it is, and so we start there. Composition maps to composition, and cardinality maps to occurrence constraints. The role name can be mapped to the desired attribute or element name, and can take advantage of the «attribute» stereotype and the @ prefix.

<xs:complexType name="Dealership">
  <xs:sequence>
    <xs:element name="inventoryItem" type="Part" 
      minOccurs="0" maxOccurs="unbounded" />
    <!-- Other content omitted. -->
  </xs:sequence>
</xs:complexType>
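
The mapping from role names and cardinalities to occurrence constraints can be sketched a little further. In the fragment below, only inventoryItem and Part come from the listings above; the featuredItem and @franchiseID roles are hypothetical additions meant to show other cardinalities.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="Part">
    <xs:sequence>
      <xs:element name="name" type="xs:string" />
      <xs:element name="price" type="xs:decimal" />
    </xs:sequence>
    <xs:attribute name="partID" type="xs:token" />
    <xs:attribute name="inStock" type="xs:positiveInteger" />
  </xs:complexType>

  <xs:complexType name="Dealership">
    <xs:sequence>
      <!-- Role "inventoryItem", cardinality 0..* -->
      <xs:element name="inventoryItem" type="Part"
        minOccurs="0" maxOccurs="unbounded" />
      <!-- Hypothetical role "featuredItem", cardinality 0..1 -->
      <xs:element name="featuredItem" type="Part" minOccurs="0" />
    </xs:sequence>
    <!-- Hypothetical role "@franchiseID", cardinality 1: a required attribute -->
    <xs:attribute name="franchiseID" type="xs:token" use="required" />
  </xs:complexType>

</xs:schema>
```

Because XML attributes can appear at most once, a role carrying the @ prefix can only express cardinality 1 (use="required") or 0..1 (the default, optional).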

UML associations are trickier for XML, which is fundamentally hierarchical. XMI maps associations to XLinks; this is sound, but it exemplifies the problem with using XMI for XML Schema design, since XLinks lie outside the XML Schema vocabulary. Within XML Schema, associations map most naturally to key references. (This is my major disagreement with Carlson, who maps all associations as compositions, blurring the distinction between association by value and association by reference. Associations that are not explicitly modeled as compositions must be preserved as references in the schema, so that a single instance with multiple references in an object graph is not spuriously multiplied.)

UML associations

Problem: key and referencing fields must be identified.

Solution: «key» stereotype of UML attributes; ordered list of referencing fields encoded in the association role name.

Another difficulty crops up here. Core UML can describe the cardinality of the association, can give it a name from each side and express navigability. What it cannot do is identify the selector and field components to be used in the XML Schema.

This information is actually more relational than object-oriented in nature, and it exposes what is probably UML's weakest suit: identifying key fields. There is no real home for this information in the UML metamodel (in UML, identity is strictly implicit), and yet these paths will need to be specified to complete a generated XML Schema. This could be addressed as a tag or a stereotype, and although it's a bit of a forced fit, I propose a «key» stereotype of UML attributes, to be presented via a simple shorthand. Note that this may overlap with the «attribute» stereotype, resulting in notation such as «key»@unitID. UML modeling tools can automate a mapping between this stereotype and the definition of an xs:key governing the enclosing type.

Also, the association itself will need to identify the ordered list of referencing fields to generate an xs:keyref. This can be derived from the associating role name; multiple field names can be packed into this name as a list, or can be attached as tagged values.

<xs:element name="Dealership">
  <xs:complexType>
    <xs:sequence>
      <!-- Referenced elements Car and Salesman not shown. -->
      <xs:element ref="Car" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="Salesman" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
  <xs:key name="SalesmanKey">
    <xs:selector xpath="Salesman" />
    <xs:field xpath="@sellerID" />
  </xs:key>
  <xs:keyref name="CarToSalesman" refer="SalesmanKey">
    <xs:selector xpath="Car" />
    <xs:field xpath="@soldBy" />
  </xs:keyref>
</xs:element>
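
An instance shows what the key and key reference buy us. This sketch assumes that the omitted Car and Salesman declarations carry the soldBy and sellerID attributes named in the xs:field paths; the identifier values are invented.

```xml
<!-- Assumes Car has a soldBy attribute and Salesman a sellerID attribute. -->
<Dealership>
  <!-- Two cars refer to the same salesman; the Salesman is not duplicated. -->
  <Car soldBy="S1" />
  <Car soldBy="S2" />
  <Car soldBy="S1" />
  <Salesman sellerID="S1" />
  <Salesman sellerID="S2" />
</Dealership>
```

Under these assumptions, a Car with soldBy="S9" would violate the CarToSalesman key reference, and a second Salesman with sellerID="S1" would violate SalesmanKey.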

Mappings of UML association cardinality are in fact the primary subject of an earlier XML Schema Clinic article; see "Enforcing Association Cardinality" for a full discussion of implementation strategies.

UML specialization

Problem: two mechanisms for XML complex-type derivation.

Solution: «restriction» stereotype of UML specialization.

UML specialization maps neatly to XML type extension for complex types: both imply that the derived type's state elements are appended to the base type's state model.

The only trick here is that XML offers another means of complex-type derivation, which is restriction. This is another appropriate use of the UML stereotype, and so we define «restriction» as a stereotype of specialization. In this case the derived UML class will state the changes to the base-class content model; the Schema generator will be expected to merge these changes into the base content model for restatement in the restricted complex type.
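
The two derivation styles can be compared side by side. This is a sketch that reuses the Part type from earlier; the derived type names WarrantiedPart and IdentifiedPart, and the warrantyMonths element, are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="Part">
    <xs:sequence>
      <xs:element name="name" type="xs:string" />
      <xs:element name="price" type="xs:decimal" />
    </xs:sequence>
    <xs:attribute name="partID" type="xs:token" />
    <xs:attribute name="inStock" type="xs:positiveInteger" />
  </xs:complexType>

  <!-- Plain UML specialization: new content is appended to the base model. -->
  <xs:complexType name="WarrantiedPart">
    <xs:complexContent>
      <xs:extension base="Part">
        <xs:sequence>
          <xs:element name="warrantyMonths" type="xs:positiveInteger" />
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- «restriction» specialization: the UML class states only the change
       (partID becomes required); the element content is restated in full. -->
  <xs:complexType name="IdentifiedPart">
    <xs:complexContent>
      <xs:restriction base="Part">
        <xs:sequence>
          <xs:element name="name" type="xs:string" />
          <xs:element name="price" type="xs:decimal" />
        </xs:sequence>
        <xs:attribute name="partID" type="xs:token" use="required" />
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

The restated sequence in IdentifiedPart is exactly the merge work that falls to the Schema generator; attributes that are not restated, such as inStock, are inherited unchanged.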

Miscellaneous Schema Information

xs:schema attributes

Problem: various attributes on xs:schema with no natural home in UML.

Solution: tagged values for targetNamespace and element/attributeFormDefault.

One key question we've yet to address is where the schema element fits into the UML model. There are options here: either the entire model can be directed to a schema, or in more complex models packages may be used to model XML namespaces. In either case there must be a property that identifies the target namespace URI.

The elementFormDefault and attributeFormDefault attributes of the schema element truly live outside the UML world view. These must be properties at the same scope as the target namespace, whether package or model.
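
As a sketch of what a generator might emit from such tagged values, consider a package or model tagged with a target namespace and the two form defaults. The namespace URI and the chosen values are assumptions for illustration only.

```xml
<!-- Hypothetical tagged values on the UML package or model:
       targetNamespace      = http://example.com/dealership
       elementFormDefault   = qualified
       attributeFormDefault = unqualified -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://example.com/dealership"
    targetNamespace="http://example.com/dealership"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">

  <!-- Types modeled in the package are generated here. -->

</xs:schema>
```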

Also, we've thus far assumed that all content models are sequences. To model a choice, use an {xor} UML constraint; if you need to model xs:all, either an {unordered} constraint or a separate stereotype of the UML class would do, but the former is a better conceptual fit.
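
The two alternative content models might come out as follows. This is a sketch; the type and element names are hypothetical and chosen only to show the compositors.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- {xor} constraint across the two UML attributes: exactly one appears. -->
  <xs:complexType name="Payment">
    <xs:choice>
      <xs:element name="cash" type="xs:decimal" />
      <xs:element name="financing" type="xs:string" />
    </xs:choice>
  </xs:complexType>

  <!-- {unordered} constraint on the class: both appear, in either order. -->
  <xs:complexType name="Address">
    <xs:all>
      <xs:element name="street" type="xs:string" />
      <xs:element name="city" type="xs:string" />
    </xs:all>
  </xs:complexType>

</xs:schema>
```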

One last problem is the distinction between local and global types in XML Schema. This is actually a more common problem than most we've considered: C++ and Java, among other type models, have namespace-partitioning constructs such as nested and inner classes. The UML specification offers a couple of possible notations for nesting one type within another (see section 3.48.2), and most tools have a means of establishing this relationship as well.
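
In schema terms the distinction looks like this. The sketch below uses hypothetical type and element names: a UML class nested within another would generate an anonymous local type, while a top-level class would generate a named global one.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global (named) type: reusable by any element or derived type. -->
  <xs:complexType name="Engine">
    <xs:sequence>
      <xs:element name="displacement" type="xs:decimal" />
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="Car">
    <xs:sequence>
      <xs:element name="engine" type="Engine" />
      <!-- Local (anonymous) type: nested in this element and usable
           nowhere else in the schema. -->
      <xs:element name="warranty">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="months" type="xs:positiveInteger" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```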

Directions: Behavioral XML?

For those of us who've found the absence of behavior modeling frustrating, it's a relief to realize that there is indeed more to XML than data structures. With a robust profile in hand by which XML Schema can be expressed as UML, we can turn to more adventurous uses, especially for XML messaging. As with all things XML, data is never far from metadata; the WSDL specification, especially, shows off XML's ability to encode method invocations, and plays a schema-like role in prescribing XML message content.

From the humble beginnings of data-centric XML, WSDL descriptors rise once again to the level of object-oriented encapsulations. Now, suddenly, the full power of UML can be brought to bear. A WSDL «portType» stereotype can express the semantics for an entire Web service, and can be the source for a complex generation of not only WSDL and XML Schema documents, but also service or client code to support SOAP or HTTP messaging. No specific mapping rules are proposed here, but hopefully the following hypothetical will whet the reader's appetite:

For taking the time to discuss various concepts in this article, I'd like to thank Richard K. Fisher and Jean Pierre LeJacq.