WSDL First!

Will Provost

Originally published at XML.com, July 22, 2003

They'll tell you a story, if you let them.

"Web services are a cinch," they'll say. "Just write the same code you always do, and then press this button, and — presto! — it's now a Web service, deployed to the application server, with SOAP serializers, and a WSDL descriptor all written out."

Web services vendors will tell you a lot of things, but probably most glorious among them will be the claim that you can develop Web services effectively without hand-editing SOAP or WSDL. Does this sound too good to be true? Well, maybe the case can be made, for RPC-style services, that SOAP has been relegated to the role of RPC encoding — that in its details it is no more relevant to the application developer than IIOP or the DCOM transport.

When it comes to WSDL, though, don't buy it! If you're serious about developing RPC-style Web services, you should know WSDL as well as you know W3C XML Schema (WXS), and be creating and editing descriptors frequently. More importantly, a WSDL descriptor should be the source document for your Web service build process, for a number of reasons, each of which I'll discuss in this article.

Development Paths

The willingness in some quarters to minimize the visibility of service description betrays a more basic and troubling bias, which has to do with code-generation paths and development process. The hypothetical blue-sky artist I've quoted at the top makes the assumption that the service semantics are derived entirely from application source code.

Now we've found the true culprit. Bad assumption, Mr. Blue Sky: there are two viable development paths for RPC-style Web-service development — from implementation language to WSDL and vice-versa — and, in fact, to start from the implementation language is the weaker strategy.

Each of these options actually implies a broader scenario. Everyone agrees that WSDL is the proper starting point when building clients, so we choose between two visions of Web-service development:

— Implementation language first, or "Impl-to-WSDL", which results in a path from service semantics to a client that understands them that passes through a WSDL descriptor:

Thus WSDL descriptors are generated, intermediate artifacts, expressing service semantics in a neutral format so that interfaces defined in Java, for instance, can be mapped and understood by a client built using .NET.

— WSDL first, in which both service and client rely on code generated from a central descriptor:

Here, all development flows from the WSDL document, which still functions as a neutral format for service semantics and type information.
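To make the WSDL-first flow concrete, here is a minimal Java sketch of what "code generated from a central descriptor" can look like. Everything in it is hypothetical: QuotePort stands in for an interface that a WSDL-to-Java tool might generate from a portType (JAX-RPC service endpoint interfaces extend java.rmi.Remote), and the in-process implementation stands in for a real SOAP stub. It is an illustration under those assumptions, not the output of any particular tool.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

final class WsdlFirstSketch {
    // Hypothetical interface, as a WSDL-to-Java tool might generate it from a portType.
    // Both the service and the client compile against this one artifact.
    interface QuotePort extends Remote {
        float getLastTradePrice(String tickerSymbol) throws RemoteException;
    }

    // Service side: a hand-written implementation of the generated interface.
    static final class QuoteServiceImpl implements QuotePort {
        @Override
        public float getLastTradePrice(String tickerSymbol) {
            // Canned data for illustration only.
            return "XYZ".equals(tickerSymbol) ? 42.5f : 0.0f;
        }
    }

    // Client side: depends only on the generated interface, never on the implementation.
    static String describe(QuotePort port, String symbol) {
        try {
            return symbol + " last traded at " + port.getLastTradePrice(symbol);
        } catch (RemoteException e) {
            return "call failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // In a real deployment the client would hold a generated SOAP stub;
        // here the implementation is passed in directly as a stand-in.
        System.out.println(describe(new QuoteServiceImpl(), "XYZ"));
    }
}
```

The point of the sketch is that neither side authors the interface: both are a single generation step away from the descriptor.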

Ah, Temptation ...

Certainly, the Impl-to-WSDL path calls to us. It's natural to want to start from code written in one's native language — especially in two common cases:

Pilot projects, in which there's already a lot to learn
"Wrapping" efforts that develop a SOAP interface to legacy code

Commercial tools naturally push in this direction, too. For one thing, the Impl-to-WSDL path supports the sort of RAD approach that is central to the value proposition of IDEs and application servers alike. Salesmen want to be able to demonstrate that little or no coding and a lot of code generation add up to a complete Web service. Also, WSDL is yet another enterprise standard, and companies don't get rich selling standards — even ones to which they adhere.

It's also a matter of what you're used to. Existing distributed object computing platforms — and, make no mistake, that's what RPC-style Web services are already, and will increasingly be — vary in this regard. EJB programmers are accustomed to building Java interfaces as their source documents for service semantics, and JavaBeans as their serializable types. CORBA developers, on the other hand, are more familiar with the idea that some IDL is the natural starting point: language-specific artifacts are generated from there. DCOM/ATL developers have seen this approach, too, although the Visual Studio environment blurs this line by dint of its RAD tooling: one most often defines an interface by filling in a dialog box.

Consider these platforms as precedents, however — tossing out EJB as language-specific. The other two derive interoperability between programming languages by relying on generation from an IDL. This is the proven pattern; why are we walking away from it for Web services? WSDL is the IDL for Web services, and it should be used as such.

Missing the Point of Web Services

The primary problem with the Impl-to-WSDL approach is that the service implementer assumes that he or she is the ultimate authority on service semantics. For services deployed on an intranet and meant for use within a company or division, this might be a safe assumption, but more generally it is a naïve one.

The common creation and deployment of widely available, business-to-business SOAP services is not so far off. In this context, an implementation-language-first philosophy misses the basic point of Web services: component interoperability based on progressively more vertical standards. It's generally understood that we get incrementally better interoperability (read: usability) as we add XML, WXS, and WSDL to the technology mix. Better still — if we control the WXS and WSDL content — we are positioned to specify service semantics appropriate to smaller scopes: per industry, per business activity, per community, per partnership.

There's the rub. Consensus over business semantics will demand expression in the neutral language, which is WSDL. It's neither interesting nor useful to build services from implementation-language interfaces and serializable types in the face of this (welcome) trend. The Web services architecture is not a wrapping technology — or it won't be, for much longer.

Lost in the Translation

At a tactical level, another argument for using WSDL first concerns type mapping and data binding. RPC-style services require mappings between WSDL (including WXS) and the implementation language, so that SOAP elements can be translated to method arguments and return values. Let us first understand that the fidelity of these mappings is not perfect — loss of type information is common in translating between programming languages and WSDL, in either direction.

The concerns, then, are when this mapping occurs, how often, and in what directions. The answers vary by path, and the difference is telling (see the path diagrams from the beginning of the article):

Using WSDL first, there will be many mappings, all flowing from the same original document.
When the original semantics are written into the service implementation, there will again be several mappings, but now they occur serially, and in different directions — that is, to and then from WSDL.

Therefore, the Impl-to-WSDL path places a much greater onus on the various language mappings. Using WSDL first, no component is ever more than a single generation away from the source document, and whatever imprecision exists in a given mapping — while potentially irritating to the programmer — will have limited impact.

Going the other way, though, one encounters two problems: multiple translations in series offer more chances to lose precision, and these several mappings will each introduce different losses. That is, after the initial Impl-to-WSDL mapping does some damage, a successive WSDL-to-Impl pass will erode a whole other set of model details. Thus the passage from implementation to WSDL to implementation threatens to degrade the semantic definition significantly — a bit like a game of She Sells Seashells — unless the mappings are really tight.

Well — are they? Not by a long shot. Even same-language mappings are not reversible, so after a round-trip through WSDL the type model will have metastasized. To get viable round-tripping, one must be resigned to using a tiny subset of the types and semantics commonly used in distributed programming today.
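A toy Java sketch shows how a same-language round trip can fail to be reversible. The two tables below are illustrative assumptions, loosely modeled on the kind of standard type mapping a Java toolkit applies; they are not copied from any product. Because two Java types collapse onto one WXS type, the return trip has to pick one of them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RoundTripDemo {
    // Illustrative (assumed) mapping tables, not those of any specific toolkit.
    static final Map<String, String> JAVA_TO_WXS = new LinkedHashMap<>();
    static final Map<String, String> WXS_TO_JAVA = new LinkedHashMap<>();

    static {
        JAVA_TO_WXS.put("int", "xsd:int");
        JAVA_TO_WXS.put("java.lang.String", "xsd:string");
        JAVA_TO_WXS.put("java.util.Date", "xsd:dateTime");
        JAVA_TO_WXS.put("java.util.Calendar", "xsd:dateTime");

        // Each WXS type can map back to only one Java type.
        WXS_TO_JAVA.put("xsd:int", "int");
        WXS_TO_JAVA.put("xsd:string", "java.lang.String");
        WXS_TO_JAVA.put("xsd:dateTime", "java.util.Calendar");
    }

    // Maps a Java type name to WXS and back again.
    static String roundTrip(String javaType) {
        String wxsType = JAVA_TO_WXS.get(javaType);
        if (wxsType == null) {
            throw new IllegalArgumentException("no mapping for " + javaType);
        }
        return WXS_TO_JAVA.get(wxsType);
    }

    public static void main(String[] args) {
        for (String javaType : JAVA_TO_WXS.keySet()) {
            String back = roundTrip(javaType);
            String verdict = back.equals(javaType) ? "preserved" : "changed";
            System.out.println(javaType + " -> " + back + " (" + verdict + ")");
        }
    }
}
```

In this sketch int and String survive the trip, while java.util.Date comes back as java.util.Calendar: the descriptor in the middle simply has no way to record which one the implementer started with.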

In fairness, this is only a snapshot of the current state of the art, and much of this will be cleaned up over time. The major vendors continue to put a lot of effort into supporting clean round-trips in their mappings.

Still, none claims to support round-trips perfectly, and with a moment's consideration it can be seen that this capability is a bit of a pipe dream, even in theory. The WSDL/WXS type model simply doesn't align that cleanly to any programming language, and more to the point it isn't meant to express the same things. Weakly-typed collections, for example, are wonderful programming tools, but nearly hopeless in data binding. Support for enumerated types is suspect, too, although for different reasons in different languages.

Finally, a side benefit of mapping from WSDL/WXS is that names as well as types can be preserved, reminding us that "descriptors" should indeed describe, and not just define. Java developers who've worked Impl-to-WSDL via JAX-RPC are familiar with the results of mapping methods: nice descriptive parameter names are translated to string0, string1, and booleanVal0. It's illustrative to note that the JAX-RPC specification requires that these names be carried precisely into the WSDL descriptor, yet the current implementations are unanimous in refusing to do so. Why? Java Reflection doesn't provide this information! So it's not realistic to expect tools to map this successfully. Going WSDL-to-Impl (or, in fact, from .NET, whose type information doesn't suffer this shortcoming), there's no such problem.
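The reflection limitation is easy to observe. The sketch below uses the java.lang.reflect.Parameter API, which arrived in Java 8, well after this article was written; even there, real parameter names are reported only when the class was compiled with the javac -parameters option, and otherwise reflection synthesizes names such as arg0 and arg1. QuoteService and its method are hypothetical names chosen for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

final class ParameterNameProbe {
    // Hypothetical service interface with descriptive parameter names.
    interface QuoteService {
        float getQuote(String tickerSymbol, boolean delayed);
    }

    private static Parameter[] parameters() {
        try {
            Method m = QuoteService.class.getMethod("getQuote", String.class, boolean.class);
            return m.getParameters();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the parameter names exactly as reflection reports them.
    static String[] reflectedNames() {
        Parameter[] params = parameters();
        String[] names = new String[params.length];
        for (int i = 0; i < params.length; i++) {
            names[i] = params[i].getName();
        }
        return names;
    }

    // True only if the class file actually carries the source-level names.
    static boolean namesPresent() {
        return parameters()[0].isNamePresent();
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", reflectedNames())
                + " (real names present: " + namesPresent() + ")");
    }
}
```

A tool that derives a descriptor from compiled classes sees only what this probe sees, which is why descriptive names tend not to survive the Impl-to-WSDL direction unless the build is set up to retain them.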

To summarize, here's an example — again, current state of the art, starting from Java:

Best Interoperability

Finally, WSDL first offers a clear advantage in interoperability of generated components. Under the WS-I Basic Profile, and in all typical practice, Web services rely on WXS as the fundamental type model. This is a potent choice: WXS offers a great range of primitive types, simple-type derivation techniques such as enumerations and regular expressions, lists, unions, extension and restriction of complex types, and many other advanced features.
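As a small illustration of that precision, the following Java sketch validates messages against a WXS fragment using the JDK's javax.xml.validation API. The schema is hypothetical: an order element whose status is an enumeration and whose sku must match a regular expression. It is a sketch of the facets mentioned above, not a fragment of any real service descriptor.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class WxsFacetDemo {
    // Hypothetical schema: enumerated status plus a pattern-restricted SKU.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='status'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:enumeration value='open'/><xs:enumeration value='shipped'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "<xs:element name='sku'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:pattern value='[A-Z]{3}-[0-9]{4}'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the document satisfies the schema, false if validation rejects it.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order><status>open</status><sku>ABC-1234</sku></order>"));
        System.out.println(isValid("<order><status>pending</status><sku>ABC-1234</sku></order>"));
        System.out.println(isValid("<order><status>open</status><sku>abc-12</sku></order>"));
    }
}
```

Neither constraint can be stated in a plain Java method signature taking two String arguments; in the schema they are part of the type.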

To put it simply, WXS is by far the most powerful type model available in the XML world: more flexible than relational DDLs, and much more precise and sophisticated than the type system of any programming language in widespread use. Knowing this, why would we choose to use anything else to express our service semantics?

Now, one may wonder: what's the use of WXS's advanced features if they can't be mapped to the implementation language? First, let's not confuse the lack of a native language feature with the ability to build that feature using the language. WXS enumerated types mapped into flyweight classes in Java (the standard JAX-RPC mapping) are a perfect example of this: using WXS one can easily build an enumerated type into a descriptor, and it will be well supported in the generated Java code. Note that there is no means of describing an enumerated type in the Java-to-WSDL direction.
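For readers who haven't seen one, a flyweight enumeration class looks roughly like the sketch below. The type name OrderStatus and its values are hypothetical, and the exact members a given tool generates will differ; the sketch shows only the general shape: a private constructor, one shared instance per permitted value, and a lookup that rejects anything else.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hand-written approximation of a class a WSDL-to-Java tool might generate
// for a WXS enumeration with the values "open" and "shipped".
final class OrderStatus {
    // Declared first so it exists before the constants below register themselves.
    private static final Map<String, OrderStatus> BY_VALUE = new LinkedHashMap<>();

    public static final OrderStatus OPEN = new OrderStatus("open");
    public static final OrderStatus SHIPPED = new OrderStatus("shipped");

    private final String value;

    // Private: no instances beyond the constants above can ever exist.
    private OrderStatus(String value) {
        this.value = value;
        BY_VALUE.put(value, this);
    }

    public String getValue() {
        return value;
    }

    // Returns the shared instance for a schema value, or rejects unknown values.
    public static OrderStatus fromValue(String value) {
        OrderStatus status = BY_VALUE.get(value);
        if (status == null) {
            throw new IllegalArgumentException("not a permitted value: " + value);
        }
        return status;
    }

    @Override
    public String toString() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(fromValue("open") == OPEN);
        System.out.println(fromValue("shipped"));
    }
}
```

Because each value has exactly one instance, callers can compare with == and the set of legal values stays closed, which is what the schema's enumeration facet promised in the first place.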

Secondly, as I said in the previous section, it's important to think ahead, and to anticipate that those features that don't enjoy great support at the moment — lists, unions, disjunctions and restricted complex types are a few common examples — will be absorbed into a growing kernel of universal types that already includes most primitives and simple structs.

What's the payoff? Using WSDL first, services and clients share and enforce the same vision of message content — that's interoperability — and that vision can include the strong, precise types of WXS — that's best interoperability.

Conclusion

For new service development, certainly, and even for most adaptations of existing enterprise code assets, the WSDL-to-Impl path is the most robust and reliable; it also fits the consensus vision for widely available services based on progressively more vertical standards. It does a better job of preserving service semantics as designed, and offers best interoperability based on the rich type model of WXS.

So, dial up your favorite Web-services tools vendor! We should be getting the same facility and productivity using WSDL first as we already see for Impl-to-WSDL development — or better. Let the cry go forth: WSDL first!

Thanks to Michael Stiefel of Reliable Software and Bob Oberg of Object Innovations for .NET clarifications.