So many new terms are being bandied about these days: CORBA, IIOP, XML, XSL, DTD, DOM, SAX, and more. What is a person to make of these (mostly) new technologies, and where do they fit into the overall picture? I'm going to attempt to address that question, including my opinions on how they will be used. Many people will have a different slant on this, especially those who have something to gain from promoting one technology over another. I'll try to be as neutral as I can, and I warn you in advance that I'm not an expert on all of these technologies. That doesn't mean I'm not trying to learn them, however, as some of them have useful capabilities.

Let's address CORBA and IIOP first. CORBA is the Common Object Request Broker Architecture. As I've mentioned before, this is a very powerful architecture which meshes nicely with what I believe is going to be the language of choice for the new 'net: Java, an object-oriented language which is both powerful and highly portable. I can create Java class files on an IBM ThinkPad and run them on the largest mainframe IBM makes. I can also take those class files and run them on Microsoft Windows NT, HP-UX, Sun's Solaris, and every other platform to which the JVM (Java Virtual Machine) has been ported.

With the trend toward object-oriented programming (a difficult transition for most procedurally oriented programmers, by the way), the object orientation of CORBA is a good match. Once you've done it a couple of times, it is very straightforward to extend your objects to be CORBA-compliant. The real power of CORBA is transparent access to network objects, even those written in other languages. With the huge investment made in making legacy COBOL code Y2K compliant, companies will want to leverage this code in new, network-oriented, rapidly developed applications. CORBA becomes the foundation for the new network-oriented architecture.

IIOP is just the Internet Inter-ORB Protocol, the codified rules for carrying CORBA requests across a TCP/IP network. Strictly speaking, IIOP is the TCP/IP mapping of the more general GIOP (General Inter-ORB Protocol), which doesn't preclude mappings onto other transports. One technology I believe will fall by the wayside is RMI, or Remote Method Invocation. With the existence of CORBA, and all its associated capabilities, I don't believe that RMI offers any advantages for new applications.

The initial function of a web server was to use HTTP (Hypertext Transfer Protocol) to transmit HTML (Hypertext Markup Language) to web browsers. A number of extensions have been made to the original architecture, such as support for Java applets and JavaScript. Additional sophistication has been added on the web server side, with things like JSP (JavaServer Pages), ASP (Active Server Pages), and Java servlets. There are also more specialized products for generating dynamic web content, like ColdFusion. These features have combined to make web browsing far more dynamic and rewarding.

The "traditional" architecture then will look something like this:

HTML is an application of SGML (Standard Generalized Markup Language), while XML (eXtensible Markup Language) is a simplified subset of it. While HTML is a presentation-oriented markup language, XML is more general. HTML has a fixed set of tags, defined by the W3C (World Wide Web Consortium). An XML application can define a specific suite of tags, for business-to-business communication for example, but there is no theoretical limit to the number of tags which can be used. XML documents typically refer to a Document Type Definition (DTD) which specifies the permissible tags, although the same declarations can be included in the document itself.
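As a rough illustration of a DTD carried inside the document itself, here is a minimal sketch in Java. It assumes the JAXP parser bundled with current Java releases rather than any particular download, and the order/item vocabulary, the class name and the method name are all invented for the example. The DTD says an order holds one or more items, and each item must carry a status attribute whose value is either open or shipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationSketch {
    // Hypothetical vocabulary: the DTD is embedded in the document (internal subset).
    static final String DTD =
        "<!DOCTYPE order [\n"
        + "  <!ELEMENT order (item+)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "  <!ATTLIST item status (open|shipped) #REQUIRED>\n"
        + "]>\n";

    // Parses the document with DTD validation on and returns the validity errors found.
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> problems = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { problems.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String good = DTD + "<order><item status=\"open\">Widget</item></order>";
        String bad = DTD + "<order><item status=\"lost\">Widget</item></order>";
        System.out.println("valid document problems: " + validate(good));
        System.out.println("invalid document problems: " + validate(bad));
    }
}
```

The second document is still well-formed XML; it only fails because the DTD restricts the values the status attribute may take.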

XSL (eXtensible Stylesheet Language) builds on the formatting model of CSS2 (Cascading Style Sheets, level 2) and is typically used to control the presentation of XML documents at a high level. When an XSLT (XSL Transformations) processor applies an XSL stylesheet to an XML document, a new XML document is produced. Given that a well-formed XHTML document is also an XML document, and that XSLT offers an HTML output method as well, the XSLT process can be used to generate HTML. As defined, XSL processing actually consists of two parts: transformation and formatting of an XML document. The result of the transformation step is a new XML document which is then formatted for a specific device. The destination device could be a web browser, a hand-held device or even a printer.

The diagram above shows a possible flow of XML processing. In the first step, the XML document is combined with an XSL stylesheet by the XSLT processor. If the output document is HTML, then processing could be complete at this point. An optional parser is shown further modifying the XML document. An application program would typically interface with the parser using either DOM (Document Object Model) or SAX (Simple API for XML). Finally, the formatter can be used to generate device-specific output appropriate for the target. This can include PDAs, printers or web browsers, among others.
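To make the difference between the two parser interfaces a little more concrete, here is a minimal sketch that counts elements both ways. It assumes the JAXP classes bundled with current Java releases; the class name, method names and sample document are invented for illustration. DOM hands the application a complete in-memory tree to walk, while SAX calls back into a handler as each element streams past.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVersusSaxSketch {
    // DOM: parse the whole document into a tree, then query the tree.
    static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).getLength();
    }

    // SAX: no tree is built; the handler sees one event per start tag.
    static int countWithSax(String xml, String tag) throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance()
            .newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<results><record/><record/><other/></results>";
        System.out.println("DOM count: " + countWithDom(xml, "record"));
        System.out.println("SAX count: " + countWithSax(xml, "record"));
    }
}
```

For a small document either interface will do. The trade-off shows up with large documents, where DOM holds everything in memory and SAX does not.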

So the general structure is that the XML document contains tags and attributes whose structure is defined in the DTD. The DTD can also contain constraints for the attributes. So where will we see XML used? In a number of scenarios, actually: either the web server uses XSL to convert the XML to HTML, or the server sends the XML document itself to the client. In the first case, we use the server for the processing, while in the second we use a Java applet or similar intelligence on the client. A third possibility is that future versions of browsers will support XML natively.

A concrete example is called for here. Imagine that we've got an n-tier application which generates a database lookup result as an XML document. A server-centric solution would use XSL on the web server to format an HTML document, perhaps containing hypertext links which would permit the user to select the specific record of interest. Depending on the number of elements in the result set, this approach could generate long HTML pages. On the other hand, the requirements on the browser are minimal.
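The server-centric approach might look something like the following sketch. It uses the javax.xml.transform API bundled with current Java releases, which is an assumption on my part and not the only way to run a stylesheet. The results/record/name vocabulary, the detail?id= link target and the class name are all hypothetical; a real result set would dictate its own tags.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ResultSetToHtmlSketch {
    // Hypothetical stylesheet: one list item with a link per record in the result set.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/results'>"
        + "<html><body><ul>"
        + "<xsl:for-each select='record'>"
        + "<li><a href='detail?id={@id}'><xsl:value-of select='name'/></a></li>"
        + "</xsl:for-each>"
        + "</ul></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML result set and returns the generated HTML.
    static String toHtml(String resultSetXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter html = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(resultSetXml)),
            new StreamResult(html));
        return html.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<results>"
            + "<record id=\"1\"><name>Ada</name></record>"
            + "<record id=\"2\"><name>Grace</name></record>"
            + "</results>";
        System.out.println(toHtml(xml));
    }
}
```

Note that the page grows by one list item per record, which is exactly the long-page concern mentioned above.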

The second approach would involve downloading an applet to the client and having that applet request the result set, as an XML document, from the server, along with the XSL. The client could then perhaps display a scrolled list of the result set in the browser window and perform the logical processing of requests for a specific record. Once again, depending on the size of the result set, this could significantly tax the memory of the applet. It should also be considered that not every browser supports applets, whether through design or user choice.

Above is a screen shot of a demonstration I put together to show how the various elements work together. All the data is fictional, and since it's a screen shot the links won't work. I've used some fancy XSL in order to create the active links, but static HTML would not require this level of complexity. Here are links to the files used in this demonstration:

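If you want to experiment with this yourself, download the Xalan package from Apache and follow the instructions. It basically involves adding the Xalan and Xerces (included in the Xalan download) jar files to your CLASSPATH and running the following, where each fn stands for the name of the input XML file, the stylesheet and the output file respectively: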
java org.apache.xalan.xslt.Process -in fn -xsl fn -out fn

I was recently doing some work with the DOM API in order to create and read XML files. There may be plenty of examples out there when it comes to SAX, but I couldn't find anything in the way of sample Java programs using DOM to interact with XML documents. There are also some cute little kinks in the processing of files in order to make them "human-readable", which seems strange since the whole point is to make them readable! A separate article shows how I approached a particular challenge.
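For readers who just want the general shape of creating and reading XML through DOM, here is a minimal sketch. It is not the code from the article mentioned above; it assumes the JAXP classes bundled with current Java releases, and the results/record vocabulary and all names are invented. One kink I am sure of: once the output is indented for human eyes, the indentation comes back as extra whitespace text nodes when the file is parsed again, so the reader below asks for elements by name instead of walking child nodes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomRoundTripSketch {
    // Builds a small document in memory and serializes it with indentation requested.
    static String write(String... names) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("results");
        doc.appendChild(root);
        for (String name : names) {
            Element record = doc.createElement("record");
            record.setAttribute("name", name);
            root.appendChild(record);
        }
        // An identity transform is the standard JAXP way to serialize a DOM tree.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Reads the document back; lookup by tag name skips any whitespace text nodes.
    static int countRecords(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("record").getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = write("Ada", "Grace");
        System.out.println(xml);
        System.out.println("records read back: " + countRecords(xml));
    }
}
```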

Bottom line? XML appears to have considerable potential, especially as it is merely marked-up text. Various vertical markets have already defined DTDs for exchanging documents such as invoices and orders. Binary data can be converted to text form using representations such as Base64, permitting the transfer of images such as x-rays. It's not inconceivable that future software will be able to collect and collate logical data from a variety of sources and present it in a single cohesive view. Exciting possibilities indeed!
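The Base64 idea can be sketched in a few lines. This assumes the java.util.Base64 class found in current Java releases; the image element, its encoding attribute and the class name are made up for the example, and a real exchange format would define its own.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64PayloadSketch {
    // Wraps raw bytes as Base64 text inside a (hypothetical) image element.
    static String wrap(byte[] data) {
        return "<image encoding=\"base64\">"
            + Base64.getEncoder().encodeToString(data)
            + "</image>";
    }

    // Recovers the original bytes from the element produced by wrap().
    static byte[] unwrap(String xml) {
        int start = xml.indexOf('>') + 1;
        int end = xml.lastIndexOf("</image>");
        return Base64.getDecoder().decode(xml.substring(start, end));
    }

    public static void main(String[] args) {
        byte[] pixels = {0, (byte) 0xFF, 0x10, 0x7F}; // stand-in for real image data
        String xml = wrap(pixels);
        System.out.println(xml);
        System.out.println("round trip intact: " + Arrays.equals(pixels, unwrap(xml)));
    }
}
```

The Base64 alphabet avoids the characters that XML treats specially, which is why the encoded text can sit inside an element without further escaping. The price is size: every three bytes of input become four characters of text.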

July 9th, 2000
(updated July 31st, 2000)