EDI Architecture

Matthew Rapaport

Jan 2004

In a previous article (“Messaging, Middleware, and EDI” Nov 2002) I discussed Electronic Data Interchange (EDI) in a “middleware” context. Middleware integrates data among different applications. Since EDI involves exchanging data between in-house applications and applications at other companies, a comprehensive middleware architecture often includes it.

Yet while EDI might be considered “middleware”, it also shares characteristics with applications in their own right. Although it lies between applications like other middleware, the applications to one side of it do not belong to your company! The whole point of EDI is to enable your company to communicate with other applications that are:

Always on the distal end of some long-distance link, and
Always “black boxes” from the viewpoint of your own network and applications.

EDI exists to make data exchange possible with applications whose nature you cannot know, and whose behavior you cannot control. This makes EDI a special case of middleware, one better treated as an application (or a suite of applications) unto itself.

The EDI Domain

As we saw in the previous article, EDI is like a set of nested boxes with the X12 document representation protocol at its center. Surrounding this are the mechanisms needed to exchange data over various public and private networks. Some large companies relegate to their EDI departments only those data exchanges that involve X12 conversions, giving over any other data exchange (for example, the use of proprietary formats) to a different department. The communication protocols used to exchange data (versions of ftp, http, sockets, etc.) may be under the control of yet a third department. This arrangement makes for a cumbersome architecture involving two or three departments in any new trading partner relationship.

The operations of a “data communications” department can be viewed both functionally, from the viewpoint of the ISO seven-layer (OSI) network model, and structurally, in relation to the rest of a company. Fig 1 shows EDI’s relation to the ISO model. EDI traditionally occupies layers 5 and 6. Layer 5 comprises its workflow, encryption (at the document level), and auditing functions. Layer 6 encompasses format translation and sometimes data type transformation (e.g. ASCII to EBCDIC). At the same time, a robust EDI department must also have a say in what goes on in layer 4, first because its workflow uses the layer 4 utilities, and second because, in fulfilling the business’s trading needs, the EDI department must have its choice of layer 4 protocols. Encryption choices also exist at layer 4 and should be at the disposal of data communications. A robust EDI department should not have to say to the business side of the company “we can’t connect to partner X because we don’t support the connection types the partner has available.”

Fig 2 shows the relations between the parts of the EDI department and the rest of the company. The EDI department is special because it is the only department in your company that talks directly to another company’s systems! This is a great privilege, and a great responsibility. The “good citizen” of the data communications universe will send only properly formatted and packaged data, acknowledge others’ data in a timely way, and always be able to receive proper data from external partners over agreed-upon channels. Only a department isolated from the effects of changes to all other applications can assure the corporation of this ability. That is the reason for isolation.

Facing the trading partners directly are the protocols of the ISO model’s layer 4. These are the various mechanisms which guarantee that the data which leaves the boundaries of one corporation arrives at the gates of the other without changing in any particular. While some networks do transform data, for our purposes those networks are a part of the trading partner, since they do in fact represent other systems whose data format requirements must be respected. By contrast, non-transforming networks, while they are technically “another company’s systems”, are effectively transparent to the two companies trading, and the data arrives as it left! Note that e-mail, normally considered a layer 7 application, behaves in this EDI context as a transport layer. EDI applications use e-mail protocols to carry data!

Security choices available at layer 4 should also be at the disposal of the EDI department. These include secure ftp, secure sockets, secure http, and others. Security at this layer applies to the channels through which data flows; the documents themselves are not encrypted. This is a low level of security that is convenient for the governing session layer to use, and it often obviates other higher-level encryption.

The ISO “session layer” matches roughly to EDI’s workflow and auditing functions. Non-EDI applications in the rest of the corporation use EDI to exchange data with other companies’ applications. Since the communication takes place between EDI departments (or their equivalents), it follows that the EDI department gates the exchange in both directions. The session layer is also responsible for any encryption or decryption of the documents themselves, as distinct from the protocols used to send them (layer 4). Individual document encryption is independent of the protocol used for network exchange. It takes place on the EDI side of the data boundary, transparently to applications, which continue to create and consume unencrypted documents. Auditing is also a role of the session layer, sometimes shared with the presentation (translation) layer. Auditing often amounts to saving work: keeping pre-translated and post-translated data, checking acknowledgments against sent documents, producing acknowledgments for received documents, and making retransmissions (as needed by trading partners) convenient.
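The acknowledgment check mentioned above might be sketched as follows. This is only an illustration: the `SentDocument` record, the control numbers, and the partner names are hypothetical, and a real audit would read them from whatever logs the department keeps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentDocument:
    control_number: str  # hypothetical identifier assigned when the document is sent
    partner: str         # hypothetical trading partner name


def unacknowledged(sent, acknowledged_numbers):
    """Return the sent documents that have no matching acknowledgment yet."""
    acked = set(acknowledged_numbers)
    return [doc for doc in sent if doc.control_number not in acked]


# Illustrative audit data, not taken from any real exchange.
sent_log = [
    SentDocument("000000101", "PARTNER_A"),
    SentDocument("000000102", "PARTNER_A"),
    SentDocument("000000103", "PARTNER_B"),
]
ack_log = ["000000101", "000000103"]

pending = unacknowledged(sent_log, ack_log)
for doc in pending:
    print(f"no acknowledgment yet for {doc.control_number} sent to {doc.partner}")
```

Documents left in `pending` are the candidates for follow-up or retransmission.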

The session layer implementation takes a wide variety of forms, from shell scripts, to workflow management built into EDI software, to elaborate workflow packages often referred to as middleware. While each of these approaches has its merits, consider them in the context of the corporation’s other integration requirements and the structure of the EDI department. Middleware packages (for example) usually violate the principle of EDI isolation from other IT departments.

The ISO’s “presentation layer” matches EDI’s translation features. The whole point of X12 EDI was to standardize the format of documents exchanged between corporations. Standardization allows each corporation to use whatever EDI applications it wishes, provided only that they read and write the standardized documents. Translation is the heart of EDI! Even apart from X12, documents must still be translated into and out of any proprietary format agreed on by the trading partners. As with layer 4 communications, execution and timing of translations is under the control of the EDI department’s session layer workflow. Translator implementations take many forms, from rich scripting or other languages to elaborate translation packages. Packages usually contain their own workflow, but using it isn’t usually required. Unlike workflows and audits, which can be self-produced relatively easily, translations can be a time-consuming development process. Translation packages are frequently a good choice if there are many standardized (X12 or the emerging XML) documents to exchange. Translators do not violate the principle of EDI isolation. Most of them are file-oriented (they take data from, and leave results in, files), conveniently supporting isolation from other IT applications.
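To make the file-oriented style concrete, here is a minimal sketch of a script-based translator that reads one file and leaves its result in another. Everything specific in it is an assumption made for illustration: the input is a hypothetical CSV with `sku`, `quantity`, and `unit` columns, the `ITEM` tag is invented and is not a real X12 segment, and the `*` and `~` delimiters are simply the ones conventionally seen in X12 interchanges.

```python
import csv
from pathlib import Path

ELEMENT_SEP = "*"  # assumed element separator
SEGMENT_END = "~"  # assumed segment terminator


def translate_file(src: Path, dst: Path) -> int:
    """Translate a hypothetical CSV (sku,quantity,unit) into delimited segments.

    Reads from one file and writes to another, so the caller needs to know
    nothing about the translator except the two file names.
    Returns the number of segments written.
    """
    segments = []
    with src.open(newline="") as handle:
        for row in csv.DictReader(handle):
            # "ITEM" is an invented tag used only for this illustration.
            fields = ["ITEM", row["sku"], row["quantity"], row["unit"]]
            segments.append(ELEMENT_SEP.join(fields))
    dst.write_text("".join(seg + SEGMENT_END + "\n" for seg in segments))
    return len(segments)
```

Because the function touches nothing but its input and output files, a workflow script can schedule it exactly as it would schedule a packaged translator.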

The Data Boundary Layer

The data boundary is the secret to the EDI department’s capacity to carry out its mission independent of and insulated from changes in the rest of the corporation’s applications. It also gives it the freedom to change itself, upgrading specific applications for example, without affecting the rest of the corporation. All intercourse between EDI and the rest of the corporation takes place via the exchange of data in this layer (despite a narrow “command channel” discussed briefly below).

At its simplest the data boundary is nothing but files, one for each transaction between a corporate application and some application at a single trading partner site. EDI and each of the departments sending or receiving data agree in advance on the names and formats of the data exchanged between them. File management can be manual, or it can use a file-management product like Sterling Commerce’s “Connect Mailbox”, which provides built-in logging and auditing support. Departments put conventionalized files in the data boundary, and EDI translates and sends them to trading partners. EDI places data received from partners as files in the boundary layer, and other departments’ applications consume them.
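A pre-agreed naming convention can be checked mechanically. The sketch below assumes a purely hypothetical convention of the form `application.partner.doctype.sequence.dat`; the real convention is whatever EDI and the other departments settle on.

```python
from typing import NamedTuple


class BoundaryFile(NamedTuple):
    application: str
    partner: str
    doc_type: str
    sequence: int


def parse_boundary_name(filename: str) -> BoundaryFile:
    """Split a name such as 'orders.acme.po.000042.dat' into its agreed parts.

    The five-part layout is an assumption made for this example only.
    """
    parts = filename.split(".")
    if len(parts) != 5 or parts[4] != "dat" or not parts[3].isdigit():
        raise ValueError(f"not a boundary file name: {filename!r}")
    application, partner, doc_type, sequence, _ = parts
    return BoundaryFile(application, partner, doc_type, int(sequence))


info = parse_boundary_name("orders.acme.po.000042.dat")
print(info.partner, info.doc_type, info.sequence)
```

Rejecting files that do not follow the convention keeps malformed names from ever reaching a translator.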

Files are only one approach to a data boundary. Another approach might be to tokenize application data as XML, and put the tokenized data, and aggregations of tokens, in the data boundary, possibly as records in a database. Every element in all the databases of the corporation (at least all those exchanged with trading partners) might be given a standard XML representation. Various collections of these tokens are also given XML representations. A collection might represent an order, a claim, a loan, etc. Instances of tokens stored in a database with keys allow EDI to aggregate the individual tokens and convert them for transmission! Alternatively, applications can aggregate individual tokens into collections, and hand these to EDI via database records, file managers, or files. On the reverse trip, EDI can translate trading partner documents into collections of tokens consumable by applications. There are many ways to design the data boundary to suit a corporation’s needs.
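As a rough sketch of the tokenized approach, the following aggregates individual tokens into one XML collection using Python's standard library. The element names (`order`, `sku`, `quantity`) and the `key` attribute are hypothetical; a real corporation would define its own vocabulary.

```python
import xml.etree.ElementTree as ET


def build_collection(kind: str, key: str, tokens: dict) -> str:
    """Aggregate individual tokens into a single XML collection string.

    `kind` names the collection (an order, a claim, a loan), `key` is the
    database key tying the tokens together, and `tokens` maps token names
    to values. All names here are illustrative assumptions.
    """
    root = ET.Element(kind, {"key": key})
    for name, value in tokens.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


order_xml = build_collection("order", "PO-1001", {"sku": "A100", "quantity": 5})
print(order_xml)
```

Either side of the boundary could do this aggregation: an application before handing the collection to EDI, or EDI itself when it gathers keyed tokens from a database.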

However one sets up the data boundary, polling discovers new data. Applications poll for new data to consume, while EDI polls for data to send from the corporation to partners. Although the EDI department can be insulated from the rest of the corporation by its data boundary, there are sometimes applications that demand performance (even in remote data trading) that even fast file or database polls will not satisfy. Therefore, a narrow “command channel,” a small hole in the data boundary, is also permissible between IT applications and EDI applications. Applications are, of course, constrained to proper use of the channel. Applications leave data in the data boundary, but they use the command channel to trigger an EDI action and pass it a pointer to the data. In this way, EDI reacts more or less instantly to the presence of new data and does not have to wait on its polling to signal something to be done. The same can be done in reverse for data received.
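Polling combined with a command channel might look something like the sketch below. It assumes a file-based boundary holding `.dat` files and uses an in-process `queue.Queue` to stand in for the command channel; both are illustrative choices, and a real channel could as easily be a socket or a message queue.

```python
import queue
from pathlib import Path


def next_work(boundary: Path, commands: queue.Queue, seen: set) -> list:
    """Return new boundary files to process, command-channel pointers first.

    `commands` carries only pointers (paths) to data that an application has
    already left in the boundary. `seen` remembers what has been handed out,
    so a file is never returned twice.
    """
    work = []
    # Drain the command channel so announced data is handled immediately.
    while True:
        try:
            pointer = commands.get_nowait()
        except queue.Empty:
            break
        if pointer not in seen and pointer.exists():
            seen.add(pointer)
            work.append(pointer)
    # Fall back to an ordinary poll for anything that was not announced.
    for path in sorted(boundary.glob("*.dat")):
        if path not in seen:
            seen.add(path)
            work.append(path)
    return work
```

Note that the data itself never travels over the channel. An application that skips the channel is still served, only a polling interval later.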

Robust EDI

Fig 3 shows the relation of EDI’s parts to one another and the boundaries above and below them. Although the “presentation layer” lies between the “session” and “application” layers, from a functional perspective “presentation” is sandwiched between two “session layers” (workflows), which control it.

Notice the figure shows two translation engines. Any medium to large sized company will have translations that are proprietary as well as those that are standards-driven. Both types of translations can be scheduled within a single workflow context. However, it is often more efficient to have third-party packages perform the standards-based translations, while relatively simple scripts perform the proprietary translations. It’s tempting to let the workflow features of a standards-based translator handle movement of data into and out of proprietary translators. Translator software, however, often cannot conveniently manage data it doesn’t translate. By contrast, it should always be possible to write workflows that pass data into and out of standards-based translators. Middleware-centric EDI software often makes this otherwise simple task more difficult than it should be. Modern software should make complex work easier, and not complicate what used to be simple work! Today’s business computing environments, whether Unix, Linux, Windows, OS/400, or other operating systems, all have rich scripting languages in which workflow and auditing features are written fairly easily. Besides writing such flows, managing them is straightforward given modern Web-based end user interactions. Writing Web-based support for reporting from, and management of, modern scripts is almost trivial in today’s technological environments.
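A single scripted workflow that schedules both translation engines, and keeps the audit copies described earlier, could be sketched along these lines. The two translator functions are stand-ins: in practice one would call out to a third-party package and the other would be an in-house script, and the format names `x12` and `flat` are hypothetical labels.

```python
def translate_standard(data: str) -> str:
    # Stand-in for a call out to a third-party, standards-based translator.
    return "STD:" + data


def translate_proprietary(data: str) -> str:
    # Stand-in for a relatively simple in-house translation script.
    return "PROP:" + data


# One routing table lets the same workflow drive either engine.
ROUTES = {"x12": translate_standard, "flat": translate_proprietary}


def run_workflow(doc_format: str, data: str, audit: list) -> str:
    """Translate one document, saving pre- and post-translation audit copies."""
    translator = ROUTES[doc_format]
    audit.append(("pre", doc_format, data))
    result = translator(data)
    audit.append(("post", doc_format, result))
    return result


audit_trail = []
run_workflow("x12", "order-1", audit_trail)
run_workflow("flat", "order-2", audit_trail)
for entry in audit_trail:
    print(entry)
```

Because the workflow owns the routing and the audit trail, neither translator needs to know the other exists.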

At the partner-facing end of the translation, an EDI department should strive for maximum flexibility in communicating with the transport level mechanisms available to it. Once again, this usually means bypassing the package-based workflow features and writing your own to ensure that every transport can interoperate with each translation system in an identical fashion. The same is true at the corporate-facing end where EDI meets the data boundary. Most EDI packages can interact adequately with simple file-based data boundaries. Usually some intervening (proprietary) workflows will have to be written to implement a more sophisticated data boundary. If you have to write some of it, it is only a small step to writing the rest, assuring consistency throughout.

EDI is a critical ingredient of modern global commerce. An EDI department isolated from other corporate computer applications by a data boundary, and properly layered, ensures maximum flexibility and reliability. Data exchange with other companies takes many forms, from the format of the data itself, to the protocols used to exchange it, and the networks over which it travels. There is no reason a single EDI department cannot be a central hub controlling and auditing all data exchange into and out of even reasonably large companies. Consolidating data communications in this way ensures consistency from both technical and business viewpoints.