Redundancy can emerge as follows (one abstract example among many possible):

An application (A1) produces a data set ‘ABCD’ which is consumed by application A2.
A new application, A3, is then created. It needs data ‘AB’ from A1, and also data ‘EFG’, which is available from A1 but not yet part of its output.

From a designer’s viewpoint, it makes sense to modify the process that extracts data ‘ABCD’ from A1 so that it extracts ‘EFG’ at the same time, and then to write a script that serves data ‘ABEFG’ to A3. But this is not usually what happens in hurried development or operations groups. Instead:

A new extract is created for A1 that produces dataset ‘ABEFG’. This new dataset, and the process that creates it, are independent of the data and process that generate ‘ABCD’, which avoids a substantial amount of regression testing.
Some new support is still required for data consumption by A3.

Two scripts (possibly small systems) are thereby created. If the ‘A’ in ‘ABCD’ changes, two datasets and three processes require attention, compared with one dataset and two processes otherwise. Now consider a system with 10 stovepipe connectors carrying partially redundant data. Adding a new independent connection still requires, at most, 2 extra I/O functions, while modifying the system to minimize redundancy for the sake of manageability requires regression testing of 20 functions! Unless time is set aside to rectify a purely evolutionary, script-based workflow system in its early stages, a busy IT department (especially if operations is responsible for workflow) will almost always take the quick route and just add a few more little scripts to its operational watch lists.
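
The counts above can be checked with a minimal Python sketch. The process names (`extract_ABCD`, `load_A3`, and so on), and the way processes are attributed to each dataset, are hypothetical and chosen only so that the model matches the figures quoted in the text. The figure of two I/O functions per connector is likewise an assumption taken from the "at most, 2 extra I/O functions" remark.

```python
# Hypothetical model: each extract is a dataset produced by A1, plus the
# processes that depend on it. All names here are illustrative assumptions.
STOVEPIPE = [
    {"fields": "ABCD", "processes": {"extract_ABCD"}},
    {"fields": "ABEFG", "processes": {"extract_ABEFG", "load_A3"}},
]
CONSOLIDATED = [
    {"fields": "ABCDEFG",
     "processes": {"extract_ABCDEFG", "serve_ABEFG_to_A3"}},
]


def affected(extracts, field):
    """Return (datasets, processes) needing attention when `field` changes."""
    hit = [e for e in extracts if field in e["fields"]]
    processes = set().union(*(e["processes"] for e in hit))
    return len(hit), len(processes)


def functions_to_retest(connectors, io_functions_per_connector=2):
    """Regression-test load when every connector is reworked at once."""
    return connectors * io_functions_per_connector


print(affected(STOVEPIPE, "A"))      # (2, 3)
print(affected(CONSOLIDATED, "A"))   # (1, 2)
print(functions_to_retest(10))       # 20
```

The asymmetry is the point: the stovepipe route costs a constant two functions per new connection, while consolidation costs grow with the number of existing connectors, so the quick route looks cheaper every time it is considered.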

Text Box 1.

One source of redundancy in workflow systems