Data Access in legacy migrations

When transforming legacy programs into fully equivalent programs in modern technology that use a relational database, special attention is required to map the data access statements that read, update, delete, and insert data in the various legacy data stores. These data stores (such as IDMS, UDS, IMS, ADABAS, and VSAM) are not accessed through SQL syntax as you would use for an RDBMS. Instead, each data store provides the developer with dedicated syntax (typically called DML statements, where DML stands for Data Manipulation Language) to access the data in all necessary ways.

When we started working on the first generation of the CodeTurn transformation tools, targeting various flavors of COBOL (both on and off mainframe), our initial idea was to replace each DML statement directly with its SQL equivalent. The target of the transformation is a relational database, after all. While the reasoning behind this idea is sound, it proved impractical. Let’s look at three specific issues:

  1. It is desirable to have a normalized data structure in the target relational database, which often implies that a single legacy record structure needs to be mapped to multiple relational tables. This means that a single legacy DML update statement needs to be mapped to multiple SQL statements.
    For example, a single WRITE statement would need to be translated into multiple SQL INSERT statements: one for each extra table in the normalized data model.
  2. Error handling of legacy DML statements is often deeply embedded in their hosting language, and hence not simply mappable to checking a SQLCODE or SQLSTATE error condition.
    For example, a COBOL READ statement is expected to fill a COBOL program’s relevant FILE STATUS area, and potentially – if an error occurs during program execution – invoke one or several DECLARATIVE procedures to handle the error.
  3. More advanced legacy data stores, such as IDMS and ADABAS, are ‘stateful’ in the sense that they remember the results of an earlier DML statement, thereby affecting the meaning and effect of later DML statements.
    For example, in IDMS each set has a current record (its ‘currency’), and later read operations can continue where a previous operation left off, without the need to manually reposition.
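To make issues 1 and 2 concrete, here is a minimal sketch of how a single legacy WRITE might expand into several INSERT statements once the record is normalized, and how a SQLCODE can be folded back into the two-character FILE STATUS the transformed COBOL logic still checks. The record layout, table names, and the exact status mapping are hypothetical, not CodeTurn’s actual output:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a hypothetical CUSTOMER record with a repeating phone group is
// normalized into a CUSTOMER table plus a CUSTOMER_PHONE child table, so one
// legacy WRITE becomes several INSERTs.
class WriteMapping {

    // Returns the SQL statements a single legacy WRITE would expand into.
    static List<String> expandWrite(String custId, String name, List<String> phones) {
        List<String> sql = new ArrayList<>();
        sql.add("INSERT INTO CUSTOMER (CUST_ID, NAME) VALUES ('"
                + custId + "', '" + name + "')");
        // Each entry of the repeating group lands in a child table row.
        for (String phone : phones) {
            sql.add("INSERT INTO CUSTOMER_PHONE (CUST_ID, PHONE) VALUES ('"
                    + custId + "', '" + phone + "')");
        }
        return sql;
    }

    // Sketch of issue 2: fold a SQLCODE back into a two-character FILE STATUS.
    static String toFileStatus(int sqlcode) {
        if (sqlcode == 0) return "00";      // success
        if (sqlcode == 100) return "23";    // no row found -> record not found
        if (sqlcode == -803) return "22";   // duplicate key
        return "30";                        // generic permanent error
    }
}
```

Even this toy version shows the cost: every DML site would need expansion logic plus status translation inlined, which is exactly the support code the text above warns about.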

All of the above imply that a lot of additional support code needs to be generated to ensure functional correctness, which decreases the maintainability of the transformed application. However, maintainability of the transformed application is one of the key evaluation factors[1] of an automated transformation, so this design was not implemented. Instead, it was replaced by an alternative which we internally refer to as “IO Modules” but which is more widely known as the “Table Access Gateway” pattern.

With the “IO Modules” design, each relational table is accessed via an API (i.e. a calling interface) that is implemented in a separately generated piece of code: the “IO Module”.

When targeting COBOL, each IO module:

  1. is a COBOL subroutine that internally uses embedded SQL for all database access.
  2. gets generated automatically by DataTurn, thereby enabling easy switching of COBOL and SQL dialects, as well as target data layout preferences.
  3. can be re-generated whenever the underlying database layout is changed, without necessarily affecting any programs that make use of the IO Module.
  4. is called from the transformed legacy programs in those locations where the original programs use DML. The provided API between the program and the IO Module is simple and recognizable for developers familiar with the legacy data store.
  5. works together with any additional support code, by publishing status codes and/or registering ‘callbacks’ for error situations.
  6. can contain additional performance tuning code, e.g. using different SQL access patterns for online and batch execution context, re-using database cursors, enabling the use of caching when it is known that data won’t ever change during execution, …

Since its inception, the IO Module design has been successfully implemented in many different projects.

When the time came to work on a new CodeTurn generation that supports C# and Java as output languages, the original design was revised by the Anubex architects, leading to a significant update to better match the characteristics and advantages of these languages – while still adhering to goals of functional correctness, good performance and maintainability of the transformed business programs.

When CodeTurn generates Java and C#, each generated IO Module only contains the structural information of the target relational database (which columns, column types, relations, …). The bulk of the actual data access (i.e. the SQL statements executed at runtime) is provided by a single code library. This design further decreases the amount of code to be maintained by the development team, while still retaining the benefits mentioned above. Additionally, more advanced and more dynamic performance optimizations are now possible as well, mainly thanks to the virtues of the Java and .NET platforms. Some examples of these new performance tuning options include:

  1. The possibility of dynamically creating, caching and re-using SQL statements, based on the exact types of bind variables and table columns.
  2. The ability to switch dynamically between single-row access and cursor-based access, where the database engine is informed upfront about the likelihood that more data will be required.
  3. The use of multiple threads to speed up specific operations.
    For example, closing a Java or C# result set can take considerable time; since the underlying transaction has already completed, it makes sense not to let the code’s main thread wait until the close operation finishes.
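The first of these options can be sketched as a thread-safe cache keyed by the bind variables’ types, so the SQL text for a given shape is built only once and re-used afterwards. The SQL string and the build counter below are purely illustrative:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: cache generated SQL text per combination of bind variable types.
// ConcurrentHashMap.computeIfAbsent gives thread-safe, build-once semantics.
class StatementCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger builds = new AtomicInteger(); // instrumentation for the sketch

    String selectFor(Class<?>... bindTypes) {
        StringBuilder key = new StringBuilder("CUSTOMER");
        for (Class<?> t : bindTypes) key.append('|').append(t.getName());
        return cache.computeIfAbsent(key.toString(), k -> {
            builds.incrementAndGet();
            // In a real module, the column list and predicates would be derived
            // from the IO Module's structural metadata rather than hard-coded.
            return "SELECT CUST_ID, NAME FROM CUSTOMER WHERE CUST_ID = ?";
        });
    }
}
```

Keying on the bind types matters because the optimal statement text (casts, predicates, fetch hints) can differ per type combination, yet stays stable for any one combination across the run.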

This revised IO Module design has also been successfully implemented in various projects.

Read more about CodeTurn and DataTurn on their respective product pages on our website. You can also request our whitepapers through migrations@anubex.com to get an in-depth understanding of our products.


[1] With functional correctness and runtime performance of the transformed code being the others.