Data migration verification

1. Is my data migrated correctly?

This question gets asked at different times: certainly during a proof-of-concept, sometimes during an assessment, and almost always just before going live, when everything is being checked and double-checked.

Often, the following informal formulation is used:

Can we be sure that our data is migrated correctly?

This actually means two different things:

  1. Is all my data migrated?
  2. Is the migrated data equivalent to the source data?

To answer both questions, Anubex has embedded a data verification mechanism into its data migration solutions. Together with the batch and online application testing cycles, this lets you rest easy when no differences are detected.

Below, we will explain in a bit more detail what this data verification entails and what it proves.

1.1. Data verification, not validation

Why do we speak about data verification and not data validation?

Because we verify whether the data has been migrated correctly, not whether it is valid.

The following two examples clarify the difference:

  1. When the source data contains invalid data (e.g. the value ‘X’ in a switch field that logically may only contain ‘Y’ or ‘N’ values), the target database will also contain this invalid value after the migration. The data may not be valid according to the application’s definition of validity, but it will still be marked as correctly migrated.
  2. When fields are migrated to a numeric RDBMS field but do not contain numeric data (e.g. in COBOL it’s not uncommon to have spaces in fields with a numeric picture), these fields will by default be migrated to a numeric zero value, as sketched below. This is logged during the data migration, but during data verification a source and a target zero value will again be marked as correctly migrated.
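
Purely as an illustration of that second point (not the actual conversion logic in the Anubex tooling, which also handles signs, decimals and code pages), a minimal Python sketch of the default behaviour could look like this:

```python
# Simplified sketch of the default described above: non-numeric content in a
# numeric source field (e.g. spaces in a COBOL numeric picture) is logged
# during migration and written to the target as zero, so both sides later
# verify against that same zero value.
def migrate_numeric(raw: str, log) -> int:
    stripped = raw.strip()
    if not stripped.lstrip("+-").isdigit():
        log(f"non-numeric content {raw!r} migrated as 0")
        return 0
    return int(stripped)

issues = []
print(migrate_numeric("00123", issues.append))  # 123
print(migrate_numeric("     ", issues.append))  # 0, with a log entry in 'issues'
```
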
1.2. Configurable

Because processing resources and time can be limited, it’s not always desirable to have the full verification activated at data migration time; certainly not on D-day, when the time slots for data migration are very strict.

That’s why it’s possible to configure the data verification at the individual field (i.e. column) level: during the final data migration, only the mission-critical columns need to be verified, since a broader verification will already have been configured and executed during all previous project phases.

1.3. What and how
1.3.1. What do we check?

At the target table level, we check the number of rows: does it match the number of data rows that were retrieved from the source legacy data store and migrated?

At the field level, we check both the actual data of each field and its logical order (i.e. its row identification number).

For a numeric data item, the actual data is the numeric value itself; for other data items, it is a numeric representation of their non-numeric content.

For alphanumeric fields, we can choose to limit the verification to the field’s length instead of its actual content. This requires fewer resources, since verifying alphanumeric content can be very time-consuming for long fields in large tables.

1.3.2. How do we check this?

On the source side of the data migration, in addition to the steps needed to migrate the data, a checksum value is calculated for each field that was configured for verification while running through the data.

In general, this checksum is the sum of each field value multiplied by its row identification number.

This row identification number is determined in a data-store-specific way.

On the target side, an SQL SELECT statement calculates these checksums from the target table data. The checksums are returned as a single row and can be compared with the checksums reported by the source data migration programs.
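
Purely as an illustration of the idea (not the actual implementation in the Anubex tooling), the sketch below accumulates such source-side checksums in Python. The field names are hypothetical, the row identification number is assumed to be the 1-based read order, and summed character codes are used here as one plausible numeric representation of alphanumeric content; the hand-worked example in section 1.4.1 applies the same row-weighting.

```python
# Minimal sketch of source-side checksum accumulation (illustrative only).
# Hypothetical record layout: a numeric field 'amount', a one-character
# switch field 'switch' and an alphanumeric field 'descr'.
def source_checksums(rows):
    counts = {"rows": 0, "amount_chs": 0, "switch_chs": 0,
              "descr_sol": 0, "descr_chs": 0}
    for row_id, row in enumerate(rows, start=1):    # row identification number
        counts["rows"] += 1
        counts["amount_chs"] += row["amount"] * row_id              # numeric value itself
        switch = row["switch"].rstrip()                             # trailing spaces trimmed
        counts["switch_chs"] += (ord(switch) if switch else 0) * row_id
        descr = row["descr"].rstrip()
        counts["descr_sol"] += len(descr) * row_id                  # sum of lengths (SOL)
        counts["descr_chs"] += sum(ord(c) for c in descr) * row_id  # content checksum (CHS)
    return counts
```

In this sketch, a column configured for the cheaper length-only check (see section 1.3.1) would simply skip the content-checksum line.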

(Schematic example of a data migration and verification process.)

1.4. Examples
1.4.1. Fictitious sample

Imagine that on the source side we have a VSAM file that we want to migrate to an RDBMS. The file contains 3 rows and 3 fields: a numeric field, a single-character field and a large alphanumeric field (with a maximum length of 50). The numeric field contains the values 123, 456 and 789; the character field contains ‘Y’, ‘N’ and a space; and the alphanumeric field contains values of 13, 0 and 43 characters respectively.

When migrating this file, the following verification log output will be written (a short script reproducing the arithmetic follows the list):

  1. The number of rows -> 3
  2. The numeric checksum -> (123 x 1) + (456 x 2) + (789 x 3) = 3402
  3. The character checksum -> (Y=89 x 1) + (N=78 x 2) + (Space=0 x 3) = 245
    Trailing spaces are trimmed, which is why the space counts as zero
  4. The alphanumeric Sum Of Lengths (SOL) -> (13 x 1) + (0 x 2) + (43 x 3) = 142
  5. The alphanumeric checksum -> (683 x 1) + (0 x 2) + (4089 x 3) = 12950
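
The short check below reproduces these numbers; since the content of the alphanumeric field itself is not listed, its per-row length and checksum contributions are taken as given.

```python
# Reproduce the fictitious sample's verification log values (plain arithmetic).
rows = [
    # (row_id, numeric value, switch character, alnum length, alnum checksum)
    (1, 123, "Y", 13, 683),
    (2, 456, "N",  0,    0),
    (3, 789, " ", 43, 4089),
]

row_count   = len(rows)                                            # 3
numeric_chs = sum(num * rid for rid, num, _, _, _ in rows)         # 3402
switch_chs  = sum((ord(sw.strip()) if sw.strip() else 0) * rid
                  for rid, _, sw, _, _ in rows)                    # 245 (space -> 0)
alnum_sol   = sum(length * rid for rid, _, _, length, _ in rows)   # 142
alnum_chs   = sum(chs * rid for rid, _, _, _, chs in rows)         # 12950

print(row_count, numeric_chs, switch_chs, alnum_sol, alnum_chs)
```
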

Similar output can be retrieved from the database with a single SELECT statement. This SELECT statement can be provided by our DataTurn migration solution.
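
What such a statement looks like depends on the target RDBMS and on how the row identification number is materialised in the target schema. Purely as an illustration (not the DataTurn-generated SQL), the snippet below runs a comparable query against an in-memory SQLite table with a hypothetical row_id column, loaded with data matching the fictitious sample:

```python
# Illustrative target-side verification query (not the DataTurn-generated SQL).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sample (row_id INTEGER, amount INTEGER, descr TEXT)")
con.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [(1, 123, "x" * 13), (2, 456, ""), (3, 789, "x" * 43)],  # mirrors the fictitious sample
)

row = con.execute("""
    SELECT COUNT(*)                           AS row_count,
           SUM(amount * row_id)               AS numeric_chs,
           SUM(LENGTH(RTRIM(descr)) * row_id) AS alnum_sol
    FROM   sample
""").fetchone()                               # checksums come back as a single row

print(row)  # (3, 3402, 142) -- compare with the source-side verification log
```
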

1.4.2. Real-life sample

The following screenshot shows a sample of the output logged by the data migration process (left) and the output retrieved by the query executed on the target database (right).

In this example, we see that almost 20 million rows have been correctly migrated to the target RDBMS. All field checksums also give identical numbers for source and target calculations.

For numeric fields, we see CHS (checksum) values.

For alphanumeric fields, we see SOL (sum of lengths) and CHS (checksum) values.

And finally, for binary fields, we see SB4 (checksum of the first 4 bytes) and SBA (checksum of all bytes) values.

2. Conclusion

The data verification process is a great add-on to the existing application tests that will be executed during a migration project.

Application tests typically only use a subset of all migrated data, making it difficult to prove that all data has been migrated properly.

Anubex’s Data Verification mechanism can actually provide that proof, and do so at a level of detail that is adapted to the project phase in which it is used: during the project’s testing phases, literally all tables and fields can be checked, while on D-day one can focus on only those tables and fields that have been identified as key indicators.

3. Are you finding data migration verification easy to manage in your project?

Don’t hesitate to contact us at migrations@anubex.com with any additional questions or if you need any help with database verification after your migration project. We have several tools, like TestMatch and DataMatch, that can automate your database verification quickly and accurately.
