
I have some Oracle and MySQL databases migrated to DolphinDB with dataX and mysql plugins, and I want to check if the migrated data is consistent with the source.

I’ve tried the following 2 methods:

  1. Compare the number of records between source and target databases. The problem is count in MySQL is very inefficient and it takes more than a day. Besides, this method only validates the volume of records, not the data consistency.

  2. Retrieve data in batches and check the values. High volume of data still affects the performance.

Is there any efficient way to check the data consistency between the source and target database?

  • If comparing counts takes more than a day, then any more serious validation will take even longer. Migration is a one-time activity, so I would not worry that much about performance. – Shadow Feb 01 '23 at 07:33

1 Answer


My solution verified data consistency for a migration where the largest table contains 900 million records with more than 30 fields.

To validate data consistency, a primary key is required to sort the data and to compute an MD5 value for each record.

This method computes a final MD5-based checksum for the source and target databases and compares them:

  1. Concatenate the fields of each row into a string and compute its MD5 value;
  2. Split each row's 128-bit MD5 value into four 32-bit integers, and sum each of the four parts across all rows to obtain four totals;
  3. Concatenate the four totals into a string, and compare the strings to verify data integrity;
  4. If they differ, compare the per-row MD5 values between the databases by primary key to locate the inconsistent rows.
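The steps above can be sketched in Python. This is a minimal illustration with in-memory rows; the field separator (`|`) and the row tuples are my own assumptions, not part of the original method:

```python
import hashlib

def row_md5_parts(fields):
    """Join a row's fields into a string and split its MD5 into four 32-bit integers."""
    digest = hashlib.md5("|".join(str(f) for f in fields).encode()).hexdigest()
    # 32 hex characters -> four 8-hex-char chunks, each a 32-bit integer
    return [int(digest[i:i + 8], 16) for i in range(0, 32, 8)]

def table_checksum(rows):
    """Sum the four 32-bit parts over all rows and concatenate the totals into one string."""
    totals = [0, 0, 0, 0]
    for row in rows:
        for i, part in enumerate(row_md5_parts(row)):
            totals[i] += part
    return "-".join(str(t) for t in totals)

source = [(1, "alice", 10.5), (2, "bob", 7.0)]
target = [(2, "bob", 7.0), (1, "alice", 10.5)]  # same rows, different order
print(table_checksum(source) == table_checksum(target))  # True
```

Note that because the per-row parts are summed, the final checksum is insensitive to row order, so the sort by primary key matters only for step 4 (locating which rows differ), not for comparing the totals.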

Tip: The calculation can be done in SQL. To improve performance, sort the data by primary key, split it into groups, and check the groups' consistency concurrently using Python.
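The grouped, concurrent check could look like the sketch below. `fetch_rows` is a hypothetical stand-in for a real primary-key-range query (e.g. `SELECT ... WHERE id BETWEEN lo AND hi`) against each database; here it filters an in-memory list so the example is self-contained:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def fetch_rows(table, lo, hi):
    # Hypothetical fetcher: in practice, run a range query on the database.
    return [row for row in table if lo <= row[0] <= hi]

def group_checksum(rows):
    """MD5 each row, split into four 32-bit parts, and sum the parts over the group."""
    totals = [0, 0, 0, 0]
    for row in rows:
        digest = hashlib.md5("|".join(map(str, row)).encode()).hexdigest()
        for i in range(4):
            totals[i] += int(digest[i * 8:(i + 1) * 8], 16)
    return "-".join(map(str, totals))

def compare_by_groups(source, target, ranges, workers=4):
    """Checksum each primary-key range concurrently; return the mismatched ranges."""
    def check(key_range):
        lo, hi = key_range
        same = (group_checksum(fetch_rows(source, lo, hi))
                == group_checksum(fetch_rows(target, lo, hi)))
        return key_range, same
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return [r for r, ok in executor.map(check, ranges) if not ok]
```

Only the ranges reported as mismatched then need the expensive row-by-row MD5 comparison from step 4.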

Polly