1

I migrated data from Postgres to Redshift and doing data validation checks, so taking MD5 for complete row. Visual checks are ok between Postgres to Redshift, but MD5 value differs for Float8 data types.

In Postgres:

balance|md5(balance::VARCHAR)
-140.2|d59f47f21a88e8b73d4ca309d75cc64b

In Redshift:

balance|md5(balance::VARCHAR)
-140.2|539dc58d834b1cf24252705b4f40b7f1

Anyone got this issue? any thought on why is this difference? Is there any other way to do data validation for all rows across database after migration?

Athi
  • 347
  • 4
  • 12
  • 1
    Does using `md5()` on a `numeric` column also show this behaviour? If not, then it's probably caused by `float` being an *approximate* data type and the implementation between Redshift and Postgres differs in some way (despite what Amazon states in their marketing stuff, Redshift is **not** the same as Postgres) –  Mar 03 '21 at 15:08
  • Numerics are good. Only issue with Float value. – Athi Mar 03 '21 at 15:10
  • any thoughts to do row level data validation for migrated data? – Athi Mar 03 '21 at 15:12
  • 1
    Neither value corresponds to `md5('-140.2')`, so you may want to inspect `balance::VARCHAR` separately. – Jeroen Mostert Mar 03 '21 at 15:29
  • I cannot reproduce that. Perhaps you can come up with a complete reproducible test case. – Laurenz Albe Mar 03 '21 at 15:42

1 Answers1

2

Redshift MD5() operates on strings and as such these floats need to be casted to string. Assuming that you really have the exact same value in both databases the likely cause is that the string representations from these implicit casts differ. You will want to control how the floats are represented as strings by using to_char() function.

Bill Weiner
  • 8,835
  • 2
  • 7
  • 18