0

We have used Hash_MD5(col1,col2) in exasol. Now we are migrating to data bricks and we want to retain all Hash id's. Problem is data brick doesnot support multiple arguments in MD5 and concat MD5(Concat(col1,col2)) giving us different result. Any way we can retain haah id's after migration?

We are expecting same hash id,s between exasol and data bricks

Shakky007
  • 1
  • 1

1 Answers1

0

I'm not familiar with exosol but according to the exosol documentation:

The data types of the input parameters are significant. That is why HASH_MD5(123) is different to HASH_MD5('123'). Multiple input expressions are concatenated (in their internal byte representation) before the hash value is computed. Generally, HASH_MD5(c1,c2) is not similar to HASH_MD5(c1||c2)

https://docs.exasol.com/db/7.1/sql_references/functions/alphabeticallistfunctions/hash_md5.htm

Therefore, you should use the binary representation of your values instead of concatenation alone. Here are a few options for demonstration:

-- `y` and `z` are STRING types
SELECT
  MD5(CONCAT(y, z)) AS a, --concat as strings
  MD5(CONCAT(BIN(y), BIN(z))) AS b, --concat the binary representations
  MD5(BIN(CONCAT(y, z))) AS c --binary representation of the concatenation
FROM 
  df
+--------------------------------+--------------------------------+--------------------------------+
|a                               |b                               |c                               |
+--------------------------------+--------------------------------+--------------------------------+
|05c49d8ad656b7f9435f6371de69d41f|b89661612721e54dc380c30fec72793e|c399ab89c0693db6c7be6d275fd43237|
|29f9ec6574a4a67d44945541c0315681|4db4cb0606ca972f16cb90690a76b705|0f87d9792e450884732d874de1943ccf|
|e280347ef979563640e93d8ca1d17054|fa1f1764a061496642ebda8a05730735|aa7681ef8fd8a0b768c086a1e1aaaa11|
|5f2938f1950b0a2fc4c97ed71df05f63|f888cbf720b3ac7c8cd030aead404e66|323673df1eadd800657f7574bae113d2|
|733ae1b1103a3a338c8a0d6c7cd856a9|5563832a9f47bb347a0cdf7bb4d4b94f|fd594c2ac25758431bb533fab1627310|
+--------------------------------+--------------------------------+--------------------------------+

As shown, (a) is what you tried and doesn't work, but is different than (b) and (c). Based on the documentation, (b) should be the same transformation that Exasol is doing.

(c) is included just for demonstration.

Zach King
  • 798
  • 1
  • 8
  • 21
  • Thanks for suggestions but MD5(CONCAT(BIN(y), BIN(z))) didnt solve the issue. The value does not match with Exasol Hash_Md5('y','z'). – Shakky007 Jul 17 '23 at 06:17