I use MurmurHash to compute a hash value, but the result I get in PySpark differs from the one I get in local Python.

local python: the hash value of 54958 is 5309672324031917724

pyspark: the hash value of 54958 is -878367076

Gaurav
  • try using it with the `signed=False` attribute – samkart Aug 10 '22 at 08:33
  • The value returned by Python is presumably a 64-bit unsigned integer (because the 64-bit version of MurmurHash3 is used). The result from PySpark looks like it could be a signed 32-bit integer. It cannot hold an unsigned 64-bit integer, and overflow occurs as a result. You would want to figure out how to make sure the hash is stored and displayed as an unsigned 64-bit integer. – njuffa Aug 10 '22 at 08:34
  • @samkart I've tried this too, but it doesn't work – Gaurav Aug 10 '22 at 08:45
  • @njuffa thanks, changing the UDF return type to long fixed it, since IntegerType is 4 bytes and LongType is 8 bytes. – Gaurav Aug 10 '22 at 08:48
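
The overflow described in the comments can be checked with plain Python arithmetic, without Spark or a hashing library. The sketch below assumes the 32-bit column simply keeps the low 32 bits of the value and reads them as a two's-complement signed integer; `wrap_to_signed` is a hypothetical helper written for illustration, not part of PySpark or any MurmurHash package.

```python
def wrap_to_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits of `value` and read them as a signed integer."""
    mask = (1 << bits) - 1          # e.g. 0xFFFFFFFF for 32 bits
    low = value & mask              # discard everything above `bits`
    sign_bit = 1 << (bits - 1)
    # If the top remaining bit is set, the two's-complement value is negative.
    return low - (1 << bits) if low & sign_bit else low


local_hash = 5309672324031917724    # value reported by local Python in the question

print(wrap_to_signed(local_hash, 32))  # -878367076, the value reported by PySpark
print(wrap_to_signed(local_hash, 64))  # 5309672324031917724, unchanged in 8 bytes
```

The 32-bit result matches the PySpark value from the question exactly, which is consistent with the hash being truncated by a 4-byte `IntegerType` and preserved by an 8-byte `LongType`.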

0 Answers