
Hi Stack Overflow fam,

I am new to PySpark and trying to learn as much as I can. For now, I want to convert GUIDs into integers in PySpark. I can currently run the following statement in SQL to convert a GUID into an int.

CHECKSUM(HASHBYTES('sha2_512',GUID)) AS int_value_wanted

I wanted to do the same thing in PySpark, so I registered the Spark DataFrame as a temporary table and used the statement above in a SQL query. But the code keeps throwing "Undefined function: 'CHECKSUM'". Is there a way to make the CHECKSUM function available in PySpark, or another PySpark way to get the same result?

from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql import SQLContext

glueContext = GlueContext(SparkContext.getOrCreate())
spark_session = glueContext.spark_session
sqlContext = SQLContext(spark_session.sparkContext, spark_session)

# Sample GUIDs with the integer values produced by the SQL statement above
spark_df = spark_session.createDataFrame(
    [("2540f487-7a29-400a-98a0-c03902e67f73", "1386172469"),
     ("0b32389a-ce01-4e6a-855c-15940cc91e9e", "-2013240275")],
    ("GUDI", "int_value_wanted")
)

spark_df.show(truncate=False)
spark_df.createOrReplaceTempView('temp')
# This query is the one that fails with "Undefined function: 'CHECKSUM'"
new_df = sqlContext.sql("SELECT *, CHECKSUM(HASHBYTES('sha2_512', GUDI)) AS detail_id FROM temp")
new_df.show(truncate=False)
+------------------------------------+----------------+
|GUDI                                |int_value_wanted|
+------------------------------------+----------------+
|2540f487-7a29-400a-98a0-c03902e67f73|1386172469      |
|0b32389a-ce01-4e6a-855c-15940cc91e9e|-2013240275     |
+------------------------------------+----------------+

Thanks

Sisay

1 Answer


Spark has a built-in sha2 function (pyspark.sql.functions.sha2, also available in Spark SQL) that returns the SHA-2 family hash of a value as a hex string. Pass 512 as the bit length to get SHA-512.
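
To get from that hex digest to an integer, one option is to keep only part of the digest and read it as a number. The sketch below shows the idea in plain Python with hashlib, assuming the GUID is hashed as UTF-8 text and that the first 4 bytes of the digest are enough for your purposes. The helper name guid_to_int32 is hypothetical, and the result is not expected to reproduce the values returned by SQL Server's CHECKSUM, which uses its own algorithm. It only gives a stable integer per GUID.

```python
import hashlib


def guid_to_int32(guid: str) -> int:
    """Map a GUID string to a signed 32-bit integer via SHA-512.

    Hypothetical helper for illustration; it does NOT replicate
    SQL Server's CHECKSUM(HASHBYTES(...)) values.
    """
    # SHA-512 over the UTF-8 bytes of the GUID text (an assumption about encoding).
    digest = hashlib.sha512(guid.encode("utf-8")).digest()
    # Keep the first 4 bytes and read them as a signed big-endian integer.
    return int.from_bytes(digest[:4], byteorder="big", signed=True)


if __name__ == "__main__":
    for guid in (
        "2540f487-7a29-400a-98a0-c03902e67f73",
        "0b32389a-ce01-4e6a-855c-15940cc91e9e",
    ):
        print(guid, guid_to_int32(guid))
```

In Spark, a function like this could be wrapped in a UDF, or the same idea could probably be expressed with the built-ins sha2, substring, and conv so that the work stays inside the engine. Truncating to 32 bits makes collisions more likely than with the full digest, so a 64-bit slice may be the safer choice for large tables.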

Robert Kossendey
  • Thank you for your prompt response. That actually converts a string to binary. What I want is to convert those binaries back to an int or str. As an example, SELECT sha2('Spark', 256); will give "529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b", and I want to convert this binary back to "Spark" as a str value. – Sisay May 14 '21 at 16:28
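
On the point raised in the comment: SHA-2 is a one-way hash, so a digest cannot be converted back into the original string. If the original value is needed later, the usual workaround is to keep a mapping from digest to original. A minimal sketch in plain Python, assuming the set of original values is known up front (sha256_hex and lookup are hypothetical names used only for illustration):

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Known original values; in practice these would come from the source table.
originals = ["Spark", "2540f487-7a29-400a-98a0-c03902e67f73"]

# Build a digest -> original lookup, since the hash itself cannot be reversed.
lookup = {sha256_hex(value): value for value in originals}

digest = sha256_hex("Spark")
print(lookup[digest])  # prints "Spark"
```

With a DataFrame, the same effect comes from keeping the original column next to the hashed one and joining on the hash when the original is needed.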