I am trying to apply PySpark's SQL hash functions to every row of two DataFrames in order to identify the differences between them. Since the hash is computed over strings, I am converting every column with a non-string datatype to string. Most of my issues are with the date columns: the date format needs to be normalized before converting to string so that hash-based matching is consistent. Please help me with the approach.
# Identify the fields which are not strings
from pyspark.sql.functions import col, date_format
from pyspark.sql.types import StringType, DateType

fields = df_db1.schema.fields
nonStringFields = [col(f.name) for f in fields if not isinstance(f.dataType, StringType)]

# Convert the date fields to a specific date format, then to string
dateFields = [f for f in fields if isinstance(f.dataType, DateType)]
for f in dateFields:
    df_db1 = df_db1.withColumn(f.name, date_format(col(f.name), "yyyy-MM-dd"))

# Convert all other non-string fields to string
for f in fields:
    if not isinstance(f.dataType, (StringType, DateType)):
        df_db1 = df_db1.withColumn(f.name, col(f.name).cast(StringType()))