Background I use explode to transpose columns to rows. This works very well in general with good performance. The source dataframe (df_audit in below code) is dynamic so can contain different structure.
Problem Recently have incoming dataframe with very large number of columns (5 thousand) - the below code runs successfully but is very slow to run the line starting 'exploded'. Anyone faced similar problems? I could split up the dataframe to multiple dataframes (broken out by columns) or might there be better way? Or example code?
Example code
key_cols = ["cola", "colb", "colc"]
cols = [col for col in df_audit.columns if col not in key_cols]
exploded = explode(array([struct(lit(c).alias("key"), col(c).alias("val")) for c in cols])).alias("exploded")
df_audit = df_audit.select(key_cols + [exploded]).select(key_cols + ["exploded.key", "exploded.val"])