I have a mapping/lookup table (a DataFrame) according to which I have to extract values from a highly nested JSON/dictionary. These values have to be inserted as column values into a Delta table. How do I do this while leveraging PySpark's parallelism?
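For concreteness, here is a minimal sketch of the shapes involved (the field names, dotted paths, and column names are made up for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical nested JSON/dict I need to extract values from.
nested = {
    "customer": {
        "profile": {"name": "Alice", "age": 30},
        "orders": {"last": {"total": 99.5}},
    }
}

# Hypothetical mapping: target Delta column -> dotted path into the dict.
mapping_df = spark.createDataFrame(
    [
        ("name", "customer.profile.name"),
        ("age", "customer.profile.age"),
        ("last_total", "customer.orders.last.total"),
    ],
    ["target_column", "json_path"],
)
```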
I know I can `collect()` the mapping DataFrame, open the JSON file, update each column of a row of a temporary DataFrame, and append it to the Delta table, but that will not run in parallel.
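A rough sketch of that approach, reusing the hypothetical `nested` and `mapping_df` from above (in my real case `nested` comes from `json.load()` on the file; the path-walking helper and the table name are mine):

```python
from functools import reduce
from pyspark.sql import Row

def extract(d, dotted_path):
    # Walk the nested dict along a dotted path such as "customer.profile.name".
    return reduce(lambda acc, key: acc[key], dotted_path.split("."), d)

# Everything below happens on the driver, one mapping row at a time.
row = {
    m["target_column"]: extract(nested, m["json_path"])
    for m in mapping_df.collect()
}

temp_df = spark.createDataFrame([Row(**row)])
temp_df.write.format("delta").mode("append").saveAsTable("my_delta_table")  # hypothetical table
```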
Alternatively, I can `broadcast` the dict/JSON, iterate over the mapping DataFrame using `foreach()`, and upsert my Delta table according to a `when` condition. But `Column.when()` does not allow me to update a Delta table, nor does `DeltaTable.merge()` (from `delta.tables`) allow me to compare a DataFrame against a dict.
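A sketch of this second attempt, again with the hypothetical names from above and the `extract` helper from the previous sketch, with comments marking where it falls apart:

```python
# Ship the dict to the executors once.
bc = spark.sparkContext.broadcast(nested)

def handle(mapping_row):
    # This runs on an executor. Looking the value up works fine...
    value = extract(bc.value, mapping_row["json_path"])
    # ...but no SparkSession or DeltaTable handle is available here,
    # so I cannot upsert the Delta table from inside foreach().

mapping_df.foreach(handle)
```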