I am trying to use two PySpark DataFrame columns as keys into a nested dictionary and produce the looked-up value as a new PySpark column. I would also like the solution to scale to a nested dictionary with 4-5 levels.
The dictionary is of the form: dict_prob = {"a": {"x1": "y1", "x2": "y2"}, "b": {"m1": "n1", "m2": "n2"}}
Input columns:
index | col1 | col2 |
---|---|---|
0 | a | x1 |
1 | a | x2 |
2 | b | m2 |
Output column needed:
col3 |
---|
y1 |
y2 |
n2 |
I tried the approaches in the posts below, but they seem to work only for a flat dictionary, not a nested one:

- PySpark create new column with mapping from a dict
- How to use a column value as key to a dictionary in PySpark?
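One way I have been experimenting with (a minimal sketch, not tested at scale) is a generic lookup helper that walks the nested dictionary one key at a time, so it generalizes to any depth; the PySpark wiring via a `udf` is shown in comments, with the column names `col1`/`col2`/`col3` taken from the example above:

```python
from functools import reduce

# The nested mapping from the question (deeper nesting works the same way).
dict_prob = {"a": {"x1": "y1", "x2": "y2"}, "b": {"m1": "n1", "m2": "n2"}}

def nested_lookup(mapping, *keys):
    """Descend into `mapping` one key at a time; return None if any key is missing."""
    try:
        return reduce(lambda d, k: d[k], keys, mapping)
    except (KeyError, TypeError):
        return None

# PySpark usage (sketch, assuming a DataFrame `df` with columns col1, col2):
#   from pyspark.sql import functions as F
#   from pyspark.sql.types import StringType
#   lookup_udf = F.udf(lambda *ks: nested_lookup(dict_prob, *ks), StringType())
#   df = df.withColumn("col3", lookup_udf(F.col("col1"), F.col("col2")))
# For 4-5 levels, just pass more key columns to lookup_udf.
```

Is a plain Python `udf` like this the right approach, or is there a native-Spark way (e.g. building a map expression) that avoids the udf serialization cost?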