I have a problem in PySpark: creating a column in a new dataframe based on the values in another column.
It's tedious and seems to me like bad practice to write a long chain of
CASE
WHEN column_a = 'value_1' THEN 'value_x'
WHEN column_a = 'value_2' THEN 'value_y'
...
WHEN column_a = 'value_289' THEN 'value_xwerwz'
END
In cases like this, in Python, I'm used to using a dict or, even better, a configparser file, and avoiding the if/else chain. I just pass the key and Python returns the desired value. We also get a fallback option for the ELSE clause.
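For example, a minimal sketch of what I mean in plain Python (the keys and the fallback value are made up):

lookup = {
    "value_1": "value_x",
    "value_2": "value_y",
}
# dict.get() gives us the fallback behaviour of an ELSE clause
result = lookup.get("value_289", "fallback_value")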
The problem, it seems to me, is that we are not treating a single row but all of them in one command, so using a dict/map/configparser doesn't seem to be an option. I thought about using a loop with a dict, but that seems too slow and a waste of computation, as we repeat all the conditions.
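For clarity, this is roughly the loop I had in mind, building one chained expression from a dict (the column names and values are illustrative, and df is assumed to already exist):

from pyspark.sql import functions as F

mapping = {
    "value_1": "value_x",
    "value_2": "value_y",
    # ... up to value_289 in the real case
}

expr = None
for key, value in mapping.items():
    condition = F.col("column_a") == key
    # chain another WHEN onto the expression for each dict entry
    expr = F.when(condition, value) if expr is None else expr.when(condition, value)
# .otherwise() plays the role of the ELSE / fallback
expr = expr.otherwise("fallback_value")

new_df = df.withColumn("column_b", expr)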
I'm still looking for this practice; if I find it, I'll post it here. But, you know, probably a lot of people already use it and I just don't know it yet. And if there is no other way, OK, using many WHEN/THEN conditions it will have to be.
Thank you
I tried to use a dict and searched for solutions like this.