I have multi-index mapping rules. Here are the rules:
Type A: Chicken, Beef, Goat
Type B: Fish, Shrimp
Type C: Chicken, Pork
Here is my dataframe; let's say it is called df, and I want to do multi-index mapping on it. Note that a keyword can appear under more than one type (Chicken is in both Type A and Type C), so a single row can map to several types:
id  Menu
1   Fried Chicken
2   Shrimp Chips
3   Pork with Cheese
4   Fish Spaghetti
5   Goat Sate
6   Beef Soup
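For reproducibility, here is a minimal way to build this example frame (the column names id and Menu are taken from the table above; treating id as an ordinary column rather than the index is my assumption):

import pandas as pd

# Example data, copied from the table above
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'Menu': ['Fried Chicken', 'Shrimp Chips', 'Pork with Cheese',
             'Fish Spaghetti', 'Goat Sate', 'Beef Soup']
})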
With pandas, this can be done with pd.MultiIndex and pd.Series.str.get_dummies. So, the code will be:
import pandas as pd

m = {
    'Type A': ['Chicken', 'Beef', 'Goat'],
    'Type B': ['Fish', 'Shrimp'],
    'Type C': ['Chicken', 'Pork']
}

# Build a MultiIndex of (type, keyword) pairs from the mapping rules
mux = pd.MultiIndex.from_tuples(
    [(k, v) for k, values in m.items() for v in values])

# One-hot encode the words in Menu, align them against the (type, keyword)
# columns, then collapse each type into a single 0/1 flag
df.join(
    df.Menu.str.get_dummies(sep=' ')
      .reindex(columns=mux, level=1)
      .max(axis=1, level=0)
)
The output would look like this:
id  Menu              Type A  Type B  Type C
1   Fried Chicken     1       0       1
2   Shrimp Chips      0       1       0
3   Pork with Cheese  0       0       1
4   Fish Spaghetti    0       1       0
5   Goat Sate         1       0       0
6   Beef Soup         1       0       0
How can I do this on a PySpark dataframe?
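For reference, this is the rough direction I can picture, just a sketch assuming pyspark.sql.functions, an active SparkSession, and a simple case-sensitive regex match on Menu; I am not sure it is the correct or idiomatic approach:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Same example data as above, now as a Spark DataFrame
sdf = spark.createDataFrame(
    [(1, 'Fried Chicken'), (2, 'Shrimp Chips'), (3, 'Pork with Cheese'),
     (4, 'Fish Spaghetti'), (5, 'Goat Sate'), (6, 'Beef Soup')],
    ['id', 'Menu']
)

m = {
    'Type A': ['Chicken', 'Beef', 'Goat'],
    'Type B': ['Fish', 'Shrimp'],
    'Type C': ['Chicken', 'Pork']
}

# For each type, flag rows whose Menu matches any of that type's keywords
result = sdf.select(
    '*',
    *[
        F.when(F.col('Menu').rlike('|'.join(keywords)), 1)
         .otherwise(0)
         .alias(type_name)
        for type_name, keywords in m.items()
    ]
)
result.show()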