1

I have a table in this format, which I would like to transform with the "opposite" of melting. There is another question that addresses this but it doesn't work with so many other columns that I'd like to keep.

The original:

COUNTRY   STATE     CATEGORY   RESTAURANT         STARS     REVIEWS
US        Texas     NaN        Texas Chicken      4.1       1,157    
US        Texas     Spicy      Texas Chicken      4.1       1,157
US        Ohio      NaN        Mamas Shop         3.6       700
US        Ohio      NaN        Pizza Hut          4.5       855
US        Ohio      Pizza      Pizza Hut          4.5       855 

The desired output:

COUNTRY   STATE     RESTAURANT        STARS    REVIEWS  SPICY   PIZZA 
US        Texas     Texas Chicken     4.1      1,157    1       0 
US        Ohio      Mamas Shop        3.6      700      0       0
US        Ohio      Pizza Hut         4.5      855      0       1 

Essentially I would like to "group by" many columns, while creating extra columns based on the categories in the Category column. Restaurants that do not have any particular category would have the value 0 for all these additional columns. I also do not want any extra layers of columns as I am intending to write all of it to JSON.

Would appreciate any help on this and thank you in advance!

awks987
  • 45
  • 6

2 Answers2

0

A combination of set_index, crosstab and reindex can 'unmelt' the dataframe, and take care of the null values present in the dataframe:

#set aside required multiindex of country, state, restaurant, stars, and reviews
ind = df.set_index(['COUNTRY','STATE','RESTAURANT','STARS','REVIEWS']).index

#get frequency count for Pizza and Spicy
res = pd.crosstab([df.COUNTRY,df.STATE,df.RESTAURANT,df.STARS,df.REVIEWS],df.CATEGORY)

#reindex frequency dataframe with ind
res = res.reindex(ind,fill_value=0).drop_duplicates()
res


                CATEGORY                    Pizza   Spicy
COUNTRY STATE   RESTAURANT     STARS  REVIEWS       
 US     Texas   Texas Chicken   4.1    1,157    0   1
        Ohio    Mamas Shop      3.6    700      0   0
                Pizza Hut       4.5    855      1   0
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
0

I guess this should work:

pd.crosstab([df.COUNTRY,df.STATE,df.RESTAURANT,df.STARS,df.REVIEWS], df['CATEGORY'].fillna('_')).drop(columns='_')

JoergVanAken
  • 1,286
  • 9
  • 10