1

Here is my toy df:

{'id': {0: 1089577, 1: 1089577, 2: 1089577, 3: 1089577, 4: 1089577},
 'title': {0: 'Hungarian Goulash Stew',
  1: 'Hungarian Goulash Stew',
  2: 'Hungarian Goulash Stew',
  3: 'Hungarian Goulash Stew',
  4: 'Hungarian Goulash Stew'},
 'readyInMinutes': {0: 120, 1: 120, 2: 120, 3: 120, 4: 120},
 'nutrients.amount': {0: 323.18, 1: 15.14, 2: 4.43, 3: 38.95, 4: 34.64},
 'nutrients.name': {0: 'Calories',
  1: 'Fat',
  2: 'Saturated Fat',
  3: 'Carbohydrates',
  4: 'Net Carbohydrates'},
 'nutrients.percentOfDailyNeeds': {0: 16.16,
  1: 23.3,
  2: 27.69,
  3: 12.98,
  4: 12.6},
 'nutrients.title': {0: 'Calories',
  1: 'Fat',
  2: 'Saturated Fat',
  3: 'Carbohydrates',
  4: 'Net Carbohydrates'},
 'nutrients.unit': {0: 'kcal', 1: 'g', 2: 'g', 3: 'g', 4: 'g'}}

I would like to spread the nutrients.title to be the columns. Sp I will get Fat, Saturated Fat ... columns with their corresponding values, without any agg.

What is the function that can do this without any aggregation? Just "reshaping".

I want it to be: enter image description here

How can I "spread" it like this?

SteveS
  • 3,789
  • 5
  • 30
  • 64

2 Answers2

2

Try a pivot_table:

# Rename Columns
df.columns = df.columns.map(lambda x: f".{x.split('.')[-1]}" if '.' in x else x)

# Create Pivot Table
df = df.pivot_table(
    index=['id', 'title', 'readyInMinutes'],
    columns=['.title'],
    values=['.amount',
            '.percentOfDailyNeeds',
            '.unit'],
    aggfunc='first'
).reset_index() \
    .swaplevel(0, 1, axis=1)

# Re-Order Columns So that nutrients.title are grouped
df = df.reindex(sorted(df.columns), axis=1)

# Reduce Levels by join
df.columns = df.columns.map(''.join)

print(df.to_string(index=False))

Output:

     id  readyInMinutes                  title  Calories.amount  Calories.percentOfDailyNeeds Calories.unit  Carbohydrates.amount  Carbohydrates.percentOfDailyNeeds Carbohydrates.unit  Fat.amount  Fat.percentOfDailyNeeds Fat.unit  Net Carbohydrates.amount  Net Carbohydrates.percentOfDailyNeeds Net Carbohydrates.unit  Saturated Fat.amount  Saturated Fat.percentOfDailyNeeds Saturated Fat.unit
1089577             120 Hungarian Goulash Stew           323.18                         16.16          kcal                 38.95                              12.98                  g       15.14                     23.3        g                     34.64                                   12.6                      g                  4.43                              27.69                  g

Steps with abridged outputs

  1. Change the Column Names:
print(df.columns.values)
# ['id' 'title' 'readyInMinutes' 'nutrients.amount' 'nutrients.name'
#  'nutrients.percentOfDailyNeeds' 'nutrients.title' 'nutrients.unit']
print(df.columns.map(lambda x: f".{x.split('.')[-1]}" if '.' in x else x).values)
# ['id' 'title' 'readyInMinutes' '.amount' '.name' '.percentOfDailyNeeds'
#  '.title' '.unit']
  1. Pivot on multiple value columns with a single header column to create multi-level column index:
print(df.pivot_table(
    index=['id', 'title', 'readyInMinutes'],
    columns=['.title'],
    values=['.amount',
            '.percentOfDailyNeeds',
            '.unit'],
    aggfunc='first'
).to_string())
                                               .amount
.title                                        Calories Carbohydrates    Fat Net Carbohydrates Saturated Fat
id      title                  readyInMinutes
1089577 Hungarian Goulash Stew 120              323.18         38.95  15.14             34.64          4.43
  1. Fix index and swaplevels so that label comes on top (Calories, Carbohydrates, etc.) .reset_index().swaplevel(0, 1, axis=1)
.title                                                 Calories Carbohydrates     Fat Net Carbohydrates Saturated Fat
             id                   title readyInMinutes  .amount       .amount .amount           .amount       .amount
0       1089577  Hungarian Goulash Stew            120   323.18         38.95   15.14             34.64          4.43
  1. Sort Columns so labels are together:
df = df.reindex(sorted(df.columns), axis=1)
.title                                                 Calories                            Carbohydrates
             id readyInMinutes                   title  .amount .percentOfDailyNeeds .unit       .amount .percentOfDailyNeeds .unit
0       1089577            120  Hungarian Goulash Stew   323.18                16.16  kcal         38.95                12.98     g 
  1. Reduce levels using join (Create Calories.amount, Calories.unit, etc)
df.columns = df.columns.map(''.join)
        id  readyInMinutes                   title  Calories.amount  Calories.percentOfDailyNeeds Calories.unit
0  1089577             120  Hungarian Goulash Stew           323.18                         16.16          kcal
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
  • But I need the units and daily needs columns also, should I merge on id? – SteveS May 04 '21 at 19:53
  • Done, please have a look on my excel pic of how I want it to be, I hope now it's much more clear. – SteveS May 04 '21 at 20:03
  • My solution is to make a few tables and join on ```id```. – SteveS May 04 '21 at 20:04
  • I want ALL the nutrients names to be in columns, along with their unit and daily percentage. For each nutrient you should get 3 columns in the final dataframe. – SteveS May 04 '21 at 20:16
  • Henry? Did you get the idea? Maybe we can have a quick zoom? – SteveS May 04 '21 at 20:39
  • Yep :) exactly, I was trying to do the same with 3 pivot tables and joining. Can you please explain your solution if possible? – SteveS May 04 '21 at 20:56
1

You can use df.pivot() as follows:

(df.pivot(index=['id', 'title', 'readyInMinutes'], 
          columns='nutrients.title', 
          values='nutrients.amount')
          .rename_axis(None, axis=1)
).reset_index()

Result:

        id                   title  readyInMinutes  Calories  Carbohydrates    Fat  Net Carbohydrates  Saturated Fat
0  1089577  Hungarian Goulash Stew             120    323.18          38.95  15.14              34.64           4.43
SeaBean
  • 22,547
  • 3
  • 13
  • 25