Spread specific columns in dataframe without any aggregation?

Question

Here is my toy df:

{'id': {0: 1089577, 1: 1089577, 2: 1089577, 3: 1089577, 4: 1089577},
 'title': {0: 'Hungarian Goulash Stew',
  1: 'Hungarian Goulash Stew',
  2: 'Hungarian Goulash Stew',
  3: 'Hungarian Goulash Stew',
  4: 'Hungarian Goulash Stew'},
 'readyInMinutes': {0: 120, 1: 120, 2: 120, 3: 120, 4: 120},
 'nutrients.amount': {0: 323.18, 1: 15.14, 2: 4.43, 3: 38.95, 4: 34.64},
 'nutrients.name': {0: 'Calories',
  1: 'Fat',
  2: 'Saturated Fat',
  3: 'Carbohydrates',
  4: 'Net Carbohydrates'},
 'nutrients.percentOfDailyNeeds': {0: 16.16,
  1: 23.3,
  2: 27.69,
  3: 12.98,
  4: 12.6},
 'nutrients.title': {0: 'Calories',
  1: 'Fat',
  2: 'Saturated Fat',
  3: 'Carbohydrates',
  4: 'Net Carbohydrates'},
 'nutrients.unit': {0: 'kcal', 1: 'g', 2: 'g', 3: 'g', 4: 'g'}}

I would like to spread the nutrients.title to be the columns. Sp I will get Fat, Saturated Fat ... columns with their corresponding values, without any agg.

What is the function that can do this without any aggregation? Just "reshaping".

I want it to be:

How can I "spread" it like this?

What do you mean by `their corresponding values`? What is your expected output for the input data? — Quang Hoang, May 04 '21 at 16:01
@QuangHoang I have the nutrients.title column that has the nutrient names, I need to make them columns, to spread them. — SteveS, May 04 '21 at 16:05
I understand you want them (`title`) as columns. What about the index? What are the cells' values? — Quang Hoang, May 04 '21 at 16:05
It would help if you had an example of what you want your output df to look like. — dm2, May 04 '21 at 16:07
@dm2 I have edited the Q with desired outcome, I am trying to play with pivot and melt functions — SteveS, May 04 '21 at 16:10

Henry Ecker · Accepted Answer · 2021-05-04T21:09:29.550

Try a pivot_table:

# Rename Columns
df.columns = df.columns.map(lambda x: f".{x.split('.')[-1]}" if '.' in x else x)

# Create Pivot Table
df = df.pivot_table(
    index=['id', 'title', 'readyInMinutes'],
    columns=['.title'],
    values=['.amount',
            '.percentOfDailyNeeds',
            '.unit'],
    aggfunc='first'
).reset_index() \
    .swaplevel(0, 1, axis=1)

# Re-Order Columns So that nutrients.title are grouped
df = df.reindex(sorted(df.columns), axis=1)

# Reduce Levels by join
df.columns = df.columns.map(''.join)

print(df.to_string(index=False))

Output:

     id  readyInMinutes                  title  Calories.amount  Calories.percentOfDailyNeeds Calories.unit  Carbohydrates.amount  Carbohydrates.percentOfDailyNeeds Carbohydrates.unit  Fat.amount  Fat.percentOfDailyNeeds Fat.unit  Net Carbohydrates.amount  Net Carbohydrates.percentOfDailyNeeds Net Carbohydrates.unit  Saturated Fat.amount  Saturated Fat.percentOfDailyNeeds Saturated Fat.unit
1089577             120 Hungarian Goulash Stew           323.18                         16.16          kcal                 38.95                              12.98                  g       15.14                     23.3        g                     34.64                                   12.6                      g                  4.43                              27.69                  g

Steps with abridged outputs

Change the Column Names:

print(df.columns.values)
# ['id' 'title' 'readyInMinutes' 'nutrients.amount' 'nutrients.name'
#  'nutrients.percentOfDailyNeeds' 'nutrients.title' 'nutrients.unit']
print(df.columns.map(lambda x: f".{x.split('.')[-1]}" if '.' in x else x).values)
# ['id' 'title' 'readyInMinutes' '.amount' '.name' '.percentOfDailyNeeds'
#  '.title' '.unit']

Pivot on multiple value columns with a single header column to create multi-level column index:

print(df.pivot_table(
    index=['id', 'title', 'readyInMinutes'],
    columns=['.title'],
    values=['.amount',
            '.percentOfDailyNeeds',
            '.unit'],
    aggfunc='first'
).to_string())

                                               .amount
.title                                        Calories Carbohydrates    Fat Net Carbohydrates Saturated Fat
id      title                  readyInMinutes
1089577 Hungarian Goulash Stew 120              323.18         38.95  15.14             34.64          4.43

Fix index and swaplevels so that label comes on top (Calories, Carbohydrates, etc.) .reset_index().swaplevel(0, 1, axis=1)

.title                                                 Calories Carbohydrates     Fat Net Carbohydrates Saturated Fat
             id                   title readyInMinutes  .amount       .amount .amount           .amount       .amount
0       1089577  Hungarian Goulash Stew            120   323.18         38.95   15.14             34.64          4.43

Sort Columns so labels are together:

df = df.reindex(sorted(df.columns), axis=1)

.title                                                 Calories                            Carbohydrates
             id readyInMinutes                   title  .amount .percentOfDailyNeeds .unit       .amount .percentOfDailyNeeds .unit
0       1089577            120  Hungarian Goulash Stew   323.18                16.16  kcal         38.95                12.98     g

Reduce levels using join (Create Calories.amount, Calories.unit, etc)

df.columns = df.columns.map(''.join)

        id  readyInMinutes                   title  Calories.amount  Calories.percentOfDailyNeeds Calories.unit
0  1089577             120  Hungarian Goulash Stew           323.18                         16.16          kcal

But I need the units and daily needs columns also, should I merge on id? — SteveS, May 04 '21 at 19:53
Done, please have a look on my excel pic of how I want it to be, I hope now it's much more clear. — SteveS, May 04 '21 at 20:03
I want ALL the nutrients names to be in columns, along with their unit and daily percentage. For each nutrient you should get 3 columns in the final dataframe. — SteveS, May 04 '21 at 20:16
Henry? Did you get the idea? Maybe we can have a quick zoom? — SteveS, May 04 '21 at 20:39
Yep :) exactly, I was trying to do the same with 3 pivot tables and joining. Can you please explain your solution if possible? — SteveS, May 04 '21 at 20:56

score 1 · Answer 2 · answered May 04 '21 at 17:11

You can use df.pivot() as follows:

(df.pivot(index=['id', 'title', 'readyInMinutes'], 
          columns='nutrients.title', 
          values='nutrients.amount')
          .rename_axis(None, axis=1)
).reset_index()

Result:

        id                   title  readyInMinutes  Calories  Carbohydrates    Fat  Net Carbohydrates  Saturated Fat
0  1089577  Hungarian Goulash Stew             120    323.18          38.95  15.14              34.64           4.43

Spread specific columns in dataframe without any aggregation?

2 Answers2