1

I have a pandas data frame as follows:

id     group    type    action    cost
101    A        1                 10
101    A        1       repair    3
102    B        1                 5
102    B        1       repair    7
102    B        1       grease    2
102    B        1       inflate   1
103    A        2                 12
104    B        2                 9

I need to reshape it from long to wide, but depending on the value of the action column, as follows:

id     group    type    action_std    action_extra
101    A        1       10            3
102    B        1       5             10
103    A        2       12            0
104    B        2       9             0

In other words, for the rows with empty action field the cost value should be put under the action_std column, while for the rows with non-empty action field the cost value should be summarized under the action_extra column.

I've attempted with several combinations of groupby / agg / pivot but I cannot find any fully working solution...

lucatrv
  • 725
  • 8
  • 14

3 Answers3

2

I would suggest you simply split the cost column into a cost, and a cost_extra column. Something like the following:

import numpy as np

result = df.assign(
    cost_extra=lambda df: np.where(
        df['action'].notnull(), df['cost'], np.nan
    )
).assign(
    cost=lambda df: np.where(
        df['action'].isnull(), df['cost'], np.nan
    )
).groupby(
    ["id", "group", "type"]
)["cost", "cost_extra"].agg(
    "sum"
)

result looks like:

                cost  cost_extra
id  group type                  
101 A     1     10.0         3.0
102 B     1      5.0        10.0
103 A     2     12.0         0.0
104 B     2      9.0         0.0
PMende
  • 5,171
  • 2
  • 19
  • 26
  • 2
    You can [`assign`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html) two columns at once with comma separation. – Parfait Aug 31 '19 at 23:04
1

Check groupby with unstack

df.cost.groupby([df.id,df.group,df.type,df.action.eq('')]).sum().unstack(fill_value=0)
action          False  True 
id  group type              
101 A     1         3     10
102 B     1        10      5
103 A     2         0     12
104 B     2         0      9
BENY
  • 317,841
  • 20
  • 164
  • 234
0

Thanks for your hints, this is the solution that I finally like the most (also for its simplicity):

df["action_std"] = df["cost"].where(df["action"] == "")
df["action_extra"] = df["cost"].where(df["action"] != "")
df = df.groupby(["id", "group", "type"])["action_std", "action_extra"].sum().reset_index()
lucatrv
  • 725
  • 8
  • 14