Create conditional multi-index subtotal on pivoted dataframe

Question

I have a multi-index pivot table looks like the following, where Date is the index, and columns are [country, attributes]:

net_supply = pivot_filter.pivot_table(values='Production', index='Date', columns=['country','Attributes'],aggfunc='sum',fill_value=0)

            country1       country2
            prod runs imps prod runs imps
2018-01-01    10   -1    3 5   -6    3
2018-02-01    11   -11   2 13   -2    4
2018-03-01    8    -7    1 1   -5    2
2018-04-01    4    -7    5 10   -7    5
2018-05-01    8    -10   4 11   -2    5

I would like to add a conditional subtotal sum total for each country that checks:


if prod + runs >= 0, then add prod + runs + imps if true, else only add prod + runs.

This would be for each of the countries by date index. My ideal output would only include that new summed value like below:

            country1  country2
            subtotal  subtotal 
2018-01-01    12       -1
2018-02-01    2        15
2018-03-01    2        -4
2018-04-01    -3        8
2018-05-01    -2       14

Note that some countries only have a few of the 3 attributes (prod, or runs or any combination). I know margins can do a straight sum but not with conditions. Any help would be great!

score 1 · Accepted Answer · answered Apr 17 '23 at 19:39

1

First, get the result of "prod + runs", then get the result of all 3 columns, then finally use np.where to set the correct result based on the condition.

sum_df = df.drop('imps', axis=1, level=1).groupby(level=0, axis=1).sum()
sum_df[:] = np.where(sum_df >= 0, df.groupby(level=0, axis=1).sum(), sum_df)

sum_df

            country1  country2
2018-01-01        12        -1
2018-02-01         2        15
2018-03-01         2        -4
2018-04-01        -3         8
2018-05-01        -2        14

answered Apr 17 '23 at 19:39

cs95

379,657
97
704
746

Thanks a bunch, this worked. One followup, why have sum_df[ : ] rather than just sum_df? – carevans88 Apr 18 '23 at 17:13
@carevans88 because np.where returned a 2D numpy array, so to update the existing `sum_df` I did the inplace slicing assignment back using `sum_df[:]` – cs95 Apr 18 '23 at 17:18

score 0 · Answer 2 · answered Apr 17 '23 at 19:50

Another version, more explicit but maybe more comprehensible for someone:

prod = df.loc[:, (slice(None), "prod")].values
runs = df.loc[:, (slice(None), "runs")].values
imps = df.loc[:, (slice(None), "imps")].values

for lvl0 in df.columns.levels[0]:
    df[(lvl0, "subtotal")] = 0

df = df.sort_index(axis=1)

df.loc[:, (slice(None), "subtotal")] = np.where(
    prod + runs >= 0,
    prod + runs + imps,
    prod + runs,
)
print(df)

Prints:

           country1                    country2                   
               imps prod runs subtotal     imps prod runs subtotal
2018-01-01        3   10   -1       12        3    5   -6       -1
2018-02-01        2   11  -11        2        4   13   -2       15
2018-03-01        1    8   -7        2        2    1   -5       -4
2018-04-01        5    4   -7       -3        5   10   -7        8
2018-05-01        4    8  -10       -2        5   11   -2       14

If you want only subtotal columns then you can do afterwards:

print(df.loc[:, (slice(None), 'subtotal')])

Prints:

           country1 country2
           subtotal subtotal
2018-01-01       12       -1
2018-02-01        2       15
2018-03-01        2       -4
2018-04-01       -3        8
2018-05-01       -2       14

Thanks for the quick replace, this doesnt seem to work throws the error: ValueError: operands could not be broadcast together with shapes (180,211) (180,108) — carevans88, Apr 17 '23 at 20:06
@carevans88 Probably some countries don't have `prod`/`runs`/`imps` column in level 1? Maybe try to reindex your dataframe first. — Andrej Kesely, Apr 17 '23 at 20:13

Create conditional multi-index subtotal on pivoted dataframe

2 Answers2