0

I have a dataframe that looks like the following. Categories can be nested to infinite levels using a colon separated string. I wish to sort it by descending amount. But in hierarchical type fashion like shown.

My Dataframe

[category]                          [amount]
Household                           1000
Household : Utilities               600
Living : Food                       100
Transport                           5000
Household : Rent                    400
Household : Utilities : Water       400
Transport : Car                     4900
Transport : Train                   100
Living                              250
Household : Utilities : Electric    200
Living : Other                      150

How I need it sorted

Transport                           5000
Transport : Car                     4900
Transport : Train                   100
Household                           1000
Household : Utilities               600
Household : Utilities : Water       400
Household : Utilities : Electric    200
Household : Rent                    400
Living                              250
Living : Other                      150
Living : Food                       100

Notice how its sorted by amount. But still confined to the hierarchy. (Note: The amounts of each colon separated sub level sum up to the amounts in the root level)

I've played around with things like this. But Its not quite what I'm after. Its hurting my head a bit too much. Anyone know a way to do this nicely with pandas?

dfs = dfs.sort_values(['amount', 'category'], ascending=[True, True])

Edit 1:

if I add

Household : Cleaning                100
Household : Cleaning : Bathroom     75
Household : Cleaning : Kitchen      25

I would want to the sort order to be

Transport                           5000
Transport : Car                     4900
Transport : Train                   100
Household                           1100
Household : Utilities               600
Household : Utilities : Water       400
Household : Utilities : Electric    200
Household : Cleaning                100
Household : Cleaning : Bathroom     75
Household : Cleaning : Kitchen      25
Household : Rent                    400
Living                              250
Living : Other                      150
Living : Food                       100
Pierce
  • 101
  • 7

1 Answers1

0

Idea is sorting by length of splitted lists, groups created by select top level by indexing str[0] with Series.map and column [amount]:

splitted = df['[category]'].str.split(' : ')
df['cat'] = splitted.str.len()
df['am'] = splitted.str[0].map(df.set_index('[category]')['[amount]'])
df = df.sort_values(['am','cat', '[amount]'], ascending=[False, True, False])
df = df.drop(['am','cat'], axis=1)
print (df)
                          [category]  [amount]
3                          Transport      5000
7                  Transport : Train      2500
6                    Transport : Car      2000
0                          Household      1000
1              Household : Utilities       500
4                   Household : Rent       500
5      Household : Utilities : Water       300
9   Household : Utilities : Electric       200
8                             Living       250
10                    Living : Other       150
2                      Living : Food       100
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252