Sort/reindex a pandas multiindex dataframe when level 1 indicies are different for each level 0 index. Only one level 0 group needs to be sorted

Question

I am struggling with finding a way to sort the rows in a grouped pandas dataframe with two levels of indicies. This is an example of what a dataframe df I have:

I could not paste the table in a nice format, but here is code for creating the same:

import pandas as pd
data = {'Count': [207, 105, 28, 37, 182, 194]}
index = pd.MultiIndex.from_tuples([
    ('age_group', '20-39'),
    ('age_group', '40-59'),
    ('age_group', '<19'),
    ('age_group', '>60'),
    ('gender', 'female'),
    ('gender', 'male')
], names=['Parameter', 'Values'])
df = pd.DataFrame(data, index=index)

I would like to sort the level 1 age group based on this list: age_groups = ["<19", "20-39", "40-59", ">60"] in order to place the row with "<19" before "20-39".

I have a lot more parameters and values than shown here, but even with this simple example I cannot find a good solution.

df.reindex(age_groups, level=1) will remove the other Parameters. pd.MultiIndex.from_product() seems to require me to hard coded specify all other indicies which I will not do. All examples and documentation I find have the same level 1 indicies for all level 0.

Can you please [edit] the question and put the dataframe in text format (so we can copy-paste it)? — Andrej Kesely, Jul 17 '23 at 16:19

score 2 · Answer 1 · answered Jul 17 '23 at 18:01

I'd probably go with @Andrej's approach if the categories are fixed.

If you need a programmatic solution, here is one with numpy.lexsort, it uses the last number of the range to perform the sort:

order = np.lexsort([df.index.get_level_values(1)
                      .str.extract(r'(\d+)$', expand=False)
                      .astype(float),
                    df.index.get_level_values(0)
                    ])

out = df.iloc[order]

Output:

                  Count
Parameter Values       
age_group <19        28
          20-39     207
          40-59     105
          >60        37
gender    female    182
          male      194

score 1 · Answer 2 · answered Jul 17 '23 at 16:43

1

Kind of clunky, but I used a custom key with df.sort_index().

def sorter(vals):
    ages = ['<19', '20-39', '40-59', '>60']
    genders = ['female', 'male']
    if vals.name == 'Parameter':
        return sorted(vals)
    else:
        return [ages.index(val) if val in ages else genders.index(val) for val in vals]

df.sort_index(key = sorter)

answered Jul 17 '23 at 16:43

Michael Cao

2,278
1
1
13

Using list.index repeatedly is unfortunately inefficient. – mozway Jul 17 '23 at 18:01
We can speed that up by building a dictionary instead for faster lookup time. – Michael Cao Jul 17 '23 at 19:35

score 1 · Answer 3 · answered Jul 17 '23 at 16:52

1

Another option is to simply select rows in correct order:

weights = ["<19", "20-39", "40-59", ">60"]

df = pd.concat([df.loc[("age_group", weights), :], df.loc[("gender",):]])
print(df)

Prints:

                  Count
Parameter Values       
age_group <19        28
          20-39     207
          40-59     105
          >60        37
gender    female    182
          male      194

answered Jul 17 '23 at 16:52

Andrej Kesely

168,389
15
48
91

This is a pretty nice solutions, however is comes with the warning: FutureWarning: The behavior of indexing on a MultiIndex with a nested sequence of labels is deprecated and will change in a future version. `series.loc[label, sequence]` will raise if any members of 'sequence' or not present in the index's second level. – confusedstudent Jul 18 '23 at 12:55
Which pandas version do you have? I'm using `pandas==1.5.3` (maybe it changed in pandas 2) – Andrej Kesely Jul 18 '23 at 13:55

Sort/reindex a pandas multiindex dataframe when level 1 indicies are different for each level 0 index. Only one level 0 group needs to be sorted

3 Answers3