Sort within categories in a specific transformation using Pandas

Question

I have a dataset where I would like to sort my data within categories in a specific way.

Data

Location    range     type  Q1 27   Q2 27   Q3 27   Q4 27
NY          low re    AA    5       0       7       0
NY          low re    BB    0       0       0       0
NY          low re    DD    0.51    1       2       2.05
NY          low re    SS    0       0       0       0
NY          low stat  AA    1       0.86    1.5     5.27
NY          low stat  BB    0.45    0.49    0.91    2.17
NY          low stat  DD    1.53    1.27    2.22    7.78
NY          low stat  SS    0.26    0.21    0.37    1.3
CA          med stat  AA    0.38    3       0       0
CA          med stat  BB    0.33    1.22    0       0.11
CA          med stat  DD    0.55    5.4     0       0
CA          med stat  SS    0.09    0.91    0       0
CA          med re    AA    1.14    0       0       0
CA          med re    BB    0       0       0       2
CA          med re    DD    1.02    0       0       0.56
CA          med re    SS    0       0       0       0


import pandas as pd

data = {
    'Location': ['NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA'],
    'range': ['low re', 'low re', 'low re', 'low re', 'low stat', 'low stat', 'low stat', 'low stat', 'med stat', 'med stat', 'med stat', 'med stat', 'med re', 'med re', 'med re', 'med re'],
    'type': ['AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS'],
    'Q1 27': [5, 0, 0.51, 0, 1, 0.45, 1.53, 0.26, 0.38, 0.33, 0.55, 0.09, 1.14, 0, 1.02, 0],
    'Q2 27': [0, 0, 1, 0, 0.86, 0.49, 1.27, 0.21, 3, 1.22, 5.4, 0.91, 0, 0, 0, 0],
    'Q3 27': [7, 0, 2, 0, 1.5, 0.91, 2.22, 0.37, 0, 0, 0, 0, 0, 0, 0, 0],
    'Q4 27': [0, 0, 2.05, 0, 5.27, 2.17, 7.78, 1.3, 0, 0.11, 0, 0, 0, 2, 0.56, 0]
}

df = pd.DataFrame(data)
print(df)

Desired

All types are similar - ex AA belongs w AA BB belongs with BB categorized by Location. Order doesn't matter as long as the types are adjacent to each other categorized by Location.

Location    range     type  Q1 27   Q2 27   Q3 27   Q4 27
NY          low re    AA    5       0       7       0
NY          low stat  AA    1       0.86    1.5     5.27
NY          low re    DD    0.51    1       2       2.05
NY          low stat  DD    1.53    1.27    2.22    7.78
NY          low re    SS    0       0       0       0
NY          low stat  SS    0.26    0.21    0.37    1.3
NY          low re    BB    0       0       0       0
NY          low stat  BB    0.45    0.49    0.91    2.17
CA          med re    AA    1.14    0       0       0
CA          med stat  AA    0.38    3       0       0
CA          med re    DD    1.02    0       0       0.56
CA          med stat  DD    0.55    5.4     0       0
CA          med re    SS    0       0       0       0
CA          med stat  SS    0.09    0.91    0       0
CA          med re    BB    0       0       0       2
CA          med stat  BB    0.33    1.22    0       0.11

Doing

I am trying to implement

 df.sort_values('Location', 'range', inplace=True)

However, I am not obtaining my desired result. Any suggestion is appreciated.

All types are similar - ex AA belongs w AA BB belongs with BB categorized by Location @mozway — Lynn, May 01 '23 at 18:14
OK, but what determines the order? Is this always AA/DD/SS/BB? what about low/med re/stat? — mozway, May 01 '23 at 18:21

score 1 · Accepted Answer · answered May 01 '23 at 18:48

I think you need to pass a list of columns to sort by in 'sort_values()' plus sort by the 'type' column. Try this:

df.sort_values(['Location', 'type', 'range'], inplace=True)

This sorts the DataFrame firstly by 'Location' column, then by 'type' column, and finally by 'range' column. Let me know if this works.

score 1 · Answer 2 · answered May 01 '23 at 18:58

Your implementation is not very good. sort_values should be included as a parameter in by and the value for ascending or descending should be set. Additionally, because the type order is in the form of 'AA' to 'DD' to 'SS' to 'BB', the range follows a certain pattern. You can use pandas.Categorical to achieve your desired output.

You can try this code here to obtain the desire result:

import pandas as pd

data = {
    'Location': ['NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'NY', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA', 'CA'],
    'range': ['low re', 'low re', 'low re', 'low re', 'low stat', 'low stat', 'low stat', 'low stat', 'med stat', 'med stat', 'med stat', 'med stat', 'med re', 'med re', 'med re', 'med re'],
    'type': ['AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS', 'AA', 'BB', 'DD', 'SS'],
    'Q1 27': [5, 0, 0.51, 0, 1, 0.45, 1.53, 0.26, 0.38, 0.33, 0.55, 0.09, 1.14, 0, 1.02, 0],
    'Q2 27': [0, 0, 1, 0, 0.86, 0.49, 1.27, 0.21, 3, 1.22, 5.4, 0.91, 0, 0, 0, 0],
    'Q3 27': [7, 0, 2, 0, 1.5, 0.91, 2.22, 0.37, 0, 0, 0, 0, 0, 0, 0, 0],
    'Q4 27': [0, 0, 2.05, 0, 5.27, 2.17, 7.78, 1.3, 0, 0.11, 0, 0, 0, 2, 0.56, 0]
}

df = pd.DataFrame(data)
def range_order(val):
    if val.endswith('re'):
        return val + 'a'
    elif val.endswith('stat'):
        return val + 'b'
    else:
        return val

# apply the function to the range column
df['range'] = df['range'].apply(range_order)
range_order = df['range'].unique().tolist()
#Get the range_order from lowest to highest
range_order.sort()

# define the categorical ordering of the range column
df['range'] = pd.Categorical(df['range'], categories=range_order, ordered=True)

# define the categorical ordering of the type column
type_order = ['AA', 'DD', 'SS', 'BB']
df['type'] = pd.Categorical(df['type'], categories=type_order, ordered=True)

df = df.sort_values(by=['Location', 'type', 'range'], ascending=[False, True, True])

# remove the extra letter used in sorting range values
df['range'] = df['range'].apply(lambda x: x[:-1])
print(df)

The output look like:

   Location     range type  Q1 27  Q2 27  Q3 27  Q4 27
0        NY    low re   AA   5.00   0.00   7.00   0.00
4        NY  low stat   AA   1.00   0.86   1.50   5.27
2        NY    low re   DD   0.51   1.00   2.00   2.05
6        NY  low stat   DD   1.53   1.27   2.22   7.78
3        NY    low re   SS   0.00   0.00   0.00   0.00
7        NY  low stat   SS   0.26   0.21   0.37   1.30
1        NY    low re   BB   0.00   0.00   0.00   0.00
5        NY  low stat   BB   0.45   0.49   0.91   2.17
12       CA    med re   AA   1.14   0.00   0.00   0.00
8        CA  med stat   AA   0.38   3.00   0.00   0.00
14       CA    med re   DD   1.02   0.00   0.00   0.56
10       CA  med stat   DD   0.55   5.40   0.00   0.00
15       CA    med re   SS   0.00   0.00   0.00   0.00
11       CA  med stat   SS   0.09   0.91   0.00   0.00
13       CA    med re   BB   0.00   0.00   0.00   2.00
9        CA  med stat   BB   0.33   1.22   0.00   0.11

Sort within categories in a specific transformation using Pandas

2 Answers2