0

I have a sample dataframe like this. Basically I want to do ranking based on item_number and location_id. I could've done something like that in SQL using window function (dense_rank, over() partition by).

df = pd.DataFrame({'item_number': [1029980, 1029980, 1029980, 1029980, 1029980, 
                                   1029980, 1029980, 1029980, 1029980, 1029980],
                   'location_id': ['L3-25-AA-05-B', 'L3-25-AA-05-B', 'L3-25-AA-05-B', 'L3-25-AA-05-B', 'L3-25-AA-05-B', 
                                   'L4-25-AA-05-B', 'L4-25-AA-05-B','L4-25-AA-05-B', 'L4-25-AA-05-B', 'L4-25-AA-05-B'],
                   'Date': ['2021-10-01', '2021-10-02', '2021-10-03', '2021-10-04', '2021-10-05', 
                            '2021-10-01', '2021-10-02', '2021-10-03', '2021-10-04', '2021-10-05']})
item_number location_id Date
1029980 L3-25-AA-05-B 2021-10-01
1029980 L3-25-AA-05-B 2021-10-02
1029980 L3-25-AA-05-B 2021-10-03
1029980 L3-25-AA-05-B 2021-10-04
1029980 L3-25-AA-05-B 2021-10-05
1029980 L4-25-AA-05-B 2021-10-01
1029980 L4-25-AA-05-B 2021-10-02
1029980 L4-25-AA-05-B 2021-10-03
1029980 L4-25-AA-05-B 2021-10-04
1029980 L4-25-AA-05-B 2021-10-05

I want the data to be like this. The ranking is grouped by item_number and location_id. If the item_number and location_id are same then it's considered in a same group and ranking should be done based on the Date.

item_number location_id Date Rank
1029980 L3-25-AA-05-B 2021-10-01 5
1029980 L3-25-AA-05-B 2021-10-02 4
1029980 L3-25-AA-05-B 2021-10-03 3
1029980 L3-25-AA-05-B 2021-10-04 2
1029980 L3-25-AA-05-B 2021-10-05 1
1029980 L4-25-AA-05-B 2021-10-01 5
1029980 L4-25-AA-05-B 2021-10-02 4
1029980 L4-25-AA-05-B 2021-10-03 3
1029980 L4-25-AA-05-B 2021-10-04 2
1029980 L4-25-AA-05-B 2021-10-05 1

I have tried this code but it's giving an error as the columns are all string.

test['rank'] = test.groupby(['item_number','location_id']).rank()

The above code gave me this error.

DataError: No numeric types to aggregate

Can anyone help me in this regard please?

hkay
  • 159
  • 1
  • 2
  • 12
  • Will you please add the result of `print(df.head().to_dict())`? It works for me. –  Dec 09 '21 at 23:56
  • Note that you probably want to use `.rank(ascending=False)` instead of `.rank()` (per your expected output) –  Dec 09 '21 at 23:57
  • {'item_number': {3: 1029980}, 'location_id': {3: 'L3-25-AA-05-B'}, 'Date': {3: '2021-10-05'}} here is the result of print(df.head().to_dict()) – hkay Dec 10 '21 at 00:36
  • Thank you for editing. I have added your suggested version – hkay Dec 10 '21 at 00:52

3 Answers3

1

In your case do

df['new'] = df.groupby(['item_number','location_id'])['Date'].rank(ascending=False)
0    5.0
1    4.0
2    3.0
3    2.0
4    1.0
5    5.0
6    4.0
7    3.0
8    2.0
9    1.0
Name: Date, dtype: float64
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Yeah, this one worked. Also, I found few duplicates in my data that's why I could not use groupby earlier. Thanks @BENY – hkay Dec 10 '21 at 01:10
0

You could use:

test.dtypes

to see what type the columns are that you are using (if they are numeric of not) and then potentially use astype: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

test.astype({"item_number": "int"}).groupby(['item_number','location_id']).rank()

Though I'm not sure if that would work for location_id.

  • You're right. It would work but the problem is location_id won't change. And because of that I cannot use .rank(). – hkay Dec 10 '21 at 00:35
  • Have you tried using sort_values: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html – thedavidhackett Dec 10 '21 at 00:39
  • I would have used sort_values only but I have to partition it by item_number and then by location_id as well. That's where I am stuck – hkay Dec 10 '21 at 00:46
0

IIUC, you can reverse the dataframe, groupby on item_number and location_id and cumcount Dates:

df['rank'] = df.groupby(['item_number','location_id'], as_index=False)['Date'].cumcount(ascending=False)+1

Output:

   item_number    location_id        Date  rank
0      1029980  L3-25-AA-05-B  2021-10-01     5
1      1029980  L3-25-AA-05-B  2021-10-02     4
2      1029980  L3-25-AA-05-B  2021-10-03     3
3      1029980  L3-25-AA-05-B  2021-10-04     2
4      1029980  L3-25-AA-05-B  2021-10-05     1
5      1029980  L4-25-AA-05-B  2021-10-01     5
6      1029980  L4-25-AA-05-B  2021-10-02     4
7      1029980  L4-25-AA-05-B  2021-10-03     3
8      1029980  L4-25-AA-05-B  2021-10-04     2
9      1029980  L4-25-AA-05-B  2021-10-05     1