I have a Pandas dataframe of two columns:

  • one column consists of integer values
  • the other of lists of different sizes as values.

I want to sort the rows in descending order, first by the integer value and then by the list size. I tried to paste the data, but it did not come out readable.

Thanks

zero323
Saif
    [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/6910411) – zero323 Apr 12 '18 at 10:25

2 Answers

I'm using this as a test dataset:

df = pd.DataFrame({'a': [5,2,5], 'b': [[1,4,6,7], [2,6], [1,7,4]]})

   a             b
0  5  [1, 4, 6, 7]
1  2        [2, 6]
2  5     [1, 7, 4]

One way you can do this is to calculate the length of your lists, and then sort your dataframe by both the lengths and your integer column:

df['lens'] = df['b'].str.len()

df.sort_values(['a', 'lens'], ascending=False, inplace=True)

df = df.drop(columns='lens').reset_index(drop=True)

Which will give you this:

   a             b
0  5  [1, 4, 6, 7]
1  5     [1, 7, 4]
2  2        [2, 6]
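If list length should take priority over the numeric column instead, the same idea works with the sort keys swapped. The following is only a sketch: the sample values are copied from the asker's follow-up comment, and the helper name `sort_by_len_then_value` and the temporary column `_len` are made up for illustration. It also uses `assign` rather than `inplace=True`, so the original frame is left untouched.

```python
import pandas as pd

# Sample values copied from the asker's follow-up comment.
df = pd.DataFrame({
    'support': [0.576923, 0.423077, 0.384615, 0.384615, 0.384615],
    'itemsets': [['first_name'], ['hire_date'],
                 ['first_name', 'hire_date'],
                 ['first_name', 'last_name'],
                 ['emp_no']],
})

def sort_by_len_then_value(frame, list_col, value_col):
    """Hypothetical helper: longest lists first, ties broken by larger value."""
    return (
        frame.assign(_len=frame[list_col].map(len))    # temporary length column
             .sort_values(['_len', value_col], ascending=[False, False])
             .drop(columns='_len')                     # remove the helper column
             .reset_index(drop=True)
    )

result = sort_by_len_then_value(df, 'itemsets', 'support')
print(result)
```

Passing a list to `ascending` lets each key have its own direction, so a mixed order (for example longest lists first, smallest value first) only needs `[False, True]`.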
Simon
  • Thank you, Simon. I need the lists with larger sizes and a support greater than zero to come first. Here is a sample of the data (columns: support, itemsets). The int value is the support value, i.e. the value of column 1: 0 0.576923 [first_name] 1 0.423077 [hire_date] 2 0.384615 [first_name, hire_date] 3 0.384615 [first_name, last_name] 4 0.384615 [emp_no] – Saif Apr 12 '18 at 11:25
  • I don't understand. In my example the largest list is sorted first, and so is the largest int value. How should it look instead? – Simon Apr 12 '18 at 12:16

One way is to use numpy.lexsort:

import pandas as pd, numpy as np

df = pd.DataFrame({'a': [5,2,5], 'b': [[1,4,6,7], [2,6], [1,7,4]]})

df = df.iloc[np.lexsort((df['b'].map(len), df['a']))[::-1]]

print(df)

#    a             b
# 0  5  [1, 4, 6, 7]
# 2  5     [1, 7, 4]
# 1  2        [2, 6]

This is likely to perform better for larger dataframes.

Note that numpy.lexsort takes its keys in reverse order of priority: the last key in the tuple is the primary sort key. The code above therefore sorts by a first, then by the length of the lists in b.
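To make the key order concrete, here is a small sketch on the same test data. The string index labels `'x'`, `'y'`, `'z'` are an assumption added only to show that lexsort returns integer positions, which is why `.iloc` is used here; `.loc` gives the same result only when the index is the default 0..n-1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': [5, 2, 5], 'b': [[1, 4, 6, 7], [2, 6], [1, 7, 4]]},
    index=['x', 'y', 'z'],  # illustrative non-default index
)

lens = df['b'].map(len).to_numpy()  # list lengths: [4, 2, 3]
a = df['a'].to_numpy()              # integer column: [5, 2, 5]

# The LAST key is the primary one: sort by a, break ties by list length.
# lexsort sorts ascending, so reverse the positions for descending order.
order = np.lexsort((lens, a))[::-1]
print(order)  # [0 2 1]

# order holds positions, not labels, so select rows with iloc.
result = df.iloc[order]
print(result.index.tolist())  # ['x', 'z', 'y']
```

Swapping the tuple to `(a, lens)` would make list length the primary key instead.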

jpp