Sort Pandas dataframe and print highest n values

Question

I have a pandas data frame and I want to sort column('Bytes') in Descending order and print highest 10 values and its related "Client IP" column value. Suppose following is a part of my dataframe. I have many different methods and failed?

0       Bytes    Client Ip                
0       1000      192.168.10.2    
1       2000      192.168.10.12    
2       500       192.168.10.4     
3       159       192.168.10.56

Following prints only the raw which has the highest value.

print df['Bytes'].argmax()

score 44 · Answer 1 · answered Apr 19 '16 at 05:50

I think you can use nlargest (New in pandas version 0.17.0):

print df
   0  Bytes  Client             Ip
0  1      1    1000   192.168.10.2
1  0      0    2000  192.168.10.12
2  2      2     500   192.168.10.4
3  3      3     159  192.168.10.56

print df.nlargest(3, 'Client')
   0  Bytes  Client             Ip
1  0      0    2000  192.168.10.12
0  1      1    1000   192.168.10.2
2  2      2     500   192.168.10.4

score 28 · Answer 2 · edited Apr 19 '16 at 05:44

28

Note: sort is deprecated - use sort_values instead

To sort descending use ascending=False:

In [6]: df.sort('Bytes', ascending=False)
Out[6]:
   0  Bytes      Client Ip
1  1   2000  192.168.10.12
0  0   1000   192.168.10.2
2  2    500   192.168.10.4
3  3    159  192.168.10.56

To take the first 10 values use .head(10).

edited Apr 19 '16 at 05:44

Zero

11,593
9
52
70

answered Jun 06 '13 at 09:36

Andy Hayden

359,921
101
625
535

There are 10 columns in the dataframe. I want to print only the highest 10 values in "Bytes" column and and its related "Client IP" column value? – Nilani Algiriyage Jun 06 '13 at 09:46
So: `df.sort('Bytes', ascending=False).head(10)` ? or `df.sort('Bytes', ascending=False).head(10)[['Bytes, 'Client Ip']]` ... – Andy Hayden Jun 06 '13 at 09:50
these are the values that I get? They are not sorted? 9977 9977 9977 9977 9972 9968 9968 9968 99604 – Nilani Algiriyage Jun 06 '13 at 10:09
I found this in some place, how can i adjust it...DataFrame({'count' : bt.groupby( [ "Bytes", "Client IP"] ).size()}).reset_index().head(10) @Andy Hayden ok Thanks. – Nilani Algiriyage Jun 06 '13 at 10:10
@NilaniAlgiriyage they are not integers :) (it is sorting the strings lexigraphically). You need to force the Bytes column to integers, e.g. df['Bytes'] = df['Bytes'].astype(int). – Andy Hayden Jun 06 '13 at 10:11
I have used a smaller version of the data file and following is the output. Why it not sorted?print df.sort('Bytes Returned', ascending=False).head(10)[['Bytes Returned', 'Client-IP']] ` Bytes Client-IP 3 9848 61.245.172.12 4 4985 61.245.172.14 0 4482 61.245.172.14 8 35535 61.245.172.1 1 3094 61.245.172.14 6 2282 61.245.172.14 2 20037 61.245.172.14 5 1498 61.245.172.12 7 1048 61.245.172.14 ` – Nilani Algiriyage Jun 06 '13 at 13:11
1

It still looks like it's sorting lexicographically, did you force the Bytes column to integers? – Andy Hayden Jun 06 '13 at 13:18
@NilaniAlgiriyage you can mark this answer as correct and upvote if it helped :) Happy panda-ing! – Andy Hayden Jun 06 '13 at 13:33

score 3 · Answer 3 · answered Jun 06 '13 at 13:35

3

df['Bytes'] = df['Bytes'].astype('int')
print df.sort('Bytes', ascending=False).head(10)[['Bytes', 'Client-IP']]

I could solve it using above code with the help of Andy Hayden. :D

answered Jun 06 '13 at 13:35

Nilani Algiriyage

32,876
32
87
121

This worked because your Bytes column was (incorrectly) stored as strings, something isn't quite right in the way you are reading into your DataFrame. This part shouldn't be necessary. – Andy Hayden Jun 06 '13 at 14:16

score 3 · Answer 4 · answered Sep 14 '18 at 10:02

3

df[['Bytes', 'Client Ip']].sort_values('Bytes', ascending=False).nlargest(10, 'Bytes')

This should get you everything you need 1) Sorting Bytes 2) Returning the Largest 10 Bytes values

answered Sep 14 '18 at 10:02

megamind

121
1
8

Sort Pandas dataframe and print highest n values

4 Answers4