34

I have a pandas data frame and I want to sort column('Bytes') in Descending order and print highest 10 values and its related "Client IP" column value. Suppose following is a part of my dataframe. I have many different methods and failed?

0       Bytes    Client Ip                
0       1000      192.168.10.2    
1       2000      192.168.10.12    
2       500       192.168.10.4     
3       159       192.168.10.56 

Following prints only the raw which has the highest value.

print df['Bytes'].argmax()
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Nilani Algiriyage
  • 32,876
  • 32
  • 87
  • 121

4 Answers4

44

I think you can use nlargest (New in pandas version 0.17.0):

print df
   0  Bytes  Client             Ip
0  1      1    1000   192.168.10.2
1  0      0    2000  192.168.10.12
2  2      2     500   192.168.10.4
3  3      3     159  192.168.10.56

print df.nlargest(3, 'Client')
   0  Bytes  Client             Ip
1  0      0    2000  192.168.10.12
0  1      1    1000   192.168.10.2
2  2      2     500   192.168.10.4
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
28

Note: sort is deprecated - use sort_values instead

To sort descending use ascending=False:

In [6]: df.sort('Bytes', ascending=False)
Out[6]:
   0  Bytes      Client Ip
1  1   2000  192.168.10.12
0  0   1000   192.168.10.2
2  2    500   192.168.10.4
3  3    159  192.168.10.56

To take the first 10 values use .head(10).

Zero
  • 11,593
  • 9
  • 52
  • 70
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • There are 10 columns in the dataframe. I want to print only the highest 10 values in "Bytes" column and and its related "Client IP" column value? – Nilani Algiriyage Jun 06 '13 at 09:46
  • So: `df.sort('Bytes', ascending=False).head(10)` ? or `df.sort('Bytes', ascending=False).head(10)[['Bytes, 'Client Ip']]` ... – Andy Hayden Jun 06 '13 at 09:50
  • these are the values that I get? They are not sorted? 9977 9977 9977 9977 9972 9968 9968 9968 99604 – Nilani Algiriyage Jun 06 '13 at 10:09
  • I found this in some place, how can i adjust it...DataFrame({'count' : bt.groupby( [ "Bytes", "Client IP"] ).size()}).reset_index().head(10) @Andy Hayden ok Thanks. – Nilani Algiriyage Jun 06 '13 at 10:10
  • @NilaniAlgiriyage they are not integers :) (it is sorting the strings lexigraphically). You need to force the Bytes column to integers, e.g. df['Bytes'] = df['Bytes'].astype(int). – Andy Hayden Jun 06 '13 at 10:11
  • I have used a smaller version of the data file and following is the output. Why it not sorted?print df.sort('Bytes Returned', ascending=False).head(10)[['Bytes Returned', 'Client-IP']] ` Bytes Client-IP 3 9848 61.245.172.12 4 4985 61.245.172.14 0 4482 61.245.172.14 8 35535 61.245.172.1 1 3094 61.245.172.14 6 2282 61.245.172.14 2 20037 61.245.172.14 5 1498 61.245.172.12 7 1048 61.245.172.14 ` – Nilani Algiriyage Jun 06 '13 at 13:11
  • 1
    It still looks like it's sorting lexicographically, did you force the Bytes column to integers? – Andy Hayden Jun 06 '13 at 13:18
  • @NilaniAlgiriyage you can mark this answer as correct and upvote if it helped :) Happy panda-ing! – Andy Hayden Jun 06 '13 at 13:33
3
df['Bytes'] = df['Bytes'].astype('int')
print df.sort('Bytes', ascending=False).head(10)[['Bytes', 'Client-IP']]

I could solve it using above code with the help of Andy Hayden. :D

Nilani Algiriyage
  • 32,876
  • 32
  • 87
  • 121
  • This worked because your Bytes column was (incorrectly) stored as strings, something isn't quite right in the way you are reading into your DataFrame. This part shouldn't be necessary. – Andy Hayden Jun 06 '13 at 14:16
3
df[['Bytes', 'Client Ip']].sort_values('Bytes', ascending=False).nlargest(10, 'Bytes')

This should get you everything you need 1) Sorting Bytes 2) Returning the Largest 10 Bytes values

megamind
  • 121
  • 1
  • 8