I'm new to pandas and to working with tabular data in a programming environment. I have sorted a dataframe by a specific column, but the result that pandas returns is not correct.

Here is the code I have used:

league_dataframe.sort_values('overall_league_position')

In the result, the values in the 'overall_league_position' column are not sorted in ascending order, which is the default for the method.

What am I doing wrong? Thanks for your patience!

cs95
Newkid
  • Please paste your code directly into the question, not as images. You can use the {} button to format it correctly. You should do this for the output too. – Craig Dec 20 '17 at 21:01
  • It's a column of strings, that's why. – cs95 Dec 20 '17 at 21:01

1 Answer

For whatever reason, you seem to be working with a column of strings, so sort_values is returning a lexicographically sorted result: the values are compared character by character, which puts '10' before '2'.

Here's an example.

import pandas as pd

df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})
df

  Col
0   1
1   2
2   3
3  10
4  20
5  19

df.sort_values('Col')

  Col
0   1
3  10
5  19
1   2
4  20
2   3
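If you are not sure whether a column holds strings, a quick check along these lines may help. This is only a sketch that reuses the example frame above; the variable names `is_numeric` and `element_types` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})

# A column of numbers stored as text is not reported as a numeric dtype
is_numeric = pd.api.types.is_numeric_dtype(df['Col'])
print(is_numeric)

# Looking at the Python type of the individual elements
element_types = df['Col'].map(type).unique()
print(element_types)
```

If the first check prints False and the element types are str, the sort is comparing text rather than numbers.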

The remedy is to convert the column to a numeric dtype, using either .astype or pd.to_numeric.

df.Col = df.Col.astype(float)

Or,

df.Col = pd.to_numeric(df.Col, errors='coerce')
df.sort_values('Col')

   Col
0    1
1    2
2    3
3   10
5   19
4   20

The main difference between astype and pd.to_numeric is that the latter is more robust at handling non-numeric strings (with errors='coerce' they are coerced to NaN, whereas astype raises an error), and it will attempt to preserve integers if a conversion to float is not necessary (as is seen in this case).
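To make that difference concrete, here is a small sketch. The series `s` with its 'n/a' entry is made-up data for illustration, not something taken from the question.

```python
import pandas as pd

# Hypothetical column containing one non-numeric string
s = pd.Series(['1', '2', 'n/a', '10'])

# astype(float) refuses to convert the bad value and raises
try:
    s.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' turns the bad value into NaN instead
coerced = pd.to_numeric(s, errors='coerce')

# With only clean integer strings, to_numeric keeps an integer dtype
clean = pd.to_numeric(pd.Series(['1', '2', '10']))

print(astype_failed)
print(coerced)
print(clean.dtype)
```

Note that once a NaN is present the coerced column has to be float, so the integer-preserving behaviour only applies when every value converts cleanly.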

cs95
  • What if I need to actually handle string values and keep them as such? For example strings "1%", "2%", "10%", "25%", ...? Is there a way of sorting the values by a custom comparator without having to transform the data back and forth? – Adam Bajger Jul 16 '20 at 15:13
  • @AdamBajger You can look up "pandas natural sorting column" online. – cs95 Jul 16 '20 at 15:25
  • @cs95 I just found a comprehensive answer [here](https://stackoverflow.com/questions/13838405/custom-sorting-in-pandas-dataframe/54301218#54301218), thanks for natsorted, though, it helped too. – Adam Bajger Jul 16 '20 at 15:41
  • @AdamBajger oh awesome, I think I know the chap who owns that answer... – cs95 Jul 16 '20 at 17:04
  • Saved my day, man! The `.astype(float)` worked. – Davidson Lima Mar 06 '21 at 18:39
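For the percent-string case raised in the comments, one possible approach (not necessarily the one in the linked answer) is the key argument of sort_values, which is available in pandas 1.1 and later. The sketch below assumes a hypothetical column named `share` and made-up values; the stored strings are left untouched and only the sort order is computed from their numeric value.

```python
import pandas as pd

# Hypothetical data: percentages stored as strings
pct = pd.DataFrame({"share": ["10%", "2%", "25%", "1%"]})

# key receives the column as a Series and returns the values to sort by;
# here the trailing '%' is stripped and the rest is compared as a float
by_value = pct.sort_values(
    "share",
    key=lambda col: col.str.rstrip("%").astype(float),
)

print(by_value)
```

The same idea works for the original example: passing key=pd.to_numeric sorts a string column numerically without converting it.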