9

I have a Pandas dataframe called pd, and I extract the number of unique values in one of the columns of this dataframe using the following command:

b = df.groupby('Region').size()

b is a Pandas series object and looks like this:

In [48]: b
Out[48]: 
Region
0          8
1         25
11         1
2         41
3         23
4         15
5         35
6         24
7         27
8         50
9         55
N         10

I am trying to plot a barchart of this series, however, I would like to sort it first based on the first column (because of that 11 between 1 and 2), which will be the x axis labels. I tried passing the sort command, but it sorts the series based on the values in the second column:

b.sort()

In [48]: b
Out[54]: 
Region
11         1
0          8
N         10
4         15
3         23
6         24
1         25
7         27
5         35
2         41
8         50
9         55

Well, is there a way to sort this series based on the first column?

cs95
  • 379,657
  • 97
  • 704
  • 746
marillion
  • 10,618
  • 19
  • 48
  • 63
  • This question is regarding sorting a pandas Series by its _index_. If you wish to know how to sort a Series by values, see [this post](https://stackoverflow.com/questions/12133075/sorting-a-pandas-series/54299881#54299881). – cs95 Jan 22 '19 at 01:14

3 Answers3

16

You are looking for sort_index:

In [80]: b.sort_values()
Out[80]: 
6     1
11    2
9     2
1     4
10    4
2     5
3     6
4     7
8     8
5     9
dtype: int64

In [81]: b.sort_index()
Out[81]: 
1     4
2     5
3     6
4     7
5     9
6     1
8     8
9     2
10    4
11    2
dtype: int64
InLaw
  • 2,537
  • 2
  • 21
  • 33
bdiamante
  • 15,980
  • 6
  • 40
  • 46
  • `sort_index()` still gives me the same output with `Out[48]` above, 11 is still between 1 and 2. It's like pandas is treating the index values as text. I do have `N` as one of the index values though. – marillion Oct 02 '13 at 19:16
8

There is only 1 'column' of values. The first 'column' is the index. Docs are here

In [8]: s = Series([3,2,1],index=[1,3,2])

In [9]: s
Out[9]: 
1    3
3    2
2    1
dtype: int64

Sort by the index

In [10]: s.sort_index()
Out[10]: 
1    3
2    1
3    2
dtype: int64

Sort by values

In [11]: s.sort_values()
Out[11]: 
2    1
3    2
1    3
dtype: int64
Mr_and_Mrs_D
  • 32,208
  • 39
  • 178
  • 361
Jeff
  • 125,376
  • 21
  • 220
  • 187
  • I left this comment to bdiamente's reply, but here it goes as well: `sort_index()` still gives me the same output with `Out[48]` above, 11 is still between 1 and 2. It's like pandas is treating the index values as text. I do have `N` as one of the index values though. – marillion Oct 02 '13 at 19:16
  • they COULD be text, how did you create it? – Jeff Oct 02 '13 at 19:19
  • @marillion Your values are text. You'll need to convert the numbers to numeric types and use an object dtype Index. Then `sort_index()` will do what you want, although there's most likely a better way to do this without having mixed numeric/string indices. – Phillip Cloud Oct 02 '13 at 19:20
  • @Jeff the df was constructed using pd.read_csv('filename.csv'), then I used `groupby` to count the number of unique values. Maybe the 'N' value in the data resulted in an index with text type. – marillion Oct 02 '13 at 19:32
  • ahh...didn't see the N, yes for sure, its automatically text that way. Best to put thin in a DataFrame, replace that value with a number and you will be good to go. – Jeff Oct 02 '13 at 19:49
  • @Jeff, that's what I did at the end. N is now 111. I will probably play with the bar chart tickmarks later in matplotlib. Thanks for the help! – marillion Oct 02 '13 at 20:33
  • order was deprecated quite a while ago - use .sprt_values() – Jeff Feb 07 '18 at 00:15
5

You need to convert your index to an object index, because it's currently sorting lexicographically, not numerically:

In [97]: s = read_clipboard(header=None)

In [98]: news = s.rename(columns=lambda x: ['Region', 'data'][x])

In [99]: news
Out[99]:
   Region  data
0       0     8
1       1    25
2      11     1
3       2    41
4       3    23
5       4    15
6       5    35
7       6    24
8       7    27
9       8    50
10      9    55
11      N    10

In [100]: news_converted = news.convert_objects(convert_numeric=True)

In [101]: news_converted
Out[101]:
    Region  data
0        0     8
1        1    25
2       11     1
3        2    41
4        3    23
5        4    15
6        5    35
7        6    24
8        7    27
9        8    50
10       9    55
11     NaN    10

In [102]: news_converted.loc[11, 'Region'] = 'N'

In [103]: news_converted_with_index = news_converted.set_index('Region')

In [104]: news_converted_with_index
Out[104]:
        data
Region
0.0        8
1.0       25
11.0       1
2.0       41
3.0       23
4.0       15
5.0       35
6.0       24
7.0       27
8.0       50
9.0       55
N         10

In [105]: news_converted_with_index.sort_index()
Out[105]:
        data
Region
0.0        8
1.0       25
2.0       41
3.0       23
4.0       15
5.0       35
6.0       24
7.0       27
8.0       50
9.0       55
11.0       1
N         10

There's most likely a better way to create your Series so that it doesn't mix index types.

Phillip Cloud
  • 24,919
  • 11
  • 68
  • 88
  • @Philip Cloud thanks! this seems to solve the problem of sorting, now I am looking into constructing this series in a better way as you mentioned. This whole thing started with trying to count the unique values in a df column, which has that 'N' as one of the data values. – marillion Oct 02 '13 at 19:38