I have a pandas dataframe (dfmn). I want to get the minimum for each day and have the result sorted by day. Some days appear more than once, e.g. Oct 4 or Jun 7.

dfmn
  count      Month  Day  Data_Value
    1        Nov   26          11
    3        Oct    4         178
    4        Nov   28          94
    5        Aug    6         144
    8        Jun    7          89
    9        Jan   25          33
    10       Mar   30          72
    11       Oct   14         106
    13       May   21          89
    17       Mar   27          44
    20       Sep   17         100
    21       Aug    4         194
    22       Jan   26          61
    24       Jun    7         100
    31       Sep   28         117
    32       Oct    1         139
    37       Apr   22          78
    39       Aug    4         200
    40       Jan   24          33
    45       Jun    4         150
    47       Oct   22         100
    49       Sep   14          94
    51       Mar   15          22
    52       Nov   25          50
    53       Oct   15         144
    55       Mar   30         106
    59       Jan   19          94
    60       Feb   28          78
    61       Aug    4         133
    62       Jun   14         117
    64       Mar   14          44
    66       Sep   18         106

I did the following. My result set now has the minimum for each month/day combination, but it is not sorted in calendar order. pandas is probably sorting the month names alphabetically.

dfmn.groupby(["Month", "Day"]).min()

          Data_Value
Month Day            
Apr   1          23.9
      2          24.4
      3          29.4
      4          32.2
      .          .
      .          .
Aug   1          25.2
      2          33.1

I need

Jan   1          21.9
      2          20.4
      3          20.4
      4          14.2
      .          .
      .          .
Feb   1          15.2
      2          13.1

How can I make this happen?

jpp
Rock

1 Answer

You can set Month to be an ordered categorical of all the months of the year:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

df['Month'] = pd.Categorical(df['Month'], categories=months, ordered=True)

Then, when you do your groupby, the months will be in calendar order (the NaN rows are Month/Day combinations that do not occur in the data):

>>> df.groupby(["Month","Day"]).min()
           count  Data_Value
Month Day                   
Jan   1      NaN         NaN
      4      NaN         NaN
      6      NaN         NaN
      7      NaN         NaN
      14     NaN         NaN
      15     NaN         NaN
      17     NaN         NaN
      18     NaN         NaN
      19    59.0        94.0
      21     NaN         NaN
      22     NaN         NaN
      24    40.0        33.0
      25     9.0        33.0
      26    22.0        61.0
      27     NaN         NaN
      28     NaN         NaN
      30     NaN         NaN
Feb   1      NaN         NaN
      4      NaN         NaN
      6      NaN         NaN
      7      NaN         NaN
      14     NaN         NaN
.....
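
If the NaN rows are unwanted, one option is to pass observed=True to groupby, which keeps only the category combinations that actually occur. The following is a minimal sketch, assuming a small sample taken from the question's data and a dataframe named df as above; selecting the Data_Value column before calling min() is my own choice here, not part of the original answer.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Small sample copied from the question's data (assumption: same column names)
df = pd.DataFrame({
    'Month': ['Nov', 'Oct', 'Jun', 'Jan', 'Jun', 'Aug', 'Aug'],
    'Day': [26, 4, 7, 25, 7, 4, 4],
    'Data_Value': [11, 178, 89, 33, 100, 194, 200],
})

# Ordered categorical so that groups sort in calendar order, not alphabetically
df['Month'] = pd.Categorical(df['Month'], categories=months, ordered=True)

# observed=True drops Month/Day combinations that never occur in the data
result = df.groupby(['Month', 'Day'], observed=True)['Data_Value'].min()
print(result)
```

With this sample the groups come out as Jan 25, Jun 7, Aug 4, Oct 4, Nov 26, and the two Jun 7 and Aug 4 rows each collapse to their minimum.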

It might be easier just to get your month abbreviations from the calendar module, though:

import calendar

months = [calendar.month_abbr[i] for i in range(1,13)]

>>> months
['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
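
Note that calendar.month_abbr follows the current locale, so the names only match the data if the locale produces English abbreviations. As a small sketch of an equivalent way to build the list, skipping the empty string at index 0:

```python
import calendar

# month_abbr[0] is an empty string, so drop it to keep the 12 month names
months = list(calendar.month_abbr)[1:]
print(months)
```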
sacuL