1

I have a pandas series that I have extracted from a dataframe in Python 3.7. It contains a series of timecodes such as this:

17833    Sat, 27 Nov 2010 06:00:00 -0000
851      Fri, 04 Dec 2009 06:07:00 -0000
4806     Fri, 23 Mar 2012 06:02:15 -0000
16341    Sat, 20 Aug 2011 11:48:18 -0000
9444     Mon, 16 May 2011 08:06:53 -0000
                      ...               
3262     Fri, 16 Dec 2011 07:30:00 -0000
37554    Wed, 11 Apr 2012 02:20:34 -0000
37555    Wed, 11 Apr 2012 02:34:00 -0000
28471    Thu, 18 Feb 2010 04:46:00 -0000
30324    Thu, 28 Jun 2012 21:23:40 -0000

The numbers on the left are the indices of the original entries. I would like to sort this series in several ways, such as by weekday (all "Sat" together, all "Wed" together, etc.) or by month ("Nov", "May"). It would also be great to sort by hour on a 24-hour clock using this timecode information (all entries at hour 02, hour 06, etc.).

Target outputs would be (just sorting this sample):

by month

28471    Feb
4806     Mar
37554    Apr
37555    Apr
9444     May
                      ...
30324    Jun
16341    Aug
17833    Nov
851      Dec
3262     Dec

by weekday

9444     Mon
37554    Wed
37555    Wed
28471    Thu
30324    Thu
                      ...
4806     Fri
851      Fri
3262     Fri
16341    Sat
17833    Sat

by time

37554    02
37555    02
28471    04
17833    06
4806     06
                      ...     
851      06
3262     07
9444     08
16341    11
30324    21

I have already tried the pd.to_datetime() function, but I am not sure what format string to pass so that it can parse the series. Clarification on this would be helpful.

3 Answers

1

To sort by day of the week, first convert the date strings to an actual datetime dtype (datetime64). Then extract dayofweek from the datetime and sort by that number:

s = pd.to_datetime(df['Col1'].str.rsplit(n=2).str[0], format='%a, %d %b %Y').dt.dayofweek
df.assign(dayofweek=s).sort_values('dayofweek').drop(columns=['dayofweek'])

Output

                              Col1
4  Mon, 16 May 2011 08:06:53 -0000
6  Wed, 11 Apr 2012 02:20:34 -0000
7  Wed, 11 Apr 2012 02:34:00 -0000
8  Thu, 18 Feb 2010 04:46:00 -0000
9  Thu, 28 Jun 2012 21:23:40 -0000
1  Fri, 04 Dec 2009 06:07:00 -0000
2  Fri, 23 Mar 2012 06:02:15 -0000
5  Fri, 16 Dec 2011 07:30:00 -0000
0  Sat, 27 Nov 2010 06:00:00 -0000
3  Sat, 20 Aug 2011 11:48:18 -0000

dt.dayofweek returns a series with the day of the week represented as an integer (Monday is 0, Sunday is 6):

pd.to_datetime(df['Col1'].str.rsplit(n=2).str[0], format='%a, %d %b %Y').dt.dayofweek

0    5
1    4
2    4
3    5
4    0
5    4
6    2
7    2
8    3
9    3
Name: Col1, dtype: int64

You can do the same for month:

s2 = pd.to_datetime(df['Col1'].str.rsplit(n=2).str[0], format='%a, %d %b %Y').dt.month
df.assign(month=s2).sort_values('month').drop(columns=['month'])

                              Col1
8  Thu, 18 Feb 2010 04:46:00 -0000
2  Fri, 23 Mar 2012 06:02:15 -0000
6  Wed, 11 Apr 2012 02:20:34 -0000
7  Wed, 11 Apr 2012 02:34:00 -0000
4  Mon, 16 May 2011 08:06:53 -0000
9  Thu, 28 Jun 2012 21:23:40 -0000
3  Sat, 20 Aug 2011 11:48:18 -0000
0  Sat, 27 Nov 2010 06:00:00 -0000
1  Fri, 04 Dec 2009 06:07:00 -0000
5  Fri, 16 Dec 2011 07:30:00 -0000
Erfan
1

If you want output exactly like the one you posted, you can do the following, assuming the column is named 'funded date':

For month:

s_month=pd.to_datetime(df['funded date']).dt.month_name().str[:3]
s_month.reindex(pd.to_datetime(df['funded date']).dt.month.sort_values().index)

28471    Feb
4806     Mar
37554    Apr
37555    Apr
9444     May
30324    Jun
16341    Aug
17833    Nov
851      Dec
3262     Dec

For weekday:

s_day=pd.to_datetime(df['funded date']).dt.day_name().str[:3]
s_day.reindex(pd.to_datetime(df['funded date']).dt.dayofweek.sort_values().index)

9444     Mon
37554    Wed
37555    Wed
28471    Thu
30324    Thu
851      Fri
4806     Fri
3262     Fri
17833    Sat
16341    Sat
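
The question also asks for a "by time" ordering, which the snippets above do not cover. The following is a minimal sketch of how the same reindex pattern might extend to the hour. It assumes the column is named 'funded date' as above, rebuilds the sample rows from the question so that it runs on its own, and passes an explicit format string (an assumption based on the sample strings shown) so that the time and the "-0000" offset are parsed too. A stable sort is used so that rows sharing an hour keep their original relative order.

```python
import pandas as pd

# Sample rows copied from the question; the index holds the original row numbers.
df = pd.DataFrame(
    {
        "funded date": [
            "Sat, 27 Nov 2010 06:00:00 -0000",
            "Fri, 04 Dec 2009 06:07:00 -0000",
            "Fri, 23 Mar 2012 06:02:15 -0000",
            "Sat, 20 Aug 2011 11:48:18 -0000",
            "Mon, 16 May 2011 08:06:53 -0000",
            "Fri, 16 Dec 2011 07:30:00 -0000",
            "Wed, 11 Apr 2012 02:20:34 -0000",
            "Wed, 11 Apr 2012 02:34:00 -0000",
            "Thu, 18 Feb 2010 04:46:00 -0000",
            "Thu, 28 Jun 2012 21:23:40 -0000",
        ]
    },
    index=[17833, 851, 4806, 16341, 9444, 3262, 37554, 37555, 28471, 30324],
)

# Parse the full string, including the time of day and the UTC offset.
parsed = pd.to_datetime(
    df["funded date"], format="%a, %d %b %Y %H:%M:%S %z", utc=True
)

# Zero-padded hour labels ("02", "06", ...) on a 24-hour clock.
s_hour = parsed.dt.strftime("%H")

# Order the labels by numeric hour; mergesort is stable, so ties keep their order.
by_hour = s_hour.reindex(parsed.dt.hour.sort_values(kind="mergesort").index)
print(by_hour)
```

With the sample rows, the two hour-02 entries come first and the hour-21 entry comes last.
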
anky
0

For formatting, you can use the pandas.to_datetime() method, or apply strftime/strptime to the Series with the apply() method. You can then use the sort_values() method on either the Series or the DataFrame to get your expected output.

Refer to the pandas documentation for to_datetime(), and to the strftime/strptime format-code reference for the formatting directives.
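
As a rough illustration of the format string the question asks about, here is a small sketch that assumes the strings always look like the samples shown. The series `s` and the constant `FMT` are names introduced here for illustration only. Each directive maps to one piece of the timestamp, and once the values are parsed, sort_values() orders them chronologically and dt.strftime() renders them in another layout.

```python
import pandas as pd

# Three sample values from the question, keyed by their original row numbers.
s = pd.Series(
    [
        "Sat, 27 Nov 2010 06:00:00 -0000",
        "Fri, 04 Dec 2009 06:07:00 -0000",
        "Mon, 16 May 2011 08:06:53 -0000",
    ],
    index=[17833, 851, 9444],
)

# %a  abbreviated weekday     %d  day of month    %b  abbreviated month
# %Y  four-digit year         %H:%M:%S  24-hour time    %z  UTC offset
FMT = "%a, %d %b %Y %H:%M:%S %z"

parsed = pd.to_datetime(s, format=FMT, utc=True)

# Sorting the parsed values orders the rows from earliest to latest.
chronological = parsed.sort_values()

# strftime converts the timestamps back into strings in a different layout.
reformatted = parsed.dt.strftime("%Y-%m-%d %H:%M")
print(chronological)
print(reformatted)
```

Passing utc=True keeps the result as a single timezone-aware datetime column, which is what makes the .dt accessor available afterwards.
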

null