Get week start date (Monday) from a date column in Python (pandas)?

Question

I have seen a lot of posts about how you can do it with a date string but I am trying something for a dataframe column and haven't got any luck so far. My current method is : Get the weekday from 'myday' and then offset to get monday.

df['myday'] is column of dates. 
mydays = pd.DatetimeIndex(df['myday']).weekday
df['week_start'] = pd.DatetimeIndex(df['myday']) - pd.DateOffset(days=mydays)

But I get TypeError: unsupported type for timedelta days component: numpy.ndarray

How can I get week start date from a df column?

carleton · Answer 1 · 2022-10-22T06:41:49.203

51

Another alternative:

df['week_start'] = df['myday'].dt.to_period('W').apply(lambda r: r.start_time)

This will set 'week_start' to be the first Monday before the time in 'myday'.

You can choose different week starts via anchored offsets e.g. ’W-THU’ to start the week on Thursday instead. (Thanks @Henry Ecker for that suggestion)

edited Oct 22 '22 at 06:41

answered Feb 24 '16 at 21:28

carleton

734
1
5
9

6

Thanks for this. `df['myday'].dt.to_period('W').dt.start_time` is probably faster than using `apply` (not sure when is this introduced, might not work for older pandas version) – Ryan Tam Apr 20 '21 at 14:41
2

It is possible to set an arbitrary day to start the week with [anchored offsets](https://pandas.pydata.org/docs/user_guide/timeseries.html#anchored-offsets). Sunday start would be `'W-SUN'` or Thursday Start `'W-THU'`. – Henry Ecker Aug 07 '22 at 03:31
@HenryEcker Thanks!! This should be added into this answer to cover more broader aspect – Itachi Oct 19 '22 at 06:19
For reference, where I can see documentation how to know the possible .dt options have? chaining .dt was not expected but awesome – Luis R. Gonzalez May 05 '23 at 02:49

n8yoder · Answer 2 · 2020-03-06T22:23:15.730

While both @knightofni's and @Paul's solutions work I tend to try to stay away from using apply in Pandas because it is usually quite slow compared to array-based methods. In order to avoid this, after casting to a datetime column (via pd.to_datetime) we can modify the weekday based method and simply cast the day of the week to be a numpy timedelta64[D] by either casting it directly:

df['week_start'] = df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')

or by using to_timedelta as @ribitskiyb suggested:

df['week_start'] = df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D').

Using test data with 60,000 datetimes I got the following times using the suggested answers using the newly released Pandas 1.0.1.

%timeit df.apply(lambda x: x['myday'] - datetime.timedelta(days=x['myday'].weekday()), axis=1)
>>> 1.33 s ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df['myday'].dt.to_period('W').apply(lambda r: r.start_time)
>>> 5.59 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')
>>> 3.44 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
>>> 3.47 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

These results show that Pandas 1.0.1 has dramatically improved the speed of the to_period apply based method (vs Pandas <= 0.25) but show that converting directly to a timedelta (by either casting the type directly .astype('timedelta64[D]') or using pd.to_timedelta is still superior. Based on these results I would suggest using pd.to_timedelta going forward.

How does this work? df['myday'].dt.weekday.astype('timedelta64[D]') in my dataset returns a series of all zeroes. Why or how would subtracted 0 from df['myday'] work? This seems like the best solution. — Don Quixote, Aug 12 '17 at 04:15
Clarifying the above post, the way I understand what is going on is that basically it is saying take the date and then subtract the dayofweek from it. But what I don't understand is why .astype('timedelta64[D]') results in all zeros. — Don Quixote, Aug 12 '17 at 18:15
@DonQuixote Probably your 'myday' frequency is less than '1D'. This won't work then. You have to modify it to subtract hours, minutes, etc. — grabantot, Apr 07 '18 at 07:58

score 12 · Answer 3 · answered Mar 13 '18 at 11:42

12

(Just adding to n8yoder's answer)

Using .astype('timedelta64[D]') seems not so readable to me -- found an alternative using just the functionality of pandas:

df['myday'] - pd.to_timedelta(arg=df['myday'].dt.weekday, unit='D')

answered Mar 13 '18 at 11:42

ribitskiyb

429
4
11

I liked the lambda expression from @Paul, but this answer is much, much faster and equally "pithy". Thank you for the solution. – Edmund's Echo Jul 17 '19 at 15:29

score 11 · Accepted Answer · edited Sep 05 '19 at 10:27

11

it fails because pd.DateOffset expects a single integer as a parameter (and you are feeding it an array). You can only use DateOffset to change a date column by the same offset.

try this :

import datetime as dt
# Change 'myday' to contains dates as datetime objects
df['myday'] = pd.to_datetime(df['myday'])  
# 'daysoffset' will container the weekday, as integers
df['daysoffset'] = df['myday'].apply(lambda x: x.weekday())
# We apply, row by row (axis=1) a timedelta operation
df['week_start'] = df.apply(lambda x: x['myday'] - dt.TimeDelta(days=x['daysoffset']), axis=1)

I haven't actually tested this code, (there was no sample data), but that should work for what you have described.

However, you might want to look at pandas.Resample, which might provide a better solution - depending on exactly what you are looking for.

edited Sep 05 '19 at 10:27

koPytok

3,453
1
14
29

answered Jan 18 '15 at 06:10

knightofni

1,906
3
17
22

Thankyou for the explanation. This solution does exactly what I wanted! – dev28 Jan 19 '15 at 12:37
1

Should the `TimeDelta` not refer to `df['daysoffset']`? – Carsten Aug 05 '19 at 07:34
using lambda will result in something slow – Nicolas Apr 12 '21 at 08:39

score 3 · Answer 5 · answered Apr 03 '20 at 05:41

from datetime import datetime, timedelta

# Convert column to pandas datetime equivalent
df['myday'] = pd.to_datetime(df['myday']) 

# Create function to calculate Start Week date
week_start_date = lambda date: date - timedelta(days=date.weekday())

# Apply above function on DataFrame column
df['week_start_date'] = df['myday'].apply(week_start_date)

Get week start date (Monday) from a date column in Python (pandas)?

5 Answers5

Linked

Related