1

I'm working with some datetime data in a dataframe. It's in a format day/month/year Ex:

Date
----------------
27/06/2021 00:00
27/06/2021 00:00
30/06/2021 00:00
30/06/2021 00:00
30/06/2021 00:00
18/06/2021 00:00
26/06/2021 00:00
28/06/2021 00:00
28/06/2021 00:00
27/06/2021 00:00
28/06/2021 00:00
30/06/2021 00:00
12/06/2021 00:00
28/06/2021 00:00

I want to extract the month and year, so I converted the column to datetime using data['date'] = pd.to_datetime(data['date'])

However, this is resulting in improper classification of what is the month and what is the day. When the dates above are converted to datetime, they end up like:

Date
-----------
2021-06-27
2021-06-27
2021-06-30
2021-06-30
2021-06-30
2021-06-18
2021-06-26
2021-06-28
2021-06-28
2021-06-27
2021-06-28
2021-06-30
2021-12-06
2021-06-28

All of these dates should have 06 for June as the month. But row 13 has incorrectly assigned 12 as the month, which is leading to incorrect results when I use groupby later. Is there a way to fix this?

harlan
  • 29
  • 6

2 Answers2

3

use

pd.to_datetime(data['date'], dayfirst=True)

since the day comes obviously first in your data but pandas assumes month first by default.

docs - "dayfirst : bool, default False"

Note: evaluation of day first / last is not strict, i.e. it is evaluated line by line - which is why you get such mixed results but no error.

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
0

If extracting month and year, did you try?

df['Date']=pd.to_datetime(df['Date']).dt.strftime('%Y-%m')
wwnde
  • 26,119
  • 6
  • 18
  • 32