1

With Pandas I am using this answer to clean up dates with a variety of formats. This works perfectly if I filter out the dates that are prior to 1677. However my dates are historic and many date before 1677 so I get an OutOfBoundsDatetime error.

My data contains dates like:

27 Feb 1928,
1920,
October 2000,
1500,
1625,
Mar 1723

I can see a reference here to using pd.Period but I don't know how to apply it to my case as the dates need to be cleaned first before I can adapt this sample

My code to clean the dates is:

df['clean_date'] = df.dates.apply(
lambda x: pd.to_datetime(x).strftime('%m/%d/%Y'))

df

I would like help to convert and clean my dates including the historic dates. Grateful for assistance with this.

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
user3471259
  • 215
  • 1
  • 15
  • what is your goal? You can't have a Pandas Series of `datetime64[ns]` dtype with values before `1677-09-21 00:12:43.145225` or after `2262-04-11 23:47:16.854775807` - they will be wrapped. You can have a Period dtype though... – MaxU - stand with Ukraine Jul 16 '19 at 12:54

1 Answers1

2

As it is clearly stated in the online documentation you can't have values of datetime64[ns] dtype that are not falling into ['1677-09-21 00:12:43.145225', '2262-04-11 23:47:16.854775807'].

But you can have such dates as Period dtype.

Sample input dataset:

In [156]: df
Out[156]:
           Date
0   27 Feb 1928
1          1920
2  October 2000
3          1500
4          1625
5      Mar 1723

In [157]: df.dtypes
Out[157]:
Date    object
dtype: object

Solution:

In [158]: df["new"] = pd.PeriodIndex([pd.Period(d, freq="D") for d in df.Date])

Result:

In [159]: df
Out[159]:
           Date         new
0   27 Feb 1928  1928-02-27
1          1920  1920-01-01
2  October 2000  2000-10-01
3          1500  1500-01-01
4          1625  1625-01-01
5      Mar 1723  1723-03-01

In [160]: df.dtypes
Out[160]:
Date       object
new     period[D]
dtype: object

In [161]: df["new"].dt.year
Out[161]:
0    1928
1    1920
2    2000
3    1500
4    1625
5    1723
Name: new, dtype: int64
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419