Following on from "Reading CSV file in Pandas with historical dates", I have some CSV data in the form:
Object,Earliest Date
Object1,01/01/2000
Object2,01/01/1760
Object3,01/01/1520
...
which I've now read into Pandas (using Period to handle historical dates) and used to create a series. I'm trying to bin the series into decades, but I'm stumbling on getting the Period values into the form that groupby expects. So far I've tried the following (where s is the series created with from_csv):
def dt_parse(s):
    try:
        d,m,y = s.split('/')
        return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')
    except:
        return pd.NaT
s2 = s['Earliest Date'].apply(dt_parse) #Create Period values
pi = pd.PeriodIndex(s2)
decades = pi.groupby(pd.Grouper(freq="120M")).count()
which fails with:
TypeError: Argument 'labels' has incorrect type (expected numpy.ndarray, got TimeGrouper)
Trying to group on it as a series:
decades = s2.groupby(pd.Grouper(freq="120M")).count()
fails with:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
Trying to group on it as a DataFrame:
df = pd.DataFrame(s2)
decades = df.groupby(pd.Grouper(freq="120M", key='Earliest Date')).size()
fails with:
AttributeError: 'Index' object has no attribute 'to_timestamp'
I'm not sure how else to approach this.
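One direction that might sidestep Grouper altogether, offered as a rough sketch rather than a confirmed fix: take the year from each Period, floor it to the start of its decade, and group on that plain integer. The helper decade_of, the variable names, and the inline sample frame (standing in for the CSV above) are assumptions made for illustration only.

```python
import pandas as pd

def dt_parse(text):
    """Parse 'DD/MM/YYYY' into a daily Period; return NaT if it cannot be parsed."""
    try:
        d, m, y = text.split('/')
        return pd.Period(year=int(y), month=int(m), day=int(d), freq='D')
    except (ValueError, AttributeError):
        return pd.NaT

def decade_of(p):
    """Hypothetical helper: first year of the decade containing p, or None for NaT."""
    return None if pd.isna(p) else (p.year // 10) * 10

# Stand-in for the CSV shown above (assumed layout)
df = pd.DataFrame({
    'Object': ['Object1', 'Object2', 'Object3'],
    'Earliest Date': ['01/01/2000', '01/01/1760', '01/01/1520'],
})

periods = [dt_parse(v) for v in df['Earliest Date']]
decades = pd.Series([decade_of(p) for p in periods], index=df.index, name='Decade')

# Group on the integer decade label; rows with a missing decade are dropped by default
counts = decades.groupby(decades).size()
print(counts)
```

Because the group key here is an ordinary integer rather than a timestamp, this avoids the nanosecond Timestamp range that the historical dates fall outside of. The trade-off is that decades containing no objects do not appear in the result.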