I have a dataframe with two columns (age, date) giving the age of a person and the current date. I want to approximate the date of birth from that data. My idea was to fit a linear model and find the intercept (the date at age 0), but it does not work out of the box. Pandas no longer supports the ols() function.
import pandas as pd
import seaborn as sns
from pandas import Timestamp
age = [30, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34]
date = [Timestamp('2001-02-10 00:01:00'),
Timestamp('2001-11-12 00:01:00'),
Timestamp('2002-02-27 00:01:00'),
Timestamp('2002-07-05 00:01:00'),
Timestamp('2002-07-20 00:01:00'),
Timestamp('2002-08-15 00:01:00'),
Timestamp('2002-09-08 00:01:00'),
Timestamp('2002-10-15 00:01:00'),
Timestamp('2002-12-21 00:01:00'),
Timestamp('2003-04-04 00:01:00'),
Timestamp('2003-07-29 00:01:00'),
Timestamp('2003-08-11 00:01:00'),
Timestamp('2004-02-28 00:01:00'),
Timestamp('2005-01-11 00:01:00'),
Timestamp('2005-01-12 00:01:00')]
df = pd.DataFrame({'age': age, 'date': date})
sns.regplot(df.age, df.date)
This throws an error:
TypeError: reduction operation 'mean' not allowed for this dtype
What is the best way to transform the dates into something that can be fitted, transform the result back to dates, and estimate confidence intervals? Is there any package that can handle pandas.Timestamp values out of the box, for example scikit-learn?
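To make the question concrete, here is a minimal sketch of the kind of round trip I have in mind, not a definitive solution. It assumes the dates can be turned into plain floats (days since an arbitrary reference date), fitted with numpy.polyfit, and converted back with pd.to_timedelta. The names ref, days, birth_estimate, lower and upper are only illustrative, and the interval uses a rough normal approximation (1.96 standard errors) rather than a proper t-based interval.

```python
import numpy as np
import pandas as pd

age = [30, 31, 31, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 34, 34]
date = pd.to_datetime([
    '2001-02-10', '2001-11-12', '2002-02-27', '2002-07-05', '2002-07-20',
    '2002-08-15', '2002-09-08', '2002-10-15', '2002-12-21', '2003-04-04',
    '2003-07-29', '2003-08-11', '2004-02-28', '2005-01-11', '2005-01-12',
])
df = pd.DataFrame({'age': age, 'date': date})

# Express each date as a float number of days since a reference date,
# so that ordinary numeric regression can be applied.
ref = df['date'].min()
days = ((df['date'] - ref) / pd.Timedelta(days=1)).to_numpy()

# Fit days = slope * age + intercept; cov=True also returns the
# covariance matrix of the two fitted coefficients.
(slope, intercept), cov = np.polyfit(df['age'].to_numpy(), days, 1, cov=True)
intercept_se = np.sqrt(cov[1, 1])

# The intercept is the fitted date at age 0; convert it back to a Timestamp.
birth_estimate = ref + pd.to_timedelta(intercept, unit='D')

# Rough 95% interval (assumption: normal approximation).
lower = ref + pd.to_timedelta(intercept - 1.96 * intercept_se, unit='D')
upper = ref + pd.to_timedelta(intercept + 1.96 * intercept_se, unit='D')

print(birth_estimate, lower, upper)
```

Since age is recorded in whole years and the fit is extrapolated about 30 years back to age 0, I would expect the interval from such an approach to be wide.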