Let's say I have a time series where data is usually available for a certain continuous span of years, with missing values before and after that span, like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007"],
    'cakes eaten': [np.nan, np.nan, np.nan, 3, 4, 5, np.nan, np.nan],
})
print(df)
year cakes eaten
0 2000 NaN
1 2001 NaN
2 2002 NaN
3 2003 3.0
4 2004 4.0
5 2005 5.0
6 2006 NaN
7 2007 NaN
Is there a way to fill (a given number of) missing values based on the trend seen in the available values?
If I want to fill a maximum of 2 values in each direction, the result should look like this:
year cakes eaten
0 2000 NaN
1 2001 1.0
2 2002 2.0
3 2003 3.0
4 2004 4.0
5 2005 5.0
6 2006 6.0
7 2007 7.0
Also, is there a way to ensure that this imputation is only performed when enough values are available? For example, I only want to fill a maximum of 2 values in each direction if at least 3 values are available (or, in more general terms, fill n values only if n + m are available).
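To make the requirement concrete, here is a minimal sketch of one possible approach, not a definitive solution. It fits a straight line to the observed values with `np.polyfit` and uses it to fill at most `limit` missing values on each side of the observed span, but only if at least `min_obs` values are observed. The function name `extrapolate_linear` and its parameters `limit` and `min_obs` are hypothetical names chosen for illustration. The sketch assumes the rows are evenly spaced in time (it uses row position, not the `year` column, as the x-axis) and that a linear trend is appropriate; gaps inside the observed span are left untouched.

```python
import numpy as np
import pandas as pd


def extrapolate_linear(s, limit=2, min_obs=3):
    """Fill up to `limit` NaNs on each side of the observed span of `s`,
    using a least-squares line fitted to the observed values.

    Hypothetical helper: assumes evenly spaced rows and a linear trend.
    """
    values = s.to_numpy(dtype=float, copy=True)
    observed = ~np.isnan(values)

    # Do nothing if too few points are available (a line needs at least 2).
    if observed.sum() < max(min_obs, 2):
        return pd.Series(values, index=s.index, name=s.name)

    pos = np.arange(len(values))
    slope, intercept = np.polyfit(pos[observed], values[observed], 1)

    first, last = pos[observed][0], pos[observed][-1]
    # Only positions outside the observed span, within `limit` steps of it.
    before = (pos < first) & (pos >= first - limit)
    after = (pos > last) & (pos <= last + limit)
    fill = ~observed & (before | after)

    values[fill] = slope * pos[fill] + intercept
    return pd.Series(values, index=s.index, name=s.name)


df = pd.DataFrame({
    'year': ["2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007"],
    'cakes eaten': [np.nan, np.nan, np.nan, 3, 4, 5, np.nan, np.nan],
})
filled = extrapolate_linear(df['cakes eaten'], limit=2, min_obs=3)
```

With `limit=2` and `min_obs=3`, `filled` should match the desired result above up to floating-point rounding, with the year 2000 still missing. For the general "fill n only if n + m are available" rule, `min_obs` would be set to n + m; with `min_obs=4` on this data, nothing is filled.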