I am trying to fill in the NaNs after upsampling my time series, using the resampler's pad() function.
I used resample('1min').asfreq() to upsample from hourly data to minute-interval data, then called resample('1min').pad(). However, it does not fill the NaN values with the previous value, as it should according to the pandas DataFrame.resample tutorial.
Run this to create the dataframe with a datetime index:
from urllib.request import urlopen
import numpy as np
import pandas as pd
url = "https://www.ndbc.noaa.gov/view_text_file.php?filename=42887h2016.txt.gz&dir=data/historical/stdmet/"
data_csv = urlopen(url)
df = pd.read_csv(data_csv, delim_whitespace=True, index_col=0, parse_dates=True)
df.drop(['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES', 'VIS', 'TIDE', 'ATMP', 'WTMP'],
        axis=1, inplace=True)
#Data Preparation
df.reset_index(level=0, inplace=True)
df = df.iloc[1:]
df = df.rename(columns={'#YY': 'YY'})
#Create datetime variable
df['Date'] = df[df.columns[0:3]].apply(lambda x: '/'.join(x.dropna().astype(int).astype(str)),axis=1)
df['Time'] = df[df.columns[3:5]].apply(lambda x: ':'.join(x.dropna().astype(int).astype(str)),axis=1)
df['Date.Time'] = df['Date'] + ':' + df['Time']
df['Date'] = pd.to_datetime(df['Date'], format = '%Y/%m/%d')
df['Date.Time'] = pd.to_datetime(df['Date.Time'], format='%Y/%m/%d:%H:%M', utc=True)
#Remaining data prep for the dataframe and create index w/ time date
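# convert_objects() was removed in pandas 1.0; coerce the remaining object columns instead
obj_cols = df.select_dtypes(include='object').columns
df[obj_cols] = df[obj_cols].apply(pd.to_numeric, errors='coerce')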
df = df[(df['MM'] == 2.0) | (df['MM'] == 3.0)]
df = df.replace(999, np.nan)
df = df.set_index('Date.Time')
df.drop(['hh', 'mm', 'Time', 'Date'], axis = 1, inplace = True)
The result is the dataframe we want:
                             YY  MM  DD  DEWP
Date.Time
2016-12-01 00:00:00+00:00  2016  12   1  11.3
2016-12-01 01:00:00+00:00  2016  12   1   9.0
2016-12-01 02:00:00+00:00  2016  12   1  11.0
2016-12-01 03:00:00+00:00  2016  12   1  10.8
2016-12-01 04:00:00+00:00  2016  12   1   6.5
Now upsample from hourly to 1-minute intervals:
df = df.resample('1min').asfreq()
df.head()
Results:
                               YY    MM   DD  DEWP
Date.Time
2016-12-01 00:00:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:01:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:02:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:03:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:04:00+00:00     NaN   NaN  NaN   NaN
Fill in the NaN values with pad():
df = df.resample('1min').pad()
df.head()
Results:
                               YY    MM   DD  DEWP
Date.Time
2016-12-01 00:00:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:01:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:02:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:03:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:04:00+00:00     NaN   NaN  NaN   NaN
The DEWP column (along with the others) is supposed to look like this after padding:
                               YY    MM   DD  DEWP
Date.Time
2016-12-01 00:00:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:01:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:02:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:03:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:04:00+00:00  2016.0  12.0  1.0  11.3
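To reproduce this without downloading the buoy file, here is a minimal sketch on synthetic hourly data. The DEWP values and the variable names are made up for illustration, and it assumes a recent pandas where the resampler's ffill() is the spelling of pad(). It compares forward-filling on a frame that has already been upsampled with asfreq() against forward-filling the frame itself, and against forward-filling while upsampling the original hourly frame.

```python
import pandas as pd

# Three hourly observations (synthetic values, UTC index like the real data)
idx = pd.date_range("2016-12-01 00:00", periods=3, freq="60min", tz="UTC")
hourly = pd.DataFrame({"DEWP": [11.3, 9.0, 11.0]}, index=idx)

# Step 1: upsample to 1-minute rows; the new rows are NaN
upsampled = hourly.resample("1min").asfreq()

# Step 2 (as in the question): resample the already-upsampled frame and pad.
# The index is already at 1-minute frequency, so no new rows are created
# and the NaNs that are already present are left untouched.
still_nan = upsampled.resample("1min").ffill()

# Alternative A: forward-fill the NaNs on the upsampled frame directly
filled_a = upsampled.ffill()

# Alternative B: forward-fill while upsampling the original hourly frame
filled_b = hourly.resample("1min").ffill()

print(still_nan["DEWP"].isna().sum())  # NaNs remaining after step 2
print(filled_a["DEWP"].isna().sum())   # NaNs remaining after alternative A
print(filled_b.head())
```

In this sketch the padding only takes effect when it is applied either to the frame itself (DataFrame.ffill) or in the same resample step that creates the new rows, which may be what is going wrong in my code above.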
Any help would be appreciated!