
I am trying to fill in the NaNs that appear after I upsample my time series, using resample's pad() function.

I used resample('1min').asfreq() to upsample from hourly data to minute-interval data, then called resample('1min').pad(). It does not fill in the NaN values with the previous value, as the pandas DataFrame.resample documentation suggests it should.

Run this to create the dataframe with a datetime index:

from urllib.request import urlopen

import numpy as np
import pandas as pd

url = "https://www.ndbc.noaa.gov/view_text_file.php?filename=42887h2016.txt.gz&dir=data/historical/stdmet/"
data_csv = urlopen(url)
df = pd.read_csv(data_csv, delim_whitespace=True, index_col=0, parse_dates=True)
df.drop(['WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES', 'VIS', 'TIDE', 'ATMP', 'WTMP'],
        axis=1, inplace=True)

#Data Preparation
df.reset_index(level=0, inplace=True)
df = df.iloc[1:]
df = df.rename(columns={'#YY': 'YY'})

#Create datetime variable
df['Date'] = df[df.columns[0:3]].apply(lambda x: '/'.join(x.dropna().astype(int).astype(str)),axis=1)
df['Time'] = df[df.columns[3:5]].apply(lambda x: ':'.join(x.dropna().astype(int).astype(str)),axis=1)
df['Date.Time'] = df['Date'] + ':' + df['Time']
df['Date'] = pd.to_datetime(df['Date'], format = '%Y/%m/%d')
df['Date.Time'] = pd.to_datetime(df['Date.Time'], format='%Y/%m/%d:%H:%M', utc=True)

#Remaining data prep for the dataframe and create index w/ time date
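# convert_objects() was removed in pandas 1.0; convert the numeric columns explicitly
num_cols = ['YY', 'MM', 'DD', 'hh', 'mm', 'DEWP']
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')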
df = df[(df['MM'] == 2.0) | (df['MM'] == 3.0)]
df = df.replace(999, np.nan)
df = df.set_index('Date.Time')
df.drop(['hh', 'mm', 'Time', 'Date'], axis = 1, inplace = True)

The result is the dataframe we want:

                             YY  MM  DD  DEWP
Date.Time                                    
2016-12-01 00:00:00+00:00  2016  12   1  11.3
2016-12-01 01:00:00+00:00  2016  12   1   9.0
2016-12-01 02:00:00+00:00  2016  12   1  11.0
2016-12-01 03:00:00+00:00  2016  12   1  10.8
2016-12-01 04:00:00+00:00  2016  12   1   6.5

Now upsample from hourly to 1-minute intervals:

df = df.resample('1min').asfreq()
df.head()

Results:

                               YY    MM   DD  DEWP
Date.Time                                         
2016-12-01 00:00:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:01:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:02:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:03:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:04:00+00:00     NaN   NaN  NaN   NaN

Fill in the NaN values with pad():

df = df.resample('1min').pad()
df.head()

Results:

                               YY    MM   DD  DEWP
Date.Time                                         
2016-12-01 00:00:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:01:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:02:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:03:00+00:00     NaN   NaN  NaN   NaN
2016-12-01 00:04:00+00:00     NaN   NaN  NaN   NaN

The result is supposed to look like this, with DEWP carried forward from the previous hourly value:

                               YY    MM   DD  DEWP
Date.Time                                         
2016-12-01 00:00:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:01:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:02:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:03:00+00:00  2016.0  12.0  1.0  11.3
2016-12-01 00:04:00+00:00  2016.0  12.0  1.0  11.3

Any help would be appreciated!


1 Answer

Calling df.resample('1min').fillna("pad") worked. See the pandas documentation for Resampler.fillna.
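
For anyone landing here later, below is a minimal sketch of the same idea on a small synthetic frame. The DEWP values and the three-hour range are made up for illustration and are not the buoy data from the question. A likely explanation for the behaviour in the question is that the frame had already been expanded to 1-minute frequency with asfreq(), so resampling it again at the same frequency creates no new rows to fill, and existing NaNs appear to be left untouched. Forward-filling either the hourly frame while upsampling, or the expanded frame directly, avoids that. Note that pad() has been removed from recent pandas versions in favour of ffill(), which is what the sketch uses.

```python
import pandas as pd

# Synthetic hourly data standing in for the DEWP column (values are made up).
idx = pd.date_range("2016-12-01", periods=3, freq="60min", tz="UTC")
hourly = pd.DataFrame({"DEWP": [11.3, 9.0, 11.0]}, index=idx)

# Option 1: upsample and forward-fill in one step, starting from the hourly frame.
filled = hourly.resample("1min").ffill()

# Option 2: if the frame was already expanded with asfreq(),
# forward-fill the NaNs on the frame itself rather than resampling again.
expanded = hourly.resample("1min").asfreq()
filled_after = expanded.ffill()

print(filled.head())
```

Both options should give the same minute-level frame; which one to use mostly depends on whether you still have the original hourly data at hand.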
