
I'm using a pandas DataFrame with a datetime index to work with time series data. Since I'm working with observed data, there can be quite a number of missing values.

However, I want to resample the observed time series as follows:

freq = 'H'
obs_mean = obs_mean.resample(freq).sum()

For missing values it is fine to get NaN/NA, because those can be handled with dropna() or fillna().

The problem is that, instead of NaN/NA, it gives False as the value.

before resampling:
                    value
time                      
2018-05-18 08:15:00  0.200
2018-05-18 08:20:00  0.600
2018-05-18 08:25:00  0.600
2018-05-18 08:30:00  0.400
2018-05-18 08:35:00  0.400
2018-05-18 10:10:00  2.000
2018-05-18 10:15:00  5.400

after resampling:
                      value
time                       
2018-05-18 08:00:00   2.200
2018-05-18 09:00:00   False
2018-05-18 10:00:00  24.800
2018-05-18 11:00:00   0.800
2018-05-18 12:00:00  21.400
2018-05-18 13:00:00   2.400
Thilina Madumal
  • What is your pandas version? It looks like a bug. – jezrael May 28 '18 at 05:44
  • I don't see that issue with pandas 0.23. – piRSquared May 28 '18 at 05:52
  • My version is pandas==0.22.0; I will move to 0.23 and see. – Thilina Madumal May 28 '18 at 05:54
  • I checked with 0.23 and still have the same problem. – Thilina Madumal May 28 '18 at 06:01
  • Well, how are you reading the time series data? Do you use `pd.read_csv` or you are constructing the `Series` by hand? – prabhakar May 28 '18 at 07:40
  • First, the time series is read from a MySQL DB using SQLAlchemy. The values are then returned as an array of arrays: `result = Data.query.filter(Data.id == timeseries_id, Data.time >= start, Data.time < end).all()` followed by `return [[data_obj.time, data_obj.value] for data_obj in result]`. Afterwards the DataFrame is constructed with `pd.DataFrame(data=array_tms, columns=['time', 'value']).set_index(keys='time')`. – Thilina Madumal May 28 '18 at 08:24

1 Answer


I came across the same problem and found that the original data is missing during those periods: you have no data between 09:00 and 09:59.
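
If the goal is to get NaN for such empty hours, one possible approach is sketched below. It rebuilds the sample rows from the question, coerces the column to a numeric dtype, and passes `min_count=1` to `sum()` so that a bin with no observations yields NaN. This is only a sketch under assumptions: the coercion step assumes the values may not arrive as plain floats (the question does not show the column dtype), the variable name `hourly` is made up for illustration, and `"60min"` is used as an equivalent spelling of the hourly frequency from the question.

```python
import pandas as pd

# Sample rows copied from the question; nothing was observed between 08:35 and 10:10.
times = pd.to_datetime([
    "2018-05-18 08:15:00",
    "2018-05-18 08:20:00",
    "2018-05-18 08:25:00",
    "2018-05-18 08:30:00",
    "2018-05-18 08:35:00",
    "2018-05-18 10:10:00",
    "2018-05-18 10:15:00",
])
values = [0.2, 0.6, 0.6, 0.4, 0.4, 2.0, 5.4]
obs_mean = pd.DataFrame({"value": values}, index=pd.DatetimeIndex(times, name="time"))

# Assumption: the values may not be plain floats after the database round trip,
# so coerce them to a numeric dtype before aggregating.
obs_mean["value"] = pd.to_numeric(obs_mean["value"], errors="coerce")

# min_count=1 requires at least one valid observation per bin;
# an hour with no rows then becomes NaN.
hourly = obs_mean.resample("60min").sum(min_count=1)

print(hourly)
```

The empty hours can then be dropped with `hourly.dropna()` or filled with `hourly.fillna(...)`, as intended in the question.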