Very similar to this question except I need to consider both date and time; indexer_between_time
does not appear to support any datetime formats I can find.
I have a dask dataframe that looks like this:
                     logger_volt        lat      lon
time
2017-01-01 00:01:20      12.0112  37.150902  -98.362
2017-01-01 00:01:40      12.0113  37.150902  -98.362
2017-01-01 00:02:00      12.0057  37.150902  -98.362
2017-01-01 00:02:20      12.0113  37.150902  -98.362
2017-01-01 00:02:40      12.0058  37.150902  -98.362
2017-01-01 00:03:00      12.0113  37.150902  -98.362
And a list of columns to mask over specific time ranges (the data in these ranges are considered "bad" and should be
replaced with None), in the form of a list of Python tuples:
[  # var    start of mask          end of mask
    ('lat', '2017-01-01 00:01:40', '2017-01-01 00:02:00'),
    ('lon', '2017-01-01 00:02:40', '2017-01-01 00:03:00'),
]
Desired Result:
                     logger_volt        lat      lon
time
2017-01-01 00:01:20      12.0112  37.150902  -98.362
2017-01-01 00:01:40      12.0113       None  -98.362
2017-01-01 00:02:00      12.0057       None  -98.362
2017-01-01 00:02:20      12.0113  37.150902  -98.362
2017-01-01 00:02:40      12.0058  37.150902     None
2017-01-01 00:03:00      12.0113  37.150902     None
Non-working code:
dqrs = [  # var    start of mask          end of mask
    ('lat', '2017-01-01 00:01:40', '2017-01-01 00:02:00'),
    ('lon', '2017-01-01 00:02:40', '2017-01-01 00:03:00'),
]
df = xarray.open_dataset('filename.cdf').to_dask_dataframe()
dqr_mask = (df == df) | df.isnull() # create a dummy mask that's all True
for var, start, end in dqrs:
    dqr_mask |= ((df.columns == var) & (df.index >= start) & (df.index >= end))
df = df.mask(dqr_mask).compute()
Problems with other approaches:
- Dask dataframes don't yet implement slice assignment, so something like
    df[start:end] = None
  won't work here.
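
For reference, here is a minimal sketch of the masking I am after, written against a plain pandas frame rather than dask. The helper name `mask_ranges` and the `sample` frame are hypothetical, made up for illustration; the sketch also assumes both ends of each range are inclusive, and it produces NaN rather than a literal None in the masked cells.

```python
import pandas as pd


def mask_ranges(pdf, dqrs):
    """Return a copy of pdf with each (column, start, end) range set to NaN.

    Hypothetical helper: both range endpoints are treated as inclusive.
    """
    out = pdf.copy()
    for var, start, end in dqrs:
        start, end = pd.Timestamp(start), pd.Timestamp(end)
        # Boolean array that is True for rows inside the bad time range
        bad = (out.index >= start) & (out.index <= end)
        # Series.mask replaces values where the condition is True with NaN
        out[var] = out[var].mask(bad)
    return out


# Small frame mirroring the example data above
index = pd.to_datetime(
    [
        "2017-01-01 00:01:20",
        "2017-01-01 00:01:40",
        "2017-01-01 00:02:00",
        "2017-01-01 00:02:20",
        "2017-01-01 00:02:40",
        "2017-01-01 00:03:00",
    ]
).rename("time")
sample = pd.DataFrame(
    {
        "logger_volt": [12.0112, 12.0113, 12.0057, 12.0113, 12.0058, 12.0113],
        "lat": 37.150902,
        "lon": -98.362,
    },
    index=index,
)

dqrs = [
    ("lat", "2017-01-01 00:01:40", "2017-01-01 00:02:00"),
    ("lon", "2017-01-01 00:02:40", "2017-01-01 00:03:00"),
]

masked = mask_ranges(sample, dqrs)
```

Since the function only touches rows by their index values, it could presumably be applied to each partition of a dask dataframe with something like `df.map_partitions(mask_ranges, dqrs)`, but I have not verified that on the real data.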