2

I want to complete my time serie of % humidity with missing records (or rows). Sensors are designed to record a mean value each 15min, so that is my target frequency. Here an example for one station (not the best in terms of gaps...) but I have 36 stations of measurements, 6 parameters and more than 24 000 records each to homogenize.

I choose columns of datetime and % humidity for example :

humdt = data["la-salade"][["datetime","humidite"]]

               datetime  humidite
0   2019-07-09 08:30:00        87
1   2019-07-09 11:00:00        87
2   2019-07-09 17:30:00        82
3   2019-07-09 23:30:00        80
4   2019-07-11 06:15:00        79
5   2019-07-19 14:30:00        39

I set datetime as an index : (so far it works)

humdt["datetime"] = pd.to_datetime(humdt["datetime"])
humdt = humdt.set_index("datetime",drop=True)

                     humidite
datetime
2019-07-09 08:30:00        87
2019-07-09 11:00:00        87
2019-07-09 17:30:00        82
2019-07-09 23:30:00        80
2019-07-11 06:15:00        79
2019-07-19 14:30:00        39

Beside this, I prepare a datetime range matching my expectations (15min frequency) :

date_rng = pd.period_range(start=debut, end=fin, freq='15min').strftime('%Y-%m-%d %H:%M:%S')
date_rng = pd.DataFrame(date_rng)
date_rng.columns = ["datetime"]

Then, I use this range to reindex my humidity values (expecting NaN when missing) :

humdt = humdt.reindex(pd.DatetimeIndex(date_rng["datetime"]))

                     humidite
datetime
2019-07-09 08:30:00      87.0
2019-07-09 08:45:00      88.0
2019-07-09 09:00:00      88.0
2019-07-09 09:15:00      88.0
2019-07-09 09:30:00      89.0
2019-07-09 09:45:00      89.0
2019-07-09 10:00:00      88.0
2019-07-09 10:15:00      88.0
2019-07-09 10:30:00      88.0
2019-07-09 10:45:00      88.0
2019-07-09 11:00:00      87.0

As a result, I get values of humidity from nowhere... not even a classical linear interpolation (ex : between 87% at 08H30 and 87% at 11H00). Please, help me, I have no clue what is going on... (also tried to merge and resampling, as here behavior is not as expected). Thank you !

  • Cannot reproduce... after removing all the back and forth between strings and Datetime values, the reindex give NaN for all new timestamps. – Serge Ballesta May 15 '20 at 14:39
  • Thanks for trying Serge ! I also get this issue (all to NaN) doing a merge between expected datetime and dataframe of values. – Emilie REDON May 15 '20 at 14:52

1 Answers1

1

You can add attribute fill_value to df.reindex.

humdt = humdt.reindex(pd.DatetimeIndex(date_rng["datetime"]), fill_value=np.nan)

This will fill new values with NaN

Abbas
  • 623
  • 4
  • 6
  • Thank you Abbas. I tried with fill_value=np.nan (thought it was the default attribute but always better to verify it !) but I get the same result `>>> humdt = humdt.reindex(pd.DatetimeIndex(date_rng["datetime"]), fill_value=np.nan) >>> >>> >>> humdt humidite datetime 2019-07-09 08:30:00 87.0 2019-07-09 08:45:00 88.0 2019-07-09 09:00:00 88.0 2019-07-09 09:15:00 88.0 2019-07-09 09:30:00 89.0 2019-07-09 09:45:00 89.0 2019-07-09 10:00:00 88.0 2019-07-09 10:15:00 88.0` – Emilie REDON May 15 '20 at 14:36
  • That means it’s not adding values. They already exist somewhere in your dataset. – Abbas May 15 '20 at 14:38
  • OK thank you for your help. I'm confused because I create this sub dataframe in order to work on only two columns, datatime and one parameter. I don't catch what are these data about... I need to obtain a dataframe with a consistent frequency, recorded values and NaN elsewhere but definitively no methods are working. I miss something. – Emilie REDON May 15 '20 at 14:46