I have multiple time-series DataFrames, each representing a different asset.
The problem is that there are holes in the data (rows that are missing for one asset but present for the others).
Question: What are some good ways to clean the data so that I can fill the missing rows with values that are close to reality?
Extra information:
My first ideas:
An LSTM that predicts the missing values (problem: I could only train it on row sequences without holes, which introduces bias)
ARIMA (I have only heard of it and don't know how it works)
The mean of the values before and after the hole (unrealistic, and it misses outliers and spikes)
What are better approaches? (Dropping the rows is not an option.)
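For reference, the "mean of the value before and after" idea can be sketched roughly as follows. This is only a minimal sketch with made-up numbers (the series `close` and its values are hypothetical, not my real data); it uses pandas' time-weighted linear interpolation, which for a single missing row on an evenly spaced index reduces to the mean of the two neighbours.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly close prices with a one-row hole (made-up numbers).
idx = pd.to_datetime([
    "2016-02-23 16:00", "2016-02-23 17:00", "2016-02-23 18:00",
    "2016-02-23 19:00", "2016-02-23 20:00",
])
close = pd.Series([1.10, 1.20, 1.30, np.nan, 1.50], index=idx, name="Close")

# Linear interpolation weighted by the timestamps. For one missing row
# between two evenly spaced neighbours this is simply their mean.
filled = close.interpolate(method="time")

print(filled)
```

This shows the weakness mentioned above: the filled value always lies on the straight line between its neighbours, so any spike inside the hole is lost.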
Here's some sample data:
(...which I just wrote by hand as an example; the prices are meaningless and are only there to show the holes as NaN values.)
df1
                        Open     High      Low    Close
Time
2014-10-10 00:00:00  1.12345  1.12345  1.12345  1.12345
2014-10-13 00:00:00  1.12345  1.12345  1.12345  1.12345
2014-10-14 00:00:00  1.12345  1.12345  1.12345  1.12345
2014-10-15 00:00:00  1.12345  1.12345  1.12345  1.12345
2014-10-16 00:00:00  1.12345  1.12345  1.12345  1.12345
...                      ...      ...      ...      ...
2016-02-23 16:00:00  1.12345  1.12345  1.12345  1.12345
2016-02-23 17:00:00  1.12345  1.12345  1.12345  1.12345
2016-02-23 18:00:00  1.12345  1.12345  1.12345  1.12345
2016-02-23 19:00:00      NaN      NaN      NaN      NaN
2016-02-23 20:00:00  1.12345  1.12345  1.12345  1.12345
df2
                            Open         High          Low        Close
Time
2014-10-10 00:00:00  28391.12345  28391.12352  28391.12332  28391.12347
2014-10-13 00:00:00  28391.12348  28391.12358  28391.12340  28391.12350
2014-10-14 00:00:00          NaN          NaN          NaN          NaN
2014-10-15 00:00:00  28391.12350  28391.12354  28391.12344  28391.12353
2014-10-16 00:00:00  28391.12350  28391.12354  28391.12344  28391.12353
...                          ...          ...          ...          ...
2016-02-23 16:00:00  28391.30000  28391.30000  28391.10000  28391.10000
2016-02-23 17:00:00  28391.10000  28391.50000  28391.09000  28391.40000
2016-02-23 18:00:00  28391.12345  28391.12345  28391.12345  28391.12345
2016-02-23 19:00:00  28391.12345  28391.12345  28391.12345  28391.12345
2016-02-23 20:00:00  28391.12345  28391.12345  28391.12345  28391.12345
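To make "holes that are not there on the other assets" concrete, here is a minimal sketch of how such rows could be located. The two small frames are rebuilt by hand from the last rows of the sample above (placeholder prices again), and the variable names `mask` and `covered` are just illustrative.

```python
import numpy as np
import pandas as pd

# Tiny stand-ins for df1 and df2, using placeholder prices as in the sample.
idx = pd.to_datetime([
    "2016-02-23 17:00", "2016-02-23 18:00",
    "2016-02-23 19:00", "2016-02-23 20:00",
])
cols = ["Open", "High", "Low", "Close"]
df1 = pd.DataFrame(1.12345, index=idx, columns=cols)
df1.loc[idx[2]] = np.nan  # the hole at 19:00
df2 = pd.DataFrame(28391.12345, index=idx, columns=cols)

# True where df1 has a fully empty row but df2 has complete data.
mask = df1.isna().all(axis=1) & df2.notna().all(axis=1)

# Timestamps of the holes in df1 that are covered by df2.
covered = mask[mask].index

print(covered)
```

These are the timestamps where another asset could, in principle, provide information for filling the hole.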