I have a large set of measurement data (datetime, temperature) that I need to downsample before plotting it with bokeh (to keep the user interface smooth).
Because there are irregular physical phenomena I want to see, I can't just resample the data or take one sample out of 4 (or 10). I need a smarter approach to decide whether a sample has to be kept.
My idea is to take a reference sample and drop the following samples as long as they stay close to it (inside a window around the reference sample's value). When a sample falls outside the window, it is kept and it becomes the new reference sample for the following samples. I will end up with a dataset without a fixed frequency, but I don't think that is an issue.
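To make the rule concrete, here is a toy illustration on made-up values (the numbers are arbitrary; the window is the same 0.5 I use below):

# Toy illustration of the keep/drop rule on made-up values
values = [10.0, 10.2, 10.4, 10.7, 10.8, 10.1, 10.0]
window = 0.5

kept = [0]                        # the first sample is always the reference
ref = values[0]
for i, v in enumerate(values):
    if abs(v - ref) > window:     # sample falls outside the window around the reference
        kept.append(i)            # keep it ...
        ref = v                   # ... and it becomes the new reference

print(kept)                       # [0, 3, 5]: 10.7 is kept (more than 0.5 away from 10.0),
                                  # then 10.1 is kept (more than 0.5 away from 10.7)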
The following code is an implementation of my custom / fuzzy downsampling, run on synthetic data (a sine wave plus noise) that reproduces the behaviour of my real data rather well.
import numpy as np
import pandas as pd
# DataFrame / pandas Series creation
size = 300000
index = pd.date_range('01/12/2017 08:15:49', periods=size, freq="3s")
s = 10*np.sin(np.arange(0, 2*np.pi, (2*np.pi/size)))
noise = np.random.random(size)
val = s + noise
serie = pd.Series(data=val, index=index)
# fuzzy downsampling
window = 0.5
def fuzz():
    i = serie.index[0]
    fuzzy_index = [i]                     # the first sample is always kept
    ref = serie.loc[i]                    # initial reference value
    for ind, val in serie.iteritems():
        if abs(val - ref) > window:       # sample falls outside the window
            fuzzy_index.append(ind)       # keep it ...
            ref = serie.loc[ind]          # ... and it becomes the new reference
    return serie.loc[fuzzy_index]
# compute downsampling
sub_serie = fuzz()
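A quick sanity check of how much the series shrinks (I am not quoting numbers because the count depends on the random noise):

# compare the number of samples before and after the fuzzy downsampling
print(len(serie), len(sub_serie), len(sub_serie) / len(serie))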
This code works, but it is slow:
%timeit fuzz()
1 loop, best of 3: 8.45 s per loop
I can't play much with the window because it is tied to the accuracy of the temperature measurement.
My sample size is currently 300000, but it could grow to a couple of million in the near future.
Do you have any idea how to optimize/speed up this code?
Or maybe you have another idea for downsampling in a way that makes physical sense?
Or maybe there is a solution directly with the bokeh server, ideally dependent on the user's zoom level?
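For reference, the same rule written against the raw NumPy array instead of the pandas Series (a minimal, unprofiled sketch; fuzz_np is just a name for the example, and it is still a Python-level loop):

def fuzz_np(series, window):
    # Same keep/drop rule, iterating over a plain NumPy array instead of the Series
    values = series.values
    keep = np.zeros(len(values), dtype=bool)
    keep[0] = True                    # the first sample is the initial reference
    ref = values[0]
    for i in range(1, len(values)):
        if abs(values[i] - ref) > window:
            keep[i] = True            # keep the sample ...
            ref = values[i]           # ... and make it the new reference
    return series[keep]

sub_serie_np = fuzz_np(serie, window)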