I have a raw feed from several sources that don't produce values at fixed rates, and I need to resample and normalize it for further processing. Values are resampled to 500 ms intervals (each timestamp is rounded to the nearest 500 ms), and multiple values that land in the same interval are averaged. Then a forward fill replaces missing values with the last known value, and a back fill covers any values still missing at the beginning of the data.
#raw feed
time value source
09:30:00.230 2 B
09:30:00.417 3 B
09:30:00.417 1 A
09:30:00.653 3 A
09:30:01.450 2 B
09:30:01.887 5 A
09:30:02.653 5 B
09:30:02.763 3 B
09:30:02.967 5 B
09:30:03.107 6 A
09:30:03.670 6 B
#resampled to 500ms intervals using average
time A B
09:30:00.000 NULL 2
09:30:00.500 2 3
09:30:01.000 NULL NULL
09:30:01.500 NULL 2
09:30:02.000 5 NULL
09:30:02.500 NULL 5
09:30:03.000 6 4
09:30:03.500 NULL 6
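The resampled table is consistent with assigning each sample to the nearest 500 ms boundary rather than to the start of its interval: 09:30:00.417 lands in the 09:30:00.500 row, and 09:30:02.763 and 09:30:02.967 are averaged into the 09:30:03.000 row (B = 4). A minimal stdlib Python sketch of that bucketing step, assuming times are expressed as milliseconds after 09:30:00; `round_to_step` and `bucket_means` are hypothetical helper names, not Deedle or pandas APIs.

```python
from collections import defaultdict

STEP_MS = 500

# Raw feed from the question; times are ms after 09:30:00 (simplifying assumption).
RAW = [
    (230, 2, "B"), (417, 3, "B"), (417, 1, "A"), (653, 3, "A"),
    (1450, 2, "B"), (1887, 5, "A"), (2653, 5, "B"), (2763, 3, "B"),
    (2967, 5, "B"), (3107, 6, "A"), (3670, 6, "B"),
]


def round_to_step(t_ms, step_ms=STEP_MS):
    """Round a non-negative timestamp to the nearest multiple of step_ms (ties round up)."""
    return ((t_ms + step_ms // 2) // step_ms) * step_ms


def bucket_means(rows, step_ms=STEP_MS):
    """Average all values of each source that fall into the same rounded bucket.

    Returns {source: {bucket_ms: mean}}; buckets with no samples are absent,
    which corresponds to NULL in the resampled table.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for t_ms, value, source in rows:
        b = round_to_step(t_ms, step_ms)
        sums[source][b] += value
        counts[source][b] += 1
    return {
        src: {b: sums[src][b] / counts[src][b] for b in sorted(sums[src])}
        for src in sums
    }


if __name__ == "__main__":
    means = bucket_means(RAW)
    for src in sorted(means):
        print(src, means[src])
```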
#ffill+bfill
time A B
09:30:00.000 2 2
09:30:00.500 2 3
09:30:01.000 2 3
09:30:01.500 2 2
09:30:02.000 5 2
09:30:02.500 5 5
09:30:03.000 6 4
09:30:03.500 6 6
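The fill step is independent of the bucketing: walk each column forward carrying the last seen value, then fill any leading gap (which a forward fill cannot reach) with the first available value. A minimal sketch over a plain list in which `None` stands for NULL; `ffill_bfill` is a hypothetical helper name.

```python
from typing import List, Optional


def ffill_bfill(column: List[Optional[float]]) -> List[Optional[float]]:
    """Forward fill, then back fill, a column in which None marks a missing value.

    A column that is entirely None is returned unchanged (all None).
    """
    filled = list(column)
    last = None
    # Forward fill: replace each gap with the most recent non-missing value.
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v
    # Back fill: only leading gaps can remain; copy the first real value into them.
    first = next((v for v in filled if v is not None), None)
    for i, v in enumerate(filled):
        if v is not None:
            break
        filled[i] = first
    return filled


if __name__ == "__main__":
    # Columns A and B of the resampled table (09:30:00.000 ... 09:30:03.500).
    a = [None, 2, None, None, 5, None, 6, None]
    b = [2, 3, None, 2, None, 5, 4, 6]
    print(ffill_bfill(a))  # [2, 2, 2, 2, 5, 5, 6, 6]
    print(ffill_bfill(b))  # [2, 3, 3, 2, 2, 5, 4, 6]
```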
I used the following code, but I doubt it is an efficient way to use Deedle, and the resulting data frame contains duplicate values due to the full outer join. Do I now need some way to aggregate them, or to split them into series and resample them again? Please advise if there's a better way to meet the requirements.
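private void Resample(IList<(DateTime time, double value, string source)> rawSource)
{
    // One series per source (case-insensitive match on the source name)
    var sourceASeries = rawSource.Where(x => string.Equals(x.source, "A", StringComparison.OrdinalIgnoreCase))
        .Select(x => KeyValue.Create(x.time, x.value)).ToSeries();
    var sourceBSeries = rawSource.Where(x => string.Equals(x.source, "B", StringComparison.OrdinalIgnoreCase))
        .Select(x => KeyValue.Create(x.time, x.value)).ToSeries();
    // Bucket keys to 500 ms; RoundMs is assumed to be a custom DateTime extension (not part of .NET or Deedle).
    // Note: ResampleUniform keeps the latest value in each bucket, not the average.
    var sourceAResampled = sourceASeries.ResampleUniform(dt => dt.RoundMs(500), dt => dt.RoundMs(500).AddMilliseconds(500),
        Lookup.ExactOrSmaller);
    var sourceBResampled = sourceBSeries.ResampleUniform(dt => dt.RoundMs(500), dt => dt.RoundMs(500).AddMilliseconds(500),
        Lookup.ExactOrSmaller);
    // Combine the two columns on their time keys, then fill the remaining gaps
    var df = Frame.FromColumns(new[] { sourceAResampled, sourceBResampled });
    df = df.FillMissing(Direction.Forward).FillMissing(Direction.Backward);
}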
In Python, the following pandas code works fine for me:
import pandas as pd

# vals: DataFrame with a DatetimeIndex and columns 'Value' and 'Source'
A_vals = vals.where(vals['Source'] == 'A', inplace=False).rename(columns={"Value": "A"}).drop(['Source'], axis=1)
B_vals = vals.where(vals['Source'] == 'B', inplace=False).rename(columns={"Value": "B"}).drop(['Source'], axis=1)
# Mean per 500 ms interval, then forward fill and back fill
A_vals = A_vals.resample('500ms').mean().ffill().bfill()
B_vals = B_vals.resample('500ms').mean().ffill().bfill()
result = pd.concat([A_vals, B_vals], axis=1)
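The pandas version lines up because `where` keeps every original timestamp in both frames (non-matching rows become NaN), so both resampled frames cover the same time range and `concat` pairs each interval exactly once. One difference: `resample` assigns a sample to the start of its interval by default, so 09:30:00.417 goes into 09:30:00.000 there, whereas the tables above round it into 09:30:00.500. The shared-grid idea itself can be sketched without any library, assuming times in milliseconds after 09:30:00 and rounding to the nearest 500 ms as in the tables; `align_and_fill` is a hypothetical name, not a Deedle or pandas API.

```python
from collections import defaultdict

STEP_MS = 500

# Raw feed from the question; times are ms after 09:30:00 (simplifying assumption).
RAW = [
    (230, 2, "B"), (417, 3, "B"), (417, 1, "A"), (653, 3, "A"),
    (1450, 2, "B"), (1887, 5, "A"), (2653, 5, "B"), (2763, 3, "B"),
    (2967, 5, "B"), (3107, 6, "A"), (3670, 6, "B"),
]


def align_and_fill(rows, step_ms=STEP_MS):
    """Average per (bucket, source), lay every source on one shared grid, then fill.

    rows must be non-empty. Returns (grid, {source: column}); each grid key
    appears exactly once.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t_ms, value, source in rows:
        bucket = (t_ms + step_ms // 2) // step_ms * step_ms  # nearest multiple of step_ms
        sums[(bucket, source)] += value
        counts[(bucket, source)] += 1

    buckets = [b for b, _ in sums]
    # Shared grid from the earliest to the latest bucket of any source.
    grid = list(range(min(buckets), max(buckets) + step_ms, step_ms))

    columns = {}
    for src in sorted({s for _, s in sums}):
        col = [sums[(b, src)] / counts[(b, src)] if (b, src) in sums else None
               for b in grid]
        last = None
        for i, v in enumerate(col):  # forward fill
            if v is None:
                col[i] = last
            else:
                last = v
        nxt = None
        for i in range(len(col) - 1, -1, -1):  # back fill (only leading gaps remain)
            if col[i] is None:
                col[i] = nxt
            else:
                nxt = col[i]
        columns[src] = col
    return grid, columns


if __name__ == "__main__":
    grid, cols = align_and_fill(RAW)
    for t, a, b in zip(grid, cols["A"], cols["B"]):
        print(f"09:30:{t / 1000:06.3f} {a:g} {b:g}")
```

Because every source is looked up on the same generated grid, each 500 ms key appears exactly once in the output, and a source that starts late or ends early is padded by the fill steps rather than by a join.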