12

I am using the CluStream algorithm and I have figured out that I need to normalize my data. I decided to use min-max normalization, but I think that this way newly arriving data objects will be scaled differently, because the min and max values may change over time. Do you think I'm correct? If so, which approach should I use instead?

T.Sh
  • 390
  • 2
  • 16

2 Answers

7

Instead of computing the global min-max over the whole data set, you can use a local normalization based on a sliding window (e.g. using just the last 15 seconds of data). This approach is very common for computing a local mean filter in signal and image processing.
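Here is a minimal sketch of what such a sliding-window min-max scaler could look like. The class name, window size, and per-feature usage are illustrative assumptions, not part of CluStream itself:

```python
from collections import deque

class SlidingWindowMinMax:
    """Min-max normalization using only the most recent `window_size` values.
    One instance per feature; names are illustrative, not from CluStream."""

    def __init__(self, window_size=100):
        self.window = deque(maxlen=window_size)  # old values fall out automatically

    def update(self, value):
        self.window.append(value)

    def normalize(self, value):
        lo, hi = min(self.window), max(self.window)
        if hi == lo:                      # avoid division by zero on a flat window
            return 0.0
        return (value - lo) / (hi - lo)   # scaled to [0, 1] w.r.t. the local window


# usage: feed each arriving value into the window, then scale it locally
scaler = SlidingWindowMinMax(window_size=50)
for x in [3.0, 7.5, 4.2, 9.1, 5.0]:
    scaler.update(x)
    print(scaler.normalize(x))
```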

I hope this helps.

  • Hi! I know it's a little bit late, but I currently find myself with the same problem. The thing is that, if you use a sliding window, the current micro-clusters, which were normalized with the old min and max values, are on a different 'scale' than the new elements, for which you get different min and max values. How do you deal with this? I mean, you normalize the new data points, but the micro-clusters already managed by the algorithm were normalized with the old values ... you cannot send the new elements to the algorithm, because that would be inconsistent! – onofricamila Mar 25 '20 at 03:20
0

When normalizing stream data you need to use the statistical properties of the training set. During streaming you just clip values that are too large or too small to the training min/max. There is no other way; it's a stream, you know.
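A minimal sketch of that idea, assuming the min/max are taken once from a training (warm-up) batch; the array shapes and function name are hypothetical:

```python
import numpy as np

# min/max learned once from the training (warm-up) portion of the stream
train = np.array([[0.1, 5.0], [0.4, 9.0], [0.2, 7.5]])
train_min = train.min(axis=0)
train_max = train.max(axis=0)

def normalize_stream_point(x):
    """Clip an incoming point to the training range, then min-max scale it."""
    x = np.clip(x, train_min, train_max)           # cut too big/low values
    return (x - train_min) / (train_max - train_min)

# values outside the training range get clipped before scaling
print(normalize_stream_point(np.array([0.9, 3.0])))
```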

But as a tradeoff, you can continuously collect the statistical properties of all your data and retrain your model from time to time to adapt to evolving data. I don't know CluStream, but after a quick search it seems to be an algorithm designed to help with exactly such tradeoffs.
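One way to collect those statistics on the fly could look like the following sketch; the running min/max accumulator and the `retrain_every` trigger are assumptions for illustration, not CluStream's own mechanism:

```python
import numpy as np

class RunningMinMax:
    """Accumulates global min/max over the whole stream and signals every
    `retrain_every` points that the scaling (and model) could be refreshed."""

    def __init__(self, n_features, retrain_every=1000):
        self.min = np.full(n_features, np.inf)
        self.max = np.full(n_features, -np.inf)
        self.count = 0
        self.retrain_every = retrain_every

    def update(self, x):
        self.min = np.minimum(self.min, x)
        self.max = np.maximum(self.max, x)
        self.count += 1
        return self.count % self.retrain_every == 0   # True -> time to rescale/retrain

# usage: update on every point, retrain the model when the flag fires
stats = RunningMinMax(n_features=2, retrain_every=3)
for point in np.random.rand(6, 2):
    if stats.update(point):
        print("stats so far:", stats.min, stats.max)  # retrain with these bounds here
```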

fatih
  • 1,395
  • 10
  • 9