I have a question about feature normalization/standardization (scaling) for anomaly detection / novelty detection using autoencoders. Typically in ML problems, we split the data into train and test sets, fit a normalizer or standard scaler on the training set, and use it to transform (not fit_transform) the test data. But how does this work in anomaly detection / novelty detection, where we train the detector only on 'normal' data (no 'anomalies')? Here the training data does not represent the test data, because the model learns only from 'normal' data so that it produces a large reconstruction error when given 'anomalous' data. Should we fit the scaler on the training data and use it to transform the anomalies? I think that is not proper. Is it OK to scale the train and test data separately if that produces explainable results?
Your test set should contain both normal and anomalous datapoints - and the "normal" points must resemble your training data. – Jon Nordby Oct 31 '22 at 12:52
1 Answer
Your test set should contain both normal and anomalous data points, and the "normal" points must resemble your training data. So you fit the scaler on your training data, as usual, and use it to transform the test set.
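To illustrate why the scaler should come from the training data, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the data is synthetic and one-dimensional, `fit_scaler` and `transform` are hypothetical helpers that mimic a standard scaler's fit and transform steps, and the absolute z-score is only a crude stand-in for an autoencoder's reconstruction error.

```python
import random
import statistics

random.seed(0)

# Synthetic one-feature data (assumed for illustration only).
train_normal = [random.gauss(10.0, 2.0) for _ in range(1000)]
test_normal = [random.gauss(10.0, 2.0) for _ in range(200)]
test_anomaly = [random.gauss(30.0, 2.0) for _ in range(20)]


def fit_scaler(values):
    """Estimate (mean, std) from `values`, like a scaler's fit step."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, scaler):
    """Standardize `values` using previously fitted statistics."""
    mean, std = scaler
    return [(v - mean) / std for v in values]


def fraction_flagged(z_values, threshold=3.0):
    """Share of points whose |z| exceeds the (assumed) threshold."""
    return sum(abs(z) > threshold for z in z_values) / len(z_values)


# Fit on normal-only training data, then reuse the same statistics on test data.
train_scaler = fit_scaler(train_normal)
z_normal = transform(test_normal, train_scaler)
z_anomaly = transform(test_anomaly, train_scaler)

# Counter-example: refit a separate scaler on the anomalies themselves.
z_anomaly_refit = transform(test_anomaly, fit_scaler(test_anomaly))

flagged_train = fraction_flagged(z_anomaly)
flagged_refit = fraction_flagged(z_anomaly_refit)

print("mean |z|, test normal, train scaler:  ",
      statistics.fmean(abs(z) for z in z_normal))
print("mean z, test anomaly, train scaler:   ", statistics.fmean(z_anomaly))
print("mean z, test anomaly, separate scaler:", statistics.fmean(z_anomaly_refit))
print("anomalies flagged (train scaler):     ", flagged_train)
print("anomalies flagged (separate scaler):  ", flagged_refit)
```

With the training scaler, the anomalies stay far from the region the model saw during training, which is what lets a detector notice them. Refitting on the anomalies re-centres them at zero, so they look like ordinary scaled data.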

Jon Nordby
I agree the test set should contain both normal and anomalous data. But my question is: is it proper to use the scaler that was fit on the training (normal-only) data to transform the anomalous test data? They do not resemble each other. Please remember this is anomaly detection, where we train using only normal data, not classification, where we train with both classes. In any case, I tried that and it fails to distinguish normal from anomalous data. – Rajaram Nov 01 '22 at 10:12
Yes, it is proper to fit on the training set. The normal part of the test set must resemble the training set. The anomalies do not, naturally. I think your issue is elsewhere. – Jon Nordby Dec 06 '22 at 11:38