
I am curious how the standardise feature in an H2O model in R works when scoring new data.

I know that when it standardises a training set, it sets the mean to 0 and the standard deviation to 1, based on the mean and standard deviation of the training data. But what does it do with new data?

Does it standardise based on the training data mean and standard deviation or does it standardise based on the new data being scored?



The score function applies the same mapping used to standardize the training data to the test dataset. This is handled automatically by H2O.
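To make this concrete, here is a minimal Python/NumPy sketch of the idea. It is not H2O's actual implementation: the mean and standard deviation are computed once from the training data and then reused unchanged for any data scored later. The function names are hypothetical, and using the sample standard deviation (`ddof=1`, matching R's `sd()`) is an assumption; H2O may use a slightly different estimator internally.

```python
import numpy as np

def fit_standardizer(train):
    """Compute per-column mean and standard deviation from the training data only."""
    mean = train.mean(axis=0)
    sd = train.std(axis=0, ddof=1)  # assumption: sample SD, as R's sd() computes
    return mean, sd

def standardize(x, mean, sd):
    """Apply a previously fitted (training-set) mapping to any data, old or new."""
    return (x - mean) / sd

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
new = np.array([[4.0, 40.0]])

mean, sd = fit_standardizer(train)   # fitted once, on training data
print(standardize(new, mean, sd))    # new data reuses the training mean/SD
```

Scoring a single new row shows why the training statistics have to be reused: a one-row dataset has no meaningful standard deviation of its own, whereas the fitted mapping gives `[[2. 2.]]`.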

Erin LeDell
  • Thanks Erin, that was my guess; otherwise the coefficients wouldn't be as interpretable. I guess I just have to keep an eye on each feature so they don't change too much over time. Also, do you know if there is a function that will extract these mappings for me, i.e. the mean and standard deviation used to standardise each feature? Or do I just write a function that computes them on the raw data (mean(x) and sd(x))? Basically, I want to move the model closer to my data in a database and write the model's scoring function manually in SQL. – Mark Aitkin Aug 15 '17 at 01:45
  • No, these methods are not exposed via the H2O client APIs (that I'm aware of). You can turn this off and do the operations by hand (see the `standardize` arg in GLM & DL; the other algs don't warp the features), but if you're going to use H2O for modeling, it's easiest to let H2O handle this automatically. – Erin LeDell Aug 15 '17 at 03:17
  • I have since discovered that H2O produces both standardized and non-standardized coefficients. The non-standardized ones can be used directly on non-standardized data! – Mark Aitkin Oct 24 '17 at 09:37
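Why the non-standardized coefficients work on raw data follows from simple algebra, and this is also what makes a hand-written SQL scoring function feasible. If a linear model is fit on standardized features `z_j = (x_j - mean_j) / sd_j`, then `beta_raw_j = beta_std_j / sd_j` and `intercept_raw = intercept_std - sum_j(beta_std_j * mean_j / sd_j)`. The Python sketch below checks this on made-up numbers. The coefficient values and the `destandardize` helper are hypothetical, only numeric features are covered, and for a GLM with a non-identity link (e.g. logistic) you would still apply the inverse link to the linear predictor afterwards.

```python
import numpy as np

def destandardize(intercept_std, coef_std, mean, sd):
    """Hypothetical helper: convert standardized-scale coefficients to the raw feature scale."""
    coef_raw = coef_std / sd
    intercept_raw = intercept_std - np.sum(coef_std * mean / sd)
    return intercept_raw, coef_raw

# Made-up standardized-scale coefficients and training-set statistics.
intercept_std = 0.5
coef_std = np.array([1.2, -0.8])
mean = np.array([2.0, 20.0])
sd = np.array([1.0, 10.0])

intercept_raw, coef_raw = destandardize(intercept_std, coef_std, mean, sd)

# Linear predictor computed both ways for one raw (unstandardized) row.
x_new = np.array([4.0, 40.0])
eta_std = intercept_std + np.dot(coef_std, (x_new - mean) / sd)
eta_raw = intercept_raw + np.dot(coef_raw, x_new)
print(np.isclose(eta_std, eta_raw))  # True
```

Because the raw-scale form is just `intercept + sum(coef * column)`, it translates directly into a SQL expression over the original columns, with no standardization step needed in the database.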