
I have understood the principle of the hashing trick and use it when normalizing my data (word content). The hashes I get are in the range [0; N]. Since model training is generally more efficient on data in the [0; 1] range, I then try to normalize the hashed data. That's where I'm unsure of my logic.

Shouldn't I produce hash values directly in the [0; 1] range? If so, I don't know how to do it. Or should I apply a normalization function afterwards, as I do now? In that case, which one would be recommended?

Here is my hash process: I'm using Java's hashCode() function, which gives me results in the range [0; N].

// hashCode() can be negative, so take the modulus, shift, and take it again
// to land in [0, N - 1]
int hashedString = (word.toString().hashCode() % N + N) % N;

As for the normalization process: I'm currently using the Normalize.Standardize of DeepLearning4j, which gives me a range of approximately [-2; 2].
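One option discussed here is to skip the second pass entirely: since a hash bucket is already bounded in [0, N - 1], dividing by N - 1 is a min-max scaling into [0, 1] with no extra statistics needed. A minimal sketch (class and method names are mine, not the asker's code; `Math.floorMod` is used instead of the `%`-based idiom to handle negative hash codes):

```java
// Sketch: hash a word into a bucket in [0, N - 1], then min-max scale
// that bucket into [0, 1] by dividing by N - 1 (min = 0, max = N - 1).
public class HashNormalizer {
    static final int N = 1000; // bucket count, chosen here for illustration

    // Math.floorMod always returns a non-negative result for positive N,
    // unlike hashCode() % N, which can be negative.
    static int bucket(String word) {
        return Math.floorMod(word.hashCode(), N);
    }

    // Scale the bucket index into [0, 1].
    static double normalize(int bucket) {
        return bucket / (double) (N - 1);
    }

    public static void main(String[] args) {
        int b = bucket("hello");
        System.out.println(b + " -> " + normalize(b));
    }
}
```

This avoids hashing, standardizing, and rescaling in three separate passes, at the cost of giving up the zero-mean/unit-variance property that Standardize provides.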

mmcookie
  • Add 2 to all terms, then divide by 4. – Elliott Frisch Oct 14 '19 at 16:40
  • thanks @ElliottFrisch for your comment; your method would work, but wouldn't there be something more optimal than hashing the data, then applying the normalization function, and finally processing the data again? – mmcookie Oct 15 '19 at 07:26
  • I think you misunderstood about training efficiency. Why should data fall in the range `[0, 1]`? A datum with a reasonably small norm (e.g., `[-2, 2]`) must be fine. – ghchoi Oct 15 '19 at 07:50
  • @GyuHyeonChoi I'm really new to ML; every article I've read about normalization recommends the range [0;1], but there are indeed other conventions (mean 0 and standard deviation of 1) https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/ – mmcookie Oct 15 '19 at 08:17
  • Yes. Normalizing activated values into a probability distribution is an effective and easy way to regularize norms: simply divide by the sum of all values. This prevents values from blowing up through a deep calculation process. If `[0, 1]` were more efficient than other ranges, there would be no reason for `[-1, 0]` to be less efficient than `[0, 1]`; the two claims contradict each other. – ghchoi Oct 15 '19 at 08:54
  • got it, thanks @GyuHyeonChoi – mmcookie Oct 15 '19 at 09:25
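The rescaling suggested in the comments (add 2 to all terms, then divide by 4) is a plain affine map from the approximate [-2, 2] output of Standardize into [0, 1]. A minimal sketch (class and method names are mine, for illustration only):

```java
// Affine rescale from [-2, 2] (the approximate range of standardized
// output) into [0, 1], as suggested in the comments: (x + 2) / 4.
public class Rescale {
    static double toUnitRange(double x) {
        return (x + 2.0) / 4.0;
    }

    public static void main(String[] args) {
        System.out.println(toUnitRange(-2.0)); // prints 0.0
        System.out.println(toUnitRange(0.0));  // prints 0.5
        System.out.println(toUnitRange(2.0));  // prints 1.0
    }
}
```

Note that "approximately [-2, 2]" means a few standardized values may fall slightly outside [0, 1] after this map; clamp them if the downstream model requires a strict range.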

0 Answers