I have understood the principle of the hashing trick and use it when normalizing my data (word content). The results I get from my hash process are in the range [0;N].
As we know, model training is more efficient on data in the [0;1] range, so I then try to normalize the hashed data. That's where I'm not sure of my logic.
Should I produce hash values directly in the [0;1] range? In that case I don't know how to do it... Or should I apply a normalization function afterwards, as I do now? In that case, which one would be recommended?
Here is my hash process: I'm using the Java hashCode() function and reducing the result to the range [0;N):
int hashedString = Math.floorMod(word.toString().hashCode(), N); // floorMod keeps the result non-negative even when hashCode() is negative
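To make concrete what I mean by getting hash values into [0;1], here is a minimal self-contained sketch (the class and method names are just illustrative, and N = 1000 is an arbitrary example bucket count): since the buckets already span [0, N), dividing by N is a simple min-max style rescale into [0, 1).

```java
public class HashNormalize {
    static final int N = 1000; // number of hash buckets (example value)

    // Map a word to a bucket in [0, N), then rescale into [0, 1).
    static double hashTo01(String word) {
        int bucket = Math.floorMod(word.hashCode(), N); // always in [0, N)
        return bucket / (double) N;                     // in [0, 1)
    }

    public static void main(String[] args) {
        System.out.println(hashTo01("example")); // some value in [0, 1)
    }
}
```

The division by N works here only because the minimum (0) and maximum (N-1) of the bucket range are known in advance, so no data-dependent statistics are needed.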
And about the normalization process: I'm currently using NormalizerStandardize from DeepLearning4j, which gives me values approximately in the range [-2;2].
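For reference, this is not the DL4J internals, just a plain-Java sketch of what standardization does (subtract the mean, divide by the standard deviation), which is why most resulting values land roughly in [-2;2] rather than in [0;1]:

```java
public class StandardizeSketch {
    // Standardize: z[i] = (x[i] - mean) / std, giving mean 0 and unit variance.
    static double[] standardize(double[] xs) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double var = 0;
        for (double x : xs) var += (x - mean) * (x - mean);
        double std = Math.sqrt(var / xs.length);
        double[] out = new double[xs.length];
        for (int i = 0; i < xs.length; i++) out[i] = (xs[i] - mean) / std;
        return out;
    }

    public static void main(String[] args) {
        // approximately -1.41, -0.71, 0.00, 0.71, 1.41 -- all within [-2, 2]
        for (double z : standardize(new double[]{1, 2, 3, 4, 5})) {
            System.out.println(z);
        }
    }
}
```

So standardized output is centered on 0 with unit variance, not bounded to [0;1], which is the mismatch I'm asking about.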