I'm using the HashingVectorizer class from sklearn.feature_extraction.text, but I do not understand how it works.
My code:
from sklearn.feature_extraction.text import HashingVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = HashingVectorizer(n_features=2**3)
X = vectorizer.fit_transform(corpus)
print(X)
My result:
(0, 0) -0.8944271909999159
(0, 5) 0.4472135954999579
(0, 6) 0.0
(1, 0) -0.8164965809277261
(1, 3) 0.4082482904638631
(1, 5) 0.4082482904638631
(1, 6) 0.0
(2, 4) -0.7071067811865475
(2, 5) 0.7071067811865475
(2, 6) 0.0
(3, 0) -0.8944271909999159
(3, 5) 0.4472135954999579
(3, 6) 0.0
I have read a lot about the hashing trick, for example this article: https://medium.com/value-stream-design/introducing-one-of-the-best-hacks-in-machine-learning-the-hashing-trick-bf6a9c8af18f
I understand the article, but I do not see how it relates to the result obtained above.
Could you please explain, with a simple example, how HashingVectorizer works?
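
For reference, here is a minimal pure-Python sketch of the steps that seem to be involved in producing output like the one above: tokenize, hash each token to one of n_features columns, add +1 or -1 depending on the sign of the hash, then L2-normalize each row. The names toy_hash and toy_hashing_vectorize are hypothetical, and the MD5-based hash is only a stand-in (scikit-learn uses a signed 32-bit MurmurHash3), so the column numbers and signs will not match the real output. Only the mechanism is meant to carry over.

```python
import hashlib
import math
import re


def toy_hash(token):
    """Stand-in signed 32-bit hash (assumption: MD5-based, NOT scikit-learn's MurmurHash3)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little", signed=True)


def toy_hashing_vectorize(doc, n_features=8):
    """Hypothetical helper: map one document to a dense row of n_features values."""
    # Lowercase, then keep tokens of two or more word characters
    # (this mirrors scikit-learn's default token pattern).
    tokens = re.findall(r"\b\w\w+\b", doc.lower())

    row = [0.0] * n_features
    for token in tokens:
        h = toy_hash(token)
        column = abs(h) % n_features      # which bucket the token falls into
        sign = 1.0 if h >= 0 else -1.0    # sign of the hash decides +1 or -1
        row[column] += sign               # colliding tokens add up or cancel

    # L2 normalization: scale the row to unit Euclidean length.
    norm = math.sqrt(sum(v * v for v in row))
    if norm > 0:
        row = [v / norm for v in row]
    return row


corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

vectors = [toy_hashing_vectorize(doc) for doc in corpus]
for i, row in enumerate(vectors):
    print(i, [round(v, 4) for v in row])
```

Read this way, row 0 of the real output is consistent with five tokens: -0.894 is -2/sqrt(5) and 0.447 is 1/sqrt(5), so two tokens appear to land in column 0 with a negative sign and one in column 5. The stored 0.0 in column 6 is plausibly two tokens that collided with opposite signs and cancelled. Rows 0 and 3 are identical because both documents contain the same tokens; word order is ignored. Passing norm=None and alternate_sign=False to HashingVectorizer should show the raw, unsigned counts instead.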