Note: I am using LSH for a nearest neighbor query.
Assume the data set has 5 features (f1, f2, ..., f5), where the first 2 are numerical and the other 3 are categorical. One or more of the categorical features may be something like a username or subject, which would be quite large to encode.
If we use mixed Euclidean distance as the distance measure and use it in the hash function, how should I select the random projections for the function?
It's OK if I have to change the hash function.
Sample data:
f1,f2,f3,f4,f5
89,43,aa,bq,wb
23,67,cd,zd,cs
98,32,aa,wb,cc
10,20,aq,zd,wb
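
To make the distance concrete, here is a minimal sketch of what I mean by mixed Euclidean distance on the sample rows. It assumes one common definition: squared difference for the numerical features, a 0/1 mismatch for the categorical features, and a square root of the sum. The names `mixed_euclidean` and `NUM_NUMERIC` are made up for illustration, and the features are not scaled or weighted here.

```python
import math

# Sample rows from above: two numerical features, then three categorical ones.
rows = [
    (89, 43, "aa", "bq", "wb"),
    (23, 67, "cd", "zd", "cs"),
    (98, 32, "aa", "wb", "cc"),
    (10, 20, "aq", "zd", "wb"),
]

NUM_NUMERIC = 2  # assumption: f1 and f2 are the numerical features


def mixed_euclidean(a, b, num_numeric=NUM_NUMERIC):
    """Assumed definition of mixed Euclidean distance:
    squared difference for numerical features, 0/1 mismatch for
    categorical features, then the square root of the total."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i < num_numeric:
            total += (x - y) ** 2            # numerical part
        else:
            total += 0.0 if x == y else 1.0  # categorical mismatch
    return math.sqrt(total)


if __name__ == "__main__":
    # Distance from the first row to every row, including itself.
    for row in rows:
        print(row, round(mixed_euclidean(rows[0], row), 3))
```

With raw values like these, the numerical differences are much larger than the 0/1 categorical terms, so the categorical features contribute very little unless the numerical features are scaled.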