The library and its documentation are linked below -- yes, I read everything and was able to run it on my own code.
http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.LSHForest.html
But the results do not really make sense to me, so I went through the example (which is also included on the page above):
>>> from sklearn.neighbors import LSHForest
>>> X_train = [[5, 5, 2], [21, 5, 5], [1, 1, 1], [8, 9, 1], [6, 10, 2]]
>>> X_test = [[9, 1, 6], [3, 1, 10], [7, 10, 3]]
>>> lshf = LSHForest()
>>> lshf.fit(X_train)
LSHForest(min_hash_match=4, n_candidates=50, n_estimators=10,
          n_neighbors=5, radius=1.0, radius_cutoff_ratio=0.9,
          random_state=None)
>>> distances, indices = lshf.kneighbors(X_test, n_neighbors=2)
>>> distances
array([[ 0.069...,  0.149...],
       [ 0.229...,  0.481...],
       [ 0.004...,  0.014...]])
>>> indices
array([[1, 2],
       [2, 0],
       [4, 0]])
So I tried to verify the example by finding, by hand, the nearest neighbors of the three test points [9, 1, 6], [3, 1, 10], and [7, 10, 3].
For example, searching for the nearest neighbors of [9, 1, 6] using Euclidean distance, the closest training points I get are [5, 5, 2] and [6, 10, 2] (which I think would be indices [0, 4]) -- that is significantly different from the library's result of [1, 2].
The distances are also completely off compared to a simple manual calculation; my Excel sheet is attached.
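For reference, here is a rough Python sketch of the comparison I have in mind (my actual numbers are in the attached Excel sheet); I am assuming the exact brute-force result from sklearn.neighbors.NearestNeighbors with metric='euclidean' is the right thing to compare against:

>>> import numpy as np
>>> from sklearn.neighbors import NearestNeighbors
>>> X_train = np.array([[5, 5, 2], [21, 5, 5], [1, 1, 1], [8, 9, 1], [6, 10, 2]])
>>> X_test = np.array([[9, 1, 6], [3, 1, 10], [7, 10, 3]])
>>> # exact (brute-force) 2-nearest neighbors under plain Euclidean distance
>>> exact = NearestNeighbors(n_neighbors=2, metric='euclidean', algorithm='brute')
>>> exact.fit(X_train)
>>> exact_distances, exact_indices = exact.kneighbors(X_test)
>>> # the same distances "by hand" for the first test point, as in my spreadsheet
>>> np.linalg.norm(X_train - X_test[0], axis=1)

The distances and indices from this exact search are what I expected LSHForest to roughly reproduce, but they do not match at all.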
Thanks again for your time and help.