I am trying to use the KNN algorithm in Spark 2.2.0, and I am wondering how I should set the bucket length. The record count and the number of features vary, so I think it is better to derive the bucket length from those conditions than to hard-code it. How should I choose the bucket length for better performance? I have rescaled all the features in the vector to the range 0 to 1.
Also, is there any way to guarantee that the KNN algorithm returns a minimum number of elements? I found that the number of elements in the bucket is sometimes smaller than the queried k, and I would like at least one or two neighbors in the result.
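To illustrate the bucket-size problem, here is a minimal pure-Python sketch, not Spark code. It assumes the search is the LSH-based one (BucketedRandomProjectionLSH), whose documented hash is floor(x · v / r) for a random unit vector v and bucket length r. The function names, the 200 random points, and the bucket lengths tried are all made up for illustration.

```python
import math
import random


def bucket_of(point, direction, bucket_length):
    """Hash a point to a bucket index: floor((point . direction) / bucket_length)."""
    dot = sum(p * d for p, d in zip(point, direction))
    return math.floor(dot / bucket_length)


def random_unit_vector(dim, rng):
    """Draw a random direction and normalise it to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def candidates_in_same_bucket(points, query, direction, bucket_length):
    """Return the points whose bucket index equals the query's bucket index."""
    query_bucket = bucket_of(query, direction, bucket_length)
    return [p for p in points if bucket_of(p, direction, bucket_length) == query_bucket]


# Hypothetical data: 200 points with 5 features, each already scaled to [0, 1].
rng = random.Random(0)
dim = 5
points = [[rng.random() for _ in range(dim)] for _ in range(200)]
query = [0.5] * dim
direction = random_unit_vector(dim, rng)

# Print how many candidates share the query's bucket for several bucket lengths.
for bucket_length in (0.01, 0.1, 1.0, 10.0):
    found = candidates_in_same_bucket(points, query, direction, bucket_length)
    print(bucket_length, len(found))
```

With a single hash like this, a small bucket length can leave fewer candidates in the query's bucket than the k I ask for, which is the behaviour I am seeing.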
Thanks~