k nearest neighbours in a multithreaded program

Question

I have a training set and a test point T that needs to be classified. Suppose I divide the training set into n parts and run the k-NN algorithm (with k = 1) on each part, then compare the results from each part. Would that give me the same result as running 1-NN over the whole training set? For example, with n = 4: after running 1-NN on each of the 4 parts, I get point A from part 1, point B from part 2, C from part 3 and D from part 4. Can I then compare the distances from T to A, B, C and D to work out which class T belongs to?

score 0 · Answer 1 · answered Mar 15 '17 at 23:58

I hope I understood the problem...

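If you classify using only the single nearest point (1-NN), this is fine: the nearest point overall is necessarily the nearest point within its own part, so the closest of the n local winners is the global nearest neighbour.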
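A minimal sketch of this 1-NN case, assuming Euclidean distance and a training set stored as `(point, label)` pairs. The names `nearest`, `partitioned_1nn`, `train` and `T` are made up for illustration and have nothing to do with mlpack; the threading only mirrors the set-up described in the question.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def nearest(points, query):
    """Return (distance, point, label) for the training point closest to query."""
    return min((math.dist(p, query), p, label) for p, label in points)

def partitioned_1nn(points, query, n_parts):
    """Run 1-NN on n_parts disjoint slices in parallel, then keep the closest local winner."""
    parts = [points[i::n_parts] for i in range(n_parts)]
    parts = [part for part in parts if part]  # skip slices that ended up empty
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        local_best = list(pool.map(lambda part: nearest(part, query), parts))
    # The global nearest neighbour is the closest of the local nearest neighbours.
    return min(local_best)

# Hypothetical toy data: ((x, y), class_label)
train = [
    ((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((5.0, 5.0), "a"),
    ((6.0, 5.0), "b"), ((2.0, 3.0), "a"), ((9.0, 1.0), "b"),
]
T = (5.2, 4.0)

whole = nearest(train, T)                      # 1-NN over the full training set
split = partitioned_1nn(train, T, n_parts=4)   # 1-NN per part, then merge
print(whole == split, split[2])
```

Because taking a minimum is associative, the merged result should match the single-pass result for any number of parts, including exact ties (which are broken the same way in both paths here).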

However, for k-NN classification with k > 1, taking the nearest point from each of k groups is not the same as taking the k nearest points overall, unless those k points happen to fall in different groups. You need to keep at least the k nearest points from each of the n groups and then pick the k nearest among the n*k candidates.
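The difference can be sketched as follows, again assuming Euclidean distance and `(point, label)` pairs. The function names and the toy data are hypothetical, and the one-winner-per-group variant is included only to show where it goes wrong.

```python
import heapq
import math
from collections import Counter

def k_nearest(points, query, k):
    """Return the k closest (distance, point, label) tuples, nearest first."""
    return heapq.nsmallest(k, ((math.dist(p, query), p, label) for p, label in points))

def partitioned_knn(points, query, k, n_parts):
    """Keep up to k candidates per part, then take the k nearest of all candidates."""
    candidates = []
    for i in range(n_parts):
        candidates.extend(k_nearest(points[i::n_parts], query, k))
    return heapq.nsmallest(k, candidates)

def one_per_part(points, query, n_parts):
    """Flawed variant for k > 1: keep only the single nearest point of each part."""
    winners = []
    for i in range(n_parts):
        winners.extend(k_nearest(points[i::n_parts], query, 1))
    return sorted(winners)

def vote(neighbours):
    """Majority class among the selected neighbours."""
    return Counter(label for _, _, label in neighbours).most_common(1)[0][0]

# Hypothetical 1-D data: the two nearest points both land in the same part.
train = [((1.0,), "a"), ((10.0,), "b"), ((2.0,), "a"), ((11.0,), "b")]
query = (0.0,)

exact = k_nearest(train, query, k=2)
merged = partitioned_knn(train, query, k=2, n_parts=2)
naive = one_per_part(train, query, n_parts=2)

print(merged == exact)   # keeping k per part reproduces the exact neighbours
print(naive == exact)    # one per part misses the second-nearest point
print(vote(merged))
```

With k = 1 the two variants coincide, which is why the single-neighbour case needs no extra candidates.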

rrobby86

Sorry, the question may not have been clear enough. It's always 1-NN. The question is whether it's correct to run 1-NN on (for example) 4 non-overlapping parts of the training set and then compare the 4 local nearest neighbours to pick out the 1 global nearest neighbour. – gunner308 Mar 16 '17 at 00:42
OK, this is fine: in the end you get the "best" point of the training set, just as if you had iterated through all of them in a single batch. – rrobby86 Mar 16 '17 at 09:24
I thought that as well. However, my multithreaded machine learning program shows the greatest accuracy when running the training set as a whole with 1 thread. The accuracy decreases as I increase the number of running threads (each thread runs on a different part of the training set). That's why I started doubting myself. – gunner308 Mar 16 '17 at 21:55
That sounds strange... Maybe it depends on some specific aspect of the implementation (I was thinking about the use of indexing structures, but that should not be a problem). – rrobby86 Mar 17 '17 at 07:33
One more thing: I used the mlpack library for the implementation. I divided the training set into a few parts and ran each part through mlpack's 1-NN algorithm to get the local nearest neighbours. – gunner308 Mar 20 '17 at 11:19
I don't know mlpack, but IMHO it should not matter which implementation you use. – rrobby86 Mar 20 '17 at 13:24
