1

To explain, let's say I am checking the 9 nearest neighbors and classifying a handwritten digits dataset. The first instance in the test set has five nearest neighbors of class '4' and four of class '9'. The second test instance has eight neighbors of class '4' and one of class '9'. Obviously, the second test instance is classified as '4' with much greater certainty than the first one. How can I express this with a function, and how can I take distances into account?

I would also like to apply this to other classifiers. Is there a C/C++ library with this functionality that I could use for any type of classifier?

Mika

2 Answers

0

You should try using silhouette values and plots. They are available in the cluster package for the R language.

KoenVdB
  • Thanks, I forgot to mention that I am using C, but this may also be useful information. – Mika Jul 08 '14 at 07:36
  • @KoenVdb This does not answer the question. You might consider adding it as a comment. Check [here](http://stackoverflow.com/help/how-to-answer) for more details. – eliasah Jul 08 '14 at 07:38
0

Naive answer: normalise the per-class neighbour counts so they sum to one, which gives you posterior probabilities. To take distance into account, use weighted counts, with weights corresponding to similarities (for example, the inverse of the distances).
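
Since you are working in C, here is a minimal sketch of that idea. The function name `knn_posterior`, the `NUM_CLASSES` constant (10, for digits), and the small `eps` term that guards against a zero distance are all my own assumptions, not part of any library. Passing `NULL` for the distances gives plain normalised counts.

```c
#include <stddef.h>

#define NUM_CLASSES 10  /* assumption: digit labels 0..9, as in the question */

/* Hypothetical helper: fills posterior[c] with the normalised vote share of
 * class c among the k nearest neighbours.
 *   labels[i] : class label of the i-th neighbour
 *   dists[i]  : distance to the i-th neighbour, or dists == NULL for
 *               unweighted counting
 *   eps       : small positive constant so a distance of 0 does not divide
 *               by zero (an assumption of this sketch)
 * Returns 1 on success, 0 if no valid neighbour contributed a vote. */
int knn_posterior(const int *labels, const double *dists, size_t k,
                  double eps, double posterior[NUM_CLASSES])
{
    double total = 0.0;

    for (int c = 0; c < NUM_CLASSES; ++c)
        posterior[c] = 0.0;

    for (size_t i = 0; i < k; ++i) {
        if (labels[i] < 0 || labels[i] >= NUM_CLASSES)
            continue;  /* skip labels outside the expected range */

        /* Weight = similarity = inverse distance; 1.0 when unweighted. */
        double w = dists ? 1.0 / (dists[i] + eps) : 1.0;
        posterior[labels[i]] += w;
        total += w;
    }

    if (total <= 0.0)
        return 0;

    /* Normalise so the class scores sum to one. */
    for (int c = 0; c < NUM_CLASSES; ++c)
        posterior[c] /= total;

    return 1;
}
```

With unweighted counting, the first test instance from the question gets a score of 5/9 for '4' and the second gets 8/9, which is the difference in certainty you describe.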

Better idea: look at kernel density estimation as a more formalised version of this.
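
As a rough illustration of the kernel idea, the sketch below replaces the inverse-distance weight with a kernel of the scaled distance. I picked the Epanechnikov kernel only because it needs no math library; the function name `kernel_posterior`, the `NUM_CLASSES` constant, and the bandwidth parameter `h` are assumptions for this example, and `h` would have to be tuned for your data. This is a kernel-weighted vote over the retrieved neighbours, not a full density estimate over the training set.

```c
#include <stddef.h>

#define NUM_CLASSES 10  /* assumption: digit labels 0..9 */

/* Epanechnikov kernel: 0.75 * (1 - u^2) for |u| < 1, else 0. */
static double epanechnikov(double u)
{
    if (u <= -1.0 || u >= 1.0)
        return 0.0;
    return 0.75 * (1.0 - u * u);
}

/* Hypothetical helper: kernel-weighted class scores for one test point.
 *   labels[i], dists[i] : label of and distance to the i-th neighbour
 *   h                   : bandwidth (> 0); neighbours at distance >= h
 *                         contribute nothing
 * Returns 1 on success, 0 if h is invalid or no neighbour lies inside
 * the bandwidth (posterior is then all zeros). */
int kernel_posterior(const int *labels, const double *dists, size_t n,
                     double h, double posterior[NUM_CLASSES])
{
    double total = 0.0;

    for (int c = 0; c < NUM_CLASSES; ++c)
        posterior[c] = 0.0;

    if (h <= 0.0)
        return 0;

    for (size_t i = 0; i < n; ++i) {
        if (labels[i] < 0 || labels[i] >= NUM_CLASSES)
            continue;  /* skip labels outside the expected range */

        double w = epanechnikov(dists[i] / h);
        posterior[labels[i]] += w;
        total += w;
    }

    if (total <= 0.0)
        return 0;

    /* Normalise the per-class kernel sums so they sum to one. */
    for (int c = 0; c < NUM_CLASSES; ++c)
        posterior[c] /= total;

    return 1;
}
```

A smaller `h` makes the scores depend almost entirely on the closest neighbours, while a large `h` approaches plain normalised counts.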

Ben Allison
  • Thank you, your answer put me on the right track. I guess this is what I was looking for: http://en.wikipedia.org/wiki/Probabilistic_classification – Mika Jul 10 '14 at 08:42