
What I want:

To display the results of my simple classification algorithm (see below) as a colormap in Python (the data is in 2D), where each class is assigned a color, and the confidence of a prediction anywhere on the 2D map is proportional to the saturation of the color associated with the predicted class. The image below sort of illustrates what I want for a binary (two-class) problem, in which the red parts might suggest strong confidence in class 1, whereas blue parts would speak for class 2. The intermediate colors would suggest uncertainty about either. Obviously I want the color scheme to generalize to multiple classes, so I would need many colors, and the scale would then go from white (uncertainty) to a fully saturated color associated with a class.

illustration http://www.nicolacarlon.it/out.png

Some Sample Code:

My sample code just uses a simple kNN algorithm where the nearest k data points are allowed to 'vote' on the class of a new point on the map. The confidence of the prediction is simply given by the relative frequency of the winning class among the k voters. I haven't dealt with ties, and I know there are better probabilistic versions of this method, but all I want is to visualize my data to show a viewer the chances of a class being in a particular part of the 2D plane.

import numpy as np
import matplotlib.pyplot as plt


# Generate some training data from three classes
n = 100 # Number of sample points for each class in the training set.
mean1, mean2, mean3 = [-1.5,0], [1.5, 0], [0,1.5]
cov1, cov2, cov3 = [[1,0],[0,1]], [[1,0],[0,1]], [[1,0],[0,1]]
X1 = np.asarray(np.random.multivariate_normal(mean1,cov1,n))
X2 = np.asarray(np.random.multivariate_normal(mean2,cov2,n))
X3 = np.asarray(np.random.multivariate_normal(mean3,cov3,n))


plt.plot(X1[:,0], X1[:,1], 'ro', X2[:,0], X2[:,1], 'bo', X3[:,0], X3[:,1], 'go' )

plt.axis('equal'); plt.show() #Display training data


# Prepare the data set as a 3n*3 array where each row is a data point and its associated class
D = np.zeros((3*n,3))
D[0:n,0:2] = X1; D[0:n,2] = 1
D[n:2*n,0:2] = X2; D[n:2*n,2] = 2
D[2*n:3*n,0:2] = X3; D[2*n:3*n,2] = 3

def kNN(x, D, k=3):
    x = np.asarray(x)
    dist = np.linalg.norm(x-D[:,0:2], axis=1)
    i = dist.argsort()[:k] # Indices of the k nearest points (smallest distances first)
    counts = np.bincount(D[i,2].astype(int))
    predicted_class = np.argmax(counts) 
    confidence = float(np.max(counts))/k
    return predicted_class, confidence 

print(kNN([-2,0], D, 20))

1 Answer


So, you can calculate two numbers for each point in the 2D plane:

  • confidence (0 .. 1)
  • class (an integer)

One possibility is to calculate your own RGB map and show it with imshow. Like this:

import numpy as np
import matplotlib.pyplot as plt

# color vector with N x 3 colors, where N is the maximum number of classes and the colors are in RGB
mycolors = np.array([
  [ 0, 0, 1],
  [ 0, 1, 0],
  [ 1, 0, 1],
  [ 1, 1, 0],
  [ 0, 1, 1],
  [ 0, 0, 0],
  [ 0, .5, 1]])

# negate the colors
mycolors = 1 - mycolors 

# extents of the area
x0 = -2
x1 = 2
y0 = -2
y1 = 2

# grid over the area
X, Y = np.meshgrid(np.linspace(x0, x1, 1000), np.linspace(y0, y1, 1000))

# calculate the classification and probabilities
classes = classify_func(X, Y)
probabilities = prob_func(X, Y)

# create the basic color map by the class
img = mycolors[classes]

# fade the color by the probability (black for zero prob)
img *= probabilities[:,:,None]

# reverse the negative image back
img = 1 - img

# draw it
plt.imshow(img, extent=[x0,x1,y0,y1], origin='lower')
plt.axis('equal')

# save it
plt.savefig("mymap.png")

The trick of making negative colors is there just to make the maths a bit easier to understand. The code can of course be written much more densely.
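For instance, with the original (non-negated) color table, the three steps collapse into a single expression; this is just a compact equivalent of the code above:

# equivalent one-liner, assuming mycolors has NOT been negated beforehand
img = 1 - (1 - mycolors[classes]) * probabilities[:, :, None]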

I created two very simple functions to mimic the classification and probabilities:

def classify_func(X, Y):
    return np.round(abs(X+Y)).astype('int')

def prob_func(X,Y):
    return 1 - 2*abs(abs(X+Y)-classify_func(X,Y))

The former gives integer values from 0 to 4 for the given area, and the latter gives smoothly changing probabilities.

The result:

[image: the resulting color map for the example functions]

If you do not like the way the colors fade towards zero probability, you may always create some non-linearity which is then applied when multiplying with the probabilities.
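A minimal sketch of such a non-linearity, assuming a simple gamma curve (the exponent 0.5 is an arbitrary illustrative choice; values below 1 keep mid-range probabilities more saturated):

# hypothetical gamma adjustment applied before the multiplication
gamma = 0.5
img = mycolors[classes]          # negated colors, as above
img *= (probabilities ** gamma)[:, :, None]
img = 1 - img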


Here the functions classify_func and prob_func are given two arrays as arguments, the first one being the X coordinates where the values are to be calculated, and the second one the Y coordinates. This works well if the underlying calculations are fully vectorized. With the code in the question this is not the case, as it only calculates single values.

In that case the code changes slightly:

x = np.linspace(x0, x1, 1000)
y = np.linspace(y0, y1, 1000)
classes = np.empty((len(y), len(x)), dtype='int')
probabilities = np.empty((len(y), len(x)))
for yi, yv in enumerate(y):
    for xi, xv in enumerate(x):
        classes[yi, xi], probabilities[yi, xi] = kNN((xv, yv), D)

Also, as your confidence estimates do not span the full 0..1 range, they need to be scaled:

probabilities -= np.amin(probabilities)
probabilities /= np.amax(probabilities)

After this is done, your map should look like this with extents -4..4 in both directions (as per the color map: green=1, magenta=2, yellow=3):

[image: kNN class map for the sample data]


To vectorize or not to vectorize - that is the question

This question pops up from time to time. There is a lot of information about vectorization on the web, but as a quick search did not reveal any short summaries, I'll give some thoughts here. This is quite a subjective matter, so everything below just represents my humble opinion; other people may think differently.

There are three factors to consider:

  • performance
  • legibility
  • memory use

Usually (but not always) vectorization makes code faster, more difficult to understand, and more memory-hungry. Memory use is not usually a big problem, but with large arrays it is something to think about (hundreds of megabytes are usually ok, gigabytes are troublesome).

Trivial cases aside (element-wise simple operations, simple matrix operations), my approach is:

  • write the code without vectorizations and check it works
  • profile the code
  • vectorize the inner loops if needed and possible (1D vectorization)
  • create a 2D vectorization if it is simple

For example, a pixel-by-pixel image processing operation may lead to a situation where I end up with one-dimensional vectorizations (for each row). Then the inner loop (for each pixel) is fast, and the outer loop (for each row) does not really matter. The code may look much simpler if it does not try to be usable with all possible input dimensions.

I am such a lousy algorithmist that in more complex cases I like to verify my vectorized code against the non-vectorized versions. Hence I almost invariably first create the non-vectorized code before optimizing it at all.

Sometimes vectorization does not offer any performance benefit. For example, the handy function numpy.vectorize can be used to vectorize practically any function, but its documentation states:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

(This function could have been used in the code above, as well. I chose the loop version for legibility for people not very familiar with numpy.)
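For completeness, a sketch of that alternative, assuming the kNN function and the D array from the question; np.vectorize maps the scalar function over the grids element-wise and returns a tuple of arrays, because kNN returns two values:

# convenience only: still a Python-level loop under the hood
knn_vec = np.vectorize(lambda xv, yv: kNN((xv, yv), D))
classes, probabilities = knn_vec(X, Y)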

Vectorization gives more performance only if the underlying vectorized functions are faster. They sometimes are, sometimes aren't. Only profiling and experience will tell. Also, it is not always necessary to vectorize everything. You may have an image processing algorithm which has both vectorized and pixel-by-pixel operations. In such cases numpy.vectorize is very useful.

I would try to vectorize the kNN search algorithm above at least to one dimension. There is no conditional code (it wouldn't be a show-stopper, but it would complicate things), and the algorithm is rather straightforward. The memory consumption will go up, but with one-dimensional vectorization it does not matter.
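To illustrate, here is a minimal sketch of such a one-dimensional vectorization; kNN_row is a hypothetical helper, not code from the question, and it assumes the integer class labels 1..3 stored in D as well as the x, y, classes and probabilities arrays from the loop version above:

def kNN_row(xs, ys, D, k=3):
    pts = np.column_stack((xs, ys))                 # (m, 2) query points
    # (m, len(D)) distance matrix via broadcasting
    dist = np.linalg.norm(pts[:, None, :] - D[None, :, 0:2], axis=2)
    idx = np.argsort(dist, axis=1)[:, :k]           # k nearest per query
    votes = D[idx, 2].astype(int)                   # (m, k) class votes
    # per-row vote counts; minlength keeps the output width constant
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=4)
    return np.argmax(counts, axis=1), counts.max(axis=1) / float(k)

# fill the map one row of grid points at a time
for yi, yv in enumerate(y):
    classes[yi, :], probabilities[yi, :] = kNN_row(x, np.full_like(x, yv), D)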

And it may happen that along the way you notice that an n-dimensional generalization is not much more complicated. Then do that if memory allows.

  • Thanks a lot! This looks promising. Could you perhaps advise me on how to make my current function `kNN`, which returns your `probabilities` and `classes` values, work with the inputs `X` and `Y` from `meshgrid`? The way I have it now my function just accepts a single coordinate `x`... – Dipole Jun 26 '14 at 19:50
  • 1
    It is now there. Also, please note that I added the missing `origin='lower'` to `imshow`. Otherwise the image will be upside down. Also note that the image scaling is slightly off (scale factor *n* / ( *n* +1) where *n* is the number of pixels. This can be fixed but it would have unnecessarily obfuscated the code, and usually no one notices that 1/1000 anyway. – DrV Jun 26 '14 at 21:09
  • This is great, I can take it from here thanks a lot! – Dipole Jun 27 '14 at 00:27
  • Just a follow up: what would you advise me to think about when creating code like this? Should I think from the start that the code should be vectorized for speed since we are computing the kNN algorithm for many points? – Dipole Jun 27 '14 at 00:36
  • @Jack: Briefly: Yes, you probably should in this case, but see my edited answer for a bit more discussion. If you need more instructions on how to do it, try it yourself, and then create a new question on SO when you get stuck. Good luck! – DrV Jun 27 '14 at 07:53