
I want to find the decision boundary in order to classify my data. Here is a sample of it:

    "Distance","Dihedral","Categ"
    4.083,82.267,C
    4.132,87.073,C
    4.713,-80.999,C
    3.427,-48.144,NC
    3.663,96.994,C
    3.99,71.919,C
    3.484,78.684,C

So far I've got the kNN model, but I'd like to plot the non-linear decision boundary. The examples I have found use some variables, and I have no idea what they mean or where they come from. In particular, I mean this example I found in "The Elements of Statistical Learning" book:

library(ElemStatLearn)
require(class)
x <- mixture.example$x
g <- mixture.example$y
xnew <- mixture.example$xnew
mod15 <- knn(x, xnew, g, k=15, prob=TRUE)
prob <- attr(mod15, "prob")
prob <- ifelse(mod15=="1", prob, 1-prob)
px1 <- mixture.example$px1
px2 <- mixture.example$px2
prob15 <- matrix(prob, length(px1), length(px2))
par(mar=rep(2,4))
contour(px1, px2, prob15, levels=0.5, labels="", xlab="", ylab="", main=
        "15-nearest neighbour", axes=FALSE)
points(x, col=ifelse(g==1, "coral", "cornflowerblue"))
gd <- expand.grid(x=px1, y=px2)
points(gd, pch=".", cex=1.2, col=ifelse(prob15>0.5, "coral", "cornflowerblue"))
box()

What exactly are px1 and px2? Do I need similar variables for my particular case?

Thank you very much for your help!

  • I think px1 and px2 are simply vectors describing the grid for the new data, i.e. the points along the x and y axes where you have new data. – Andrie Nov 29 '16 at 10:33
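To see what such grid vectors look like, here is a toy sketch. The values of px1 and px2 below are made up for illustration and are not the ones stored in mixture.example. It shows how expand.grid() turns two axis vectors into one row per grid point.

```r
# Toy axis vectors standing in for px1 and px2 (illustrative values only)
px1 <- c(1, 2, 3)  # x-coordinates of the grid
px2 <- c(10, 20)   # y-coordinates of the grid

# expand.grid() pairs every x with every y; the first column varies fastest,
# so the rows are (1,10), (2,10), (3,10), (1,20), (2,20), (3,20)
xnew <- as.matrix(expand.grid(px1, px2))
print(xnew)
```

The resulting matrix has length(px1) * length(px2) rows, one per point at which the classifier is evaluated.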



I have reworked and annotated the example to make it clear what happens.

The example constructs a test set that is simply a regular grid covering the full extent of the training data. px1 is the vector of x-coordinates of that grid and px2 the vector of y-coordinates; xnew is the result of expand.grid(px1, px2), i.e. every combination of the two. For your own data you need the same thing: one such vector per feature.
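The step matrix(prob, length(px1), length(px2)) puts each prediction back in the right place because matrix() fills column by column, the same order in which expand.grid() lists the points. Below is a small check with a made-up stand-in for the predictions; the formula x + 10 * y is purely illustrative so that each value's position can be verified.

```r
px1 <- seq(0, 1, by = 0.5)  # 3 toy x-values
px2 <- seq(0, 2, by = 1)    # 3 toy y-values
xnew <- as.matrix(expand.grid(px1, px2))

# Stand-in for per-point predictions, computed in xnew's row order
v <- xnew[, 1] + 10 * xnew[, 2]

# matrix() fills column by column, matching expand.grid()'s ordering,
# so z[i, j] is the value at (px1[i], px2[j]), the layout contour() expects
z <- matrix(v, length(px1), length(px2))
```

In other words, z is exactly what you would get by evaluating the stand-in formula directly on the grid with outer(px1, px2, ...).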

Try the following code, which should make this clearer. I have also changed the value of k and added an easy way to construct xnew at a grid spacing of your choice.

library(ElemStatLearn)  # provides mixture.example
library(class)          # provides knn()

# Use the training data from mixture.example
x <- mixture.example$x  # training points, one row per point (2 columns)
g <- mixture.example$y  # class labels (0/1)

# Construct a test grid spanning the extent of the training data
xx_range <- round(range(x[, 1]), 1)
xy_range <- round(range(x[, 2]), 1)

grid_step <- 0.1                                      # grid spacing
px1  <- seq(xx_range[1], xx_range[2], by = grid_step) # x-coordinates of the grid
px2  <- seq(xy_range[1], xy_range[2], by = grid_step) # y-coordinates of the grid
xnew <- as.matrix(expand.grid(px1, px2))              # every (x, y) pair, one per row

# Classify every grid point with k-nearest neighbours
k    <- 10
mod  <- knn(x, xnew, g, k = k, prob = TRUE)
prob <- attr(mod, "prob")                   # vote share of the winning class
prob <- ifelse(mod == "1", prob, 1 - prob)  # convert to P(class "1")
prob_grid <- matrix(prob, length(px1), length(px2))

# Plot the results
par(mar = rep(2, 4))
contour(px1, px2, prob_grid, levels = 0.5, labels = "", xlab = "", ylab = "",
        main = sprintf("%d-nearest neighbour", k), axes = FALSE)
points(x, col = ifelse(g == 1, "coral", "cornflowerblue"))
points(xnew, pch = ".", cex = 1.2,
       col = ifelse(prob_grid > 0.5, "coral", "cornflowerblue"))
box()
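A minimal sketch of the same recipe applied to the Distance/Dihedral sample from the question follows. Several assumptions are involved. It uses only the seven sample rows, read inline; with the full data you would call read.csv() on your own file. It uses k = 1 because the sample contains a single "NC" row, so with k = 3 no grid point could ever be voted "NC". The grid resolution of 100 points per axis is arbitrary. Standardising both features with scale() is an added step, not part of the answer above; it is a common choice for kNN because knn() uses Euclidean distance, and Dihedral spans a far wider numeric range than Distance.

```r
library(class)

# The seven sample rows from the question
dat <- read.csv(text = '"Distance","Dihedral","Categ"
4.083,82.267,C
4.132,87.073,C
4.713,-80.999,C
3.427,-48.144,NC
3.663,96.994,C
3.99,71.919,C
3.484,78.684,C', stringsAsFactors = TRUE)

train <- as.matrix(dat[, c("Distance", "Dihedral")])
cl    <- dat$Categ  # factor with levels "C" and "NC"

# Grid vectors covering the range of each feature (100 points per axis)
px1  <- seq(min(train[, 1]), max(train[, 1]), length.out = 100)  # Distance
px2  <- seq(min(train[, 2]), max(train[, 2]), length.out = 100)  # Dihedral
xnew <- as.matrix(expand.grid(Distance = px1, Dihedral = px2))

# Assumed extra step: standardise both features. The grid is scaled with the
# training data's centre and spread, not its own.
train_s <- scale(train)
ctr     <- attr(train_s, "scaled:center")
sds     <- attr(train_s, "scaled:scale")
xnew_s  <- scale(xnew, center = ctr, scale = sds)

# k = 1 only because the sample has a single "NC" row; use a larger k
# (such as the 10 or 15 from the examples) on the real data
k    <- 1
pred <- knn(train_s, xnew_s, cl, k = k, prob = TRUE)

# "prob" is the vote share of the winning class; convert it to P(class "C")
prob <- attr(pred, "prob")
prob <- ifelse(pred == "C", prob, 1 - prob)
prob_grid <- matrix(prob, length(px1), length(px2))

# Decision boundary (P = 0.5), training points and shaded grid
par(mar = c(4, 4, 2, 1))
contour(px1, px2, prob_grid, levels = 0.5, labels = "",
        xlab = "Distance", ylab = "Dihedral",
        main = sprintf("%d-nearest neighbour", k))
points(train, pch = 19, col = ifelse(cl == "C", "coral", "cornflowerblue"))
points(xnew, pch = ".",
       col = ifelse(prob_grid > 0.5, "coral", "cornflowerblue"))
```

The grid is built on the original units so the plot axes read as Distance and Dihedral; only the copies passed to knn() are scaled.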


Andrie