
I'm trying to get this example to work with version 3 of the kohonen R package: https://clarkdatalabs.github.io/soms/SOM_NBA

I've tried to update the code there and came up with the script below, but it's not correct. I get most of the same results as the example, but the last plot shows no classification errors at all, so I must be doing something wrong. I think I know roughly where my mistake is, but I'm not sure what it is.

# https://clarkdatalabs.github.io/soms/SOM_NBA
# https://github.com/clarkdatalabs/soms/issues?q=is%3Aopen+is%3Aissue


library(kohonen)
library(RColorBrewer)
library(RCurl)

NBA <- read.csv(text = getURL("https://raw.githubusercontent.com/clarkdatalabs/soms/master/NBA_2016_player_stats_cleaned.csv"), 
            sep = ",", header = T, check.names = FALSE)

colnames(NBA)

NBA.measures1 <- c("FTA", "2PA", "3PA")
NBA.SOM1 <- som(scale(NBA[NBA.measures1]), grid = somgrid(6, 4, "rectangular"))
plot(NBA.SOM1)

colors <- function(n, alpha = 1) {
rev(heat.colors(n, alpha))
}

plot(NBA.SOM1, type = "counts", palette.name = colors, heatkey = TRUE)

par(mfrow = c(1, 2))
plot(NBA.SOM1, type = "mapping", pchs = 20, main = "Mapping Type SOM")
plot(NBA.SOM1, main = "Default SOM Plot")

NBA.SOM2 <- som(scale(NBA[NBA.measures1]), grid = somgrid(6, 6, "hexagonal", toroidal=TRUE) )

par(mfrow = c(1, 2))
plot(NBA.SOM2, type = "mapping", pchs = 20, main = "Mapping Type SOM")
plot(NBA.SOM2, main = "Default SOM Plot")
plot(NBA.SOM2, type = "dist.neighbours", palette.name = terrain.colors)

NBA.measures2 <- c("FTA", "FT", "2PA", "2P", "3PA", "3P", "AST", "ORB", "DRB", 
               "TRB", "STL", "BLK", "TOV")

training_indices <- sample(nrow(NBA), 200)
NBA.training <- scale(NBA[training_indices, NBA.measures2])
NBA.testing <- scale(NBA[-training_indices, NBA.measures2], center = attr(NBA.training, 
"scaled:center"), scale = attr(NBA.training, "scaled:scale"))

NBA.SOM3 <- xyf(NBA.training, classvec2classmat(NBA$Pos[training_indices]), 
            grid = somgrid(13, 13, "hexagonal", toroidal = TRUE), rlen = 100, 
user.weights = 0.5)

pos.prediction <- predict(NBA.SOM3, newdata = NBA.testing, whatmap = 1)
table(NBA[-training_indices, "Pos"], pos.prediction$prediction[[2]])

NBA.SOM4 <- xyf(scale(NBA[, NBA.measures2]), classvec2classmat(NBA[, "Pos"]), 
            grid = somgrid(13, 13, "hexagonal", toroidal = TRUE), rlen = 300, 
user.weights = 0.7)

par(mfrow = c(1, 2))
plot(NBA.SOM4, type = "codes", main = c("Codes X", "Codes Y"))
NBA.SOM4.hc <- cutree(hclust(dist(getCodes(NBA.SOM4, 2))), 5)
add.cluster.boundaries(NBA.SOM4, NBA.SOM4.hc)

bg.pallet <- c("red", "blue", "yellow", "purple", "green")

# make a vector of just the background colors for all map cells

#I think my error is in this line...
position.predictions <- classmat2classvec(predict(NBA.SOM4)$unit.predictions[[2]])


base.color.vector <- bg.pallet[match(position.predictions, levels(NBA$Pos))]

# set alpha to scale with maximum confidence of prediction
bgcols <- c()
max.conf <- apply(getCodes(NBA.SOM4, 2), 1, max)
for (i in 1:length(base.color.vector)) {
  bgcols[i] <- adjustcolor(base.color.vector[i], max.conf[i])
}

par(mar = c(0, 0, 0, 4), xpd = TRUE)
plot(NBA.SOM4, type = "mapping", pchs = 21, col = "black", bg = 
bg.pallet[match(NBA$Pos, 
levels(NBA$Pos))], bgcol = bgcols)

legend("topright", legend = levels(NBA$Pos), text.col = bg.pallet, bty = "n", 
   inset = c(-0.03, 0))
Oleg

1 Answer

The kohonen package builds the model by initializing its nodes from randomly selected training observations. It is therefore very rare to get exactly the same final arrangement of nodes as someone else. The node values themselves should still be the same; only their arrangement on the map differs. At least, that is my understanding. To reproduce an arrangement exactly, both kohonen models have to be trained from the same random seed, i.e. by calling set.seed() beforehand.

In the code you provided, the variable 'position.predictions' contains some NA values. I think that if you add one more line to omit the NA values right after the assignment to 'position.predictions', every node background will be filled with a color from your predefined palette. So the script becomes:

# this is your script
position.predictions <- classmat2classvec(predict(NBA.SOM4)$unit.predictions[[2]])

# add this below and continue
position.predictions <- na.omit(position.predictions)
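To see what that extra line does, here is a minimal base-R sketch on a toy factor. The name toy.predictions, its labels and the positions of the NA entries are made up for illustration; they are not taken from the NBA data or from the fitted map.

```r
# Toy stand-in for position.predictions; labels and NA positions are hypothetical.
toy.predictions <- factor(c("C", NA, "PG", "SF", NA),
                          levels = c("C", "PF", "PG", "SF", "SG"))

# Count the entries that have no predicted class.
n.missing <- sum(is.na(toy.predictions))

# na.omit() drops the NA entries and records which positions were removed.
cleaned <- na.omit(toy.predictions)

length(toy.predictions)     # 5
length(cleaned)             # 3
attr(cleaned, "na.action")  # positions of the dropped entries
```

Note that the cleaned vector is shorter than the original, so it may be worth comparing its length with the number of map units before using it to build the background colors.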

I think the NA values are returned because kohonen was unable to recognize a pattern in its inputs for those nodes.
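The point about seeding made earlier can be sketched with base R alone. The seed value 42 is an arbitrary assumption, and sample() stands in for any step that draws random numbers. In the script from the question, the same idea would mean calling set.seed() before sample() and before som() or xyf(), assuming kohonen draws its starting nodes through R's random number generator.

```r
# Hypothetical seed value; any fixed integer would do.
seed <- 42

# Two draws that start from the same seed.
set.seed(seed)
first.draw <- sample(100, 5)

set.seed(seed)
second.draw <- sample(100, 5)

# A draw made without re-seeding continues the random stream instead.
third.draw <- sample(100, 5)

# Same seed, same draw.
identical(first.draw, second.draw)
```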

h45