
I am using the R programming language and learning about the kohonen package by following this tutorial. The kohonen R package lets you fit Kohonen networks (also called SOMs, self-organizing maps), a type of unsupervised machine learning algorithm often used for data visualization.

I ran the following code and produced the plots below:

#load libraries
library(kohonen) #fitting SOMs
library(RColorBrewer) #colors, using predefined palettes

#process data
iris_complete <- iris[complete.cases(iris),] #keep only complete cases (iris has no missing values, so no rows are dropped)
iris_unique <- unique(iris_complete) # Remove duplicates
iris.sc = scale(iris_unique[, 1:4]) #standardize the four numeric columns

#run the SOM
iris.grid = somgrid(xdim = 10, ydim=10, topo="hexagonal", toroidal = TRUE)
set.seed(33) #for reproducibility
iris.som <- som(iris.sc, grid=iris.grid, rlen=700, alpha=c(0.05,0.01), keep.data = TRUE)

#make plots (3 different plots)
plot(iris.som, type="count")

plot(iris.som, type="dist.neighbours", 
     palette.name=grey.colors, shape = "straight")

var <- 1 #define the variable to plot
plot(iris.som, 
     type = "property", 
     property = getCodes(iris.som)[,var], 
     main=colnames(getCodes(iris.som))[var], 
     palette.name=terrain.colors)

(Resulting plots: https://imgur.com/a/fQlv74X)

From here, I would like to make these plots easier to read. Specifically, I want to add a label (a number from 1 to 100) to each circle so that it is easier to identify each circle:

(Desired result: https://imgur.com/a/GxuPp5J)

I am not sure if there is a straightforward way to place a number on each corresponding circle. Looking at the som() function in the kohonen package (https://www.rdocumentation.org/packages/kohonen/versions/2.0.19/topics/som), it seems it is possible to determine which observation belongs to which circle:

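#determine which circle (unit) each observation belongs to
a <- iris.som$unit.classif

#pull the original (scaled) data; in kohonen 3.x this slot is a list with one matrix per layer
b <- iris.som$data[[1]]

#combine both of them into one data frame, one row per observation
unit_data <- data.frame(unit = a, b)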

But I am not sure if it is possible to "superimpose" these numbers on to the corresponding circles. Does anyone know if this can be done?
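Because iris.som$unit.classif holds one winning-unit number per row of the training data, tabulating it gives the number of observations that landed in each circle, which is the quantity the type = "count" plot displays. A minimal base-R sketch (count_per_unit and the toy data are hypothetical, for illustration only):

```r
# Count how many observations the SOM mapped to each unit.
# `unit_classif` stands in for iris.som$unit.classif (one unit number per row);
# `n_units` is the number of units on the map (100 for a 10 x 10 grid).
count_per_unit <- function(unit_classif, n_units) {
  # tabulate() counts occurrences of 1..nbins; empty units get 0
  counts <- tabulate(unit_classif, nbins = n_units)
  names(counts) <- seq_len(n_units)  # name each count by its unit number
  counts
}

# Toy example: six observations spread over a 4-unit map
toy_classif <- c(2L, 2L, 4L, 1L, 2L, 4L)
count_per_unit(toy_classif, n_units = 4)  # counts per unit: 1, 3, 0, 2
```

For the model in the question, count_per_unit(iris.som$unit.classif, nrow(iris.som$grid$pts)) should give 100 counts that sum to nrow(iris.sc).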

Update:

I tried the following code:

iris_unique$ID <- seq_along(iris_unique[,1]) 
plot(iris.som, type="mapping", bg = rgb(colour4), shape = "straight", 
     border = "grey", labels = iris_unique[,6])

or:

library(plotly)

plot1 = plot(iris.som, type="mapping", bg = rgb(colour4), shape = "straight", 
             border = "grey", labels = iris_unique[,6])

plotly_plot = ggplotly(plot1)

But I don't think that this is correct.

Edited by Z.Lin; asked by stats_noob

1 Answer


You can find the coordinates of the center of each circle (or hexagon) in iris.som$grid$pts, one row per unit. One way to label the circles is to name each row with dimnames() and then draw those names at the corresponding coordinates with text().

iris.som$grid$pts
#          x         y
#  [1,]  1.5 0.8660254
#  [2,]  2.5 0.8660254
#  [3,]  3.5 0.8660254
#  [4,]  4.5 0.8660254
#  [5,]  5.5 0.8660254
#  ...
#[100,] 10.0 8.6602540

# Assign a name for each coordinate. For example, "1", "2", "3", etc

dimnames(iris.som$grid$pts) = list(as.character(1:100), c("x","y"))  

#iris.som$grid$pts
#       x         y
#1    1.5 0.8660254
#2    2.5 0.8660254
#3    3.5 0.8660254
#4    4.5 0.8660254
#5    5.5 0.8660254

# Draw the first plot, then put the names on it
plot(iris.som, type="count")
text(iris.som$grid$pts, labels = rownames(iris.som$grid$pts))

# Do the same steps for other plots
plot(iris.som, type="dist.neighbours", 
     palette.name=grey.colors, shape = "straight")
text(iris.som$grid$pts, labels = rownames(iris.som$grid$pts))
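Since the same text() call is repeated after every plot, it can be wrapped in a small helper. This is a sketch; add_unit_numbers is a hypothetical name, and it assumes, as this answer does, that som_model$grid$pts holds the plotting coordinates of the unit centers in unit order.

```r
# Overlay unit numbers 1..n on the SOM plot that was drawn last.
# Assumes `som_model$grid$pts` holds one row of plot coordinates per unit,
# in unit order (the same matrix the answer passes to text()).
add_unit_numbers <- function(som_model, cex = 0.7, col = "black") {
  pts <- som_model$grid$pts
  labs <- as.character(seq_len(nrow(pts)))   # "1", "2", ..., "100"
  text(pts[, 1], pts[, 2], labels = labs, cex = cex, col = col)
  invisible(labs)                            # return the labels quietly
}
```

Usage, with the model from the question:

plot(iris.som, type = "dist.neighbours", palette.name = grey.colors, shape = "straight")
add_unit_numbers(iris.som)

Numbering by position (seq_len) rather than by row names means the helper does not depend on the dimnames() step; the smaller cex keeps 100 labels legible on a 10 x 10 map.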

Note that if you use RStudio, you may need to adjust the width of your plotting pane to display all the labels properly.
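The unit numbers follow the row order of iris.som$grid$pts. As a sketch for understanding, not the package's own code, the coordinates printed at the start of this answer can be reproduced in base R by filling the grid row by row from the bottom, shifting odd rows right by 0.5 and spacing rows sqrt(3)/2 apart (hex_centres is a hypothetical helper):

```r
# Reproduce the unit centers of an xdim x ydim hexagonal grid.
# Units are numbered row by row: x varies fastest, rows stack upwards.
hex_centres <- function(xdim, ydim) {
  pts <- as.matrix(expand.grid(x = seq_len(xdim), y = seq_len(ydim)))
  pts[, "x"] <- pts[, "x"] + 0.5 * (pts[, "y"] %% 2)  # shift odd rows right
  pts[, "y"] <- sqrt(3) / 2 * pts[, "y"]              # hexagonal row spacing
  pts
}

pts <- hex_centres(10, 10)
pts[c(1, 2, 91, 100), ]
```

Units 1, 2 and 100 match the values printed above. If the plot draws units at these coordinates, which the text() overlay relies on, unit 1 is the bottom-left circle, numbers increase from left to right, and unit 91 is the leftmost circle of the top row.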


Answered by Abdur Rohman
  • thank you for your reply! this is perfect! the only thing I wanted to ask: is it possible to replace this line "dimnames(iris.som$grid$pts) = list(as.character(1:100), c("x","y")) " with this line "iris.som$unit.classif"? The SOM algorithm assigns each data point (from the iris file) into one of these "circles". I am trying to have the corresponding number from "iris.som$unit.classif" match the number on the circle. Is it possible to do this? Thank you for all your help! – stats_noob Jan 23 '21 at 16:43
  • here is a related question: https://stackoverflow.com/questions/65864333/r-identifying-points-by-color – stats_noob Jan 23 '21 at 21:08
  • Suppose I want to see which observations are in "circle 91" (based on the visual output). How would you do this? – stats_noob Jan 27 '21 at 00:36
  • @Noob Sorry for being very late. It's really difficult to determine which observations belong to which circle. My best guess is to get the index (location) of the number (e.g. "91") in "iris.som$unit.classif", and then check the corresponding index in "iris.sc" – Abdur Rohman Aug 10 '21 at 01:55
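Following the approach in the last comment, the lookup for a single circle reduces to which() on unit.classif, because its entries are in the same row order as the data passed to som(). A minimal sketch (observations_in_unit and the toy data are hypothetical):

```r
# Return the rows of `data` that the SOM mapped to `unit`.
# `unit_classif` plays the role of iris.som$unit.classif; `data` must have the
# same row order as the data the map was trained on (e.g. iris_unique or iris.sc).
observations_in_unit <- function(unit_classif, data, unit) {
  stopifnot(length(unit_classif) == nrow(data))
  rows <- which(unit_classif == unit)  # positions of observations in this unit
  data[rows, , drop = FALSE]
}

# Toy example: five observations, two of which fall in unit 91
toy_data <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))
toy_classif <- c(91L, 3L, 91L, 7L, 3L)
observations_in_unit(toy_classif, toy_data, 91)  # rows with id 1 and 3
```

For circle 91 in the question's model this would be observations_in_unit(iris.som$unit.classif, iris_unique, 91); using iris_unique rather than iris.sc returns the unscaled measurements together with the species.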