Hi my friends, here is what I have observed. I don't know what the problem is.

When I build clusters with the hclust function, the labels of the object it creates are lost depending on how I subset the data frame.

This is the data frame.

set.seed(1234)
x <- rnorm(12,mean=rep(1:3,each=4),sd=0.2)
y <- rnorm(12,mean=rep(c(1,2,1),each=4),sd=0.2)
z <- as.factor(sample(c("A","B"),12,replace=TRUE))
df <- data.frame(x=x,y=y,z=z)
plot(df$x,df$y,col=z,pch=19,cex=2)

This chunk of code returns NULL for the labels.

df1 <- df[c("x","y")]
d <- dist(df1)
cluster <- hclust(d)
cluster$labels   #NULL

This chunk of code returns NULL as well.

df2 <- df[,1:2]
d <- dist(df2)
cluster <- hclust(d)
cluster$labels   #NULL

This chunk of code does not return NULL.

df3 <- df[1:12,1:2]
d <- dist(df3)
cluster <- hclust(d)
cluster$labels   #Character Vector

This is a problem for me because I have some code that uses these labels.

As you can see, the data frames are identical.

identical(df1, df2)  # TRUE
identical(df1, df3)  # TRUE
identical(df2, df3)  # TRUE
alebj88