I am trying to apply hierarchical clustering to time series in order to identify the states with similar behavior in residential_percent_change_from_baseline. I get the dendrogram, but the labels on the x axis are just numbers and I want the state names. My data looks like this: [image: Data]

This is part of my code:

data <- dataset
#Convert to factor
cols <- c("country_region_code", "country_region", "sub_region_1", "iso_3166_2_code")
data[cols] <- lapply(data[cols], factor)
sapply(data, class)
data$date <- as.Date(data$date)
summary(data)

#Data preparation
n <- 10
s <- sample(1:100, n)
i <- c(s,0+s,   279+s,  556+s,  833+s,  1110+s, 1387+s, 1664+s, 1941+s, 2218+s, 2495+s, 2772+s, 3049+s, 3326+s, 3603+s, 3880+s, 4157+s, 4434+s, 4711+s, 4988+s, 5265+s, 5542+s, 5819+s, 6096+s, 6373+s, 6650+s, 6927+s, 7204+s, 7481+s, 7758+s, 8035+s, 8312+s, 8589+s, 8866+s)
d <- data[i,3:4]
d$residential <- data[i,11]
d[,2] =NULL
str(d)

pattern <- c(rep('Mexico', n),
             rep('Aguascalientes', n),
             rep('Baja California',n),
             rep('Baja California Sur',n),
             rep('Campeche',n),
             rep('Coahuila',n),
             rep('Colima',n),
             rep('Chiapas',n),
             rep('Chihuahua',n),
             rep('Durango',n),
             rep('Guanajuato',n),
             rep('Guerrero',n),
             rep('Hidalgo',n),
             rep('Jalisco',n),
             rep('México City',n),
             rep('Michoacan',n),
             rep('Morelos',n),
             rep('Nayarit',n),
             rep('Nuevo León',n),
             rep('Oaxaca',n),
             rep('Puebla',n),
             rep('Querétaro',n),
             rep('Quintana Roo',n),
             rep('San Luis Potosí',n),
             rep('Sinaloa',n),
             rep('Sonora',n), 
             rep('Tabasco',n),
             rep('Tamaulipas',n),
             rep('Tlaxcala',n),
             rep('Veracruz',n),
             rep('Yucatán',n),
             rep('Zacatecas.',n))
d <- data.matrix(d)
distance <- dist(d, method = 'euclidean')
hc <- hclust(distance, method="ward.D")
plot(hc, cex=.7, hang = -1, col='blue', labels=pattern)

When I don't specify labels, I get this dendrogram: [image: dendrogram with numeric labels]. But when I do, I get this error:

Error in graphics:::plotHclust(n1, merge, height, order(x$order), hang, : invalid dendrogram input

I hope somebody can help me; I am a little bit tired of this.

Tal Galili
Seaatm
  • Hey, please make a self-contained reproducible example for your problem. Also, please review the very detailed vignette of dendextend, it should support your use-case: https://cran.r-project.org/web/packages/dendextend/vignettes/dendextend.html – Tal Galili May 08 '21 at 15:52

1 Answer

Maybe it will work with an alternative to the base R plot function. Try ggdendroplot, which should display the labels on the axis. You will need ggplot2 for this.

devtools::install_github("nicolash2/ggdendroplot")
library(ggdendroplot)
library(ggplot2)

ggplot() + geom_dendro(hc)

If you want to modify the plot (rotate it, color it, etc.), check out the GitHub page: https://github.com/NicolasH2/ggdendroplot
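
If you would rather stay with base R, one likely cause of the error is a length mismatch: plot.hclust() expects exactly one label per clustered row. In the question's code, i holds 34 blocks of n indices (s appears twice, as s and 0+s), so 340 rows are clustered, while pattern supplies only 32 * n = 320 names. The sketch below is not the original dataset; it uses synthetic numbers and three example state names (all assumptions) to show the length check and an alternative that sets row names before dist(), so that hclust() carries the labels itself.

```r
# Synthetic stand-in for the sampled rows: 3 example groups, n rows each.
set.seed(1)
n <- 4
groups <- c("Aguascalientes", "Jalisco", "Sonora")
m <- matrix(rnorm(n * length(groups) * 2), ncol = 2,
            dimnames = list(NULL, c("day_index", "residential")))

# One label per row, in the same order as the rows of m.
state_labels <- rep(groups, each = n)

# Guard against the mismatch before plotting.
stopifnot(length(state_labels) == nrow(m))

# Row names are kept by dist() and end up in hc$labels,
# so plot() needs no separate labels argument.
rownames(m) <- state_labels
hc <- hclust(dist(m, method = "euclidean"), method = "ward.D")
plot(hc, cex = 0.7, hang = -1)
```

With the real data, the same check (length(pattern) == nrow(d)) should show whether the two vectors line up before plot() is called.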

NicolasH2
  • Sorry for asking, but what's the point of this package? Why not just use dendextend::as.ggdend, after setting the object's parameters? – Tal Galili May 08 '21 at 14:54
  • The package is just meant as a quick and easy solution. It certainly doesn't have as many options as dendextend, and you can probably use dendextend just as well for this problem; I just wanted to provide one possible way. Since this post has no other answer, I don't see how that hurts. For the same reason, I am not sure why it is downvoted, even if it is not an ideal answer. – NicolasH2 May 10 '21 at 21:19
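
Whichever plotting package is used, the dendrogram in the question has one leaf per sampled row, not one per state. If the goal is to compare states by their whole series, a possible approach is to reshape the data to one row per state before clustering. The sketch below is only an illustration under assumptions: the data is synthetic, it is in long format with the columns sub_region_1, date and residential_percent_change_from_baseline, and the four state names are examples.

```r
# Synthetic long-format data: one row per (state, date) pair.
set.seed(42)
states <- c("Aguascalientes", "Jalisco", "Sonora", "Tabasco")
dates <- seq(as.Date("2021-01-01"), by = "day", length.out = 30)
long <- expand.grid(sub_region_1 = states, date = dates,
                    stringsAsFactors = FALSE)
long$residential_percent_change_from_baseline <- rnorm(nrow(long), mean = 10, sd = 5)

# Reshape to a matrix with one row per state and one column per date.
wide <- tapply(long$residential_percent_change_from_baseline,
               list(long$sub_region_1, format(long$date)),
               mean)

# Cluster the states; the row names of wide become the leaf labels.
hc_states <- hclust(dist(wide, method = "euclidean"), method = "ward.D")
plot(hc_states, cex = 0.7, hang = -1)
```

Here each leaf is a state, so the state names appear on the axis without passing labels to plot().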