I'm using the hclust() function in a large script, applied to a data frame df as in this example:

HClust <- hclust(d = dist(model.matrix(~-1 + A + B + C + D, df))^2, method = "centroid")

I would like to specify the variables of df (e.g. MgO, Zn, CaO, ...) only once and have the hclust() call pick them up automatically.

I tried building a vector that holds the data frame's variable names in the form needed for the hclust() call:

vars_for_clust <- paste0(colnames(df), "+")

which gives the following:

vars_for_clust
[1] "A+" "B+" "C+"

and then used this vector in the hclust() call:

HClust <- hclust(d = dist(model.matrix(~-1 + vars_for_clust, df))^2, method = "centroid")

This does not raise an error, but the resulting dendrogram is not correct: all the vertical lines have the same height.

Thanks!!

Sample data in: https://github.com/esteful/kaixo

Esteful

1 Answer

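There are two problems here:

1. Your use of paste: it returns a character vector of separate strings ("A+", "B+", ...), not a single formula.
2. Your use of vars_for_clust as an argument to model.matrix: inside a formula, vars_for_clust is treated as the name of one variable. It is not expanded into the column names it contains, so the clustering runs on those few strings rather than on your data.

To get what you need, construct the entire formula as a string and then convert it to a formula, like this: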
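A minimal sketch of why the dendrogram comes out flat, assuming a small made-up df (the values are hypothetical, not the asker's data). model.matrix() looks up vars_for_clust as an ordinary variable, finds the character vector, and codes it as a factor, so every pair of "observations" ends up at the same distance:

```r
# Hypothetical toy data standing in for the real df
df <- data.frame(A = c(1, 2, 3, 10),
                 B = c(2, 1, 4, 12),
                 C = c(0, 1, 1, 9))

# Separate strings, one per column name (as printed in the question)
vars_for_clust <- paste0(colnames(df), "+")

# The formula refers to the vector itself, which is coded as a factor
mm <- model.matrix(~ -1 + vars_for_clust, df)

# One row per string rather than one row per row of df
n_rows_clustered <- nrow(mm)

# All pairwise squared distances coincide, hence equal branch heights
d2 <- as.vector(dist(mm)^2)
print(n_rows_clustered)
print(d2)
```

The columns of df never enter the model matrix here, which would explain why no error is raised even though the result is meaningless.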

(FormString <- paste(c("~ -1", colnames(df)), collapse=" + "))
[1] "~ -1 + A + B + C"
HClust <- hclust(d = dist(model.matrix(as.formula(FormString), df))^2, method = "centroid")
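As an alternative sketch, base R's reformulate() builds the same kind of formula directly from a character vector, which avoids pasting strings by hand. The toy df below is hypothetical, and this assumes every column of df should enter the clustering and that the column names are syntactically valid R names:

```r
# Hypothetical toy data standing in for the real df
df <- data.frame(A = c(1, 2, 3, 10),
                 B = c(2, 1, 4, 12),
                 C = c(0, 1, 1, 9))

# Build a one-sided formula from the column names, without an intercept
clust_formula <- reformulate(colnames(df), intercept = FALSE)
print(clust_formula)

# Same clustering call as above, now driven by the generated formula
HClust <- hclust(d = dist(model.matrix(clust_formula, df))^2,
                 method = "centroid")

# plot(HClust) would then draw the dendrogram
```

To cluster on only some of the columns, pass the desired subset of names instead of colnames(df).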
G5W