I already have text data that is categorized like what is shown below:

main_category  sub_category_level1  sub_category_level2
plants         fruit                apples
plants         fruit                oranges
plants         fruit                
plants         veggies              carrots
plants         veggies              
plants         veggies              onions

Most of the tutorials I've read online calculate a distance matrix and then use the hclust function to cluster the data, but my data is already categorized and is purely text.

I'm also unsure of how to handle missing values.

ProfLonghair

1 Answer

Dendrograms, by definition, need a height for every merge, i.e. a (dis)similarity value. Your data has categories but no such values.

What you are looking for is a tree: the root splits at the main category, the branches then split at each subcategory, and so on.
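A minimal sketch of that idea, in Python purely for illustration (the question mentions R's hclust, which is not used here). The rows are copied from the question. Two things are assumptions and not part of the original post: the helper names `build_tree` and `render` are hypothetical, and a missing value is treated as the end of that row's path, so the row stops at its last known category.

```python
# Rows copied from the question; an empty string marks a missing value.
rows = [
    ("plants", "fruit", "apples"),
    ("plants", "fruit", "oranges"),
    ("plants", "fruit", ""),
    ("plants", "veggies", "carrots"),
    ("plants", "veggies", ""),
    ("plants", "veggies", "onions"),
]


def build_tree(rows):
    """Nest each row's categories into a dict of dicts, one level per column."""
    tree = {}
    for row in rows:
        node = tree
        for label in row:
            if not label:
                # Assumption: a missing value ends the path at the parent level.
                break
            node = node.setdefault(label, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented text lines, one label per line."""
    lines = []
    for label, children in tree.items():
        lines.append("  " * depth + label)
        lines.extend(render(children, depth + 1))
    return lines


tree = build_tree(rows)
print("\n".join(render(tree)))
```

Stopping at the parent is only one way to handle the gaps. Another option would be to substitute a placeholder label such as "unknown" so that those rows still show up as leaves.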

Has QUIT--Anony-Mousse