I have text data that is already categorized, as shown below:
main_category  sub_category_level1  sub_category_level2
plants         fruit                apples
plants         fruit                oranges
plants         fruit
plants         veggies              carrots
plants         veggies
plants         veggies              onions
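In R, the data would look roughly like the sketch below. The data frame name `df` and the use of `NA` for the empty cells are my own assumptions for illustration, not something the data format requires.

```r
# Sketch of the categorized data as a data frame.
# `df` is a made-up name; NA marks the rows with no level-2 sub-category.
df <- data.frame(
  main_category       = c("plants", "plants", "plants", "plants", "plants", "plants"),
  sub_category_level1 = c("fruit", "fruit", "fruit", "veggies", "veggies", "veggies"),
  sub_category_level2 = c("apples", "oranges", NA, "carrots", NA, "onions"),
  stringsAsFactors    = FALSE
)

# Count the rows where the level-2 sub-category is missing
n_missing <- sum(is.na(df$sub_category_level2))
```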
Most of the tutorials I've read online calculate a distance matrix and then pass it to the hclust function to cluster the data. My data, however, is already categorized, and it is purely text, so there are no numeric values to compute distances from.
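For context, the tutorial workflow I'm referring to looks roughly like this. The numeric values and row names are made up purely to illustrate it; they are not my data.

```r
# Made-up numeric measurements, only to illustrate the tutorial workflow
x <- matrix(c(1.0, 1.2, 0.9, 5.0, 5.3, 4.8), ncol = 1,
            dimnames = list(c("a", "b", "c", "d", "e", "f"), "value"))

d  <- dist(x)     # pairwise distance matrix (Euclidean by default)
hc <- hclust(d)   # hierarchical clustering (complete linkage by default)

groups <- cutree(hc, k = 2)  # cut the tree into two clusters
plot(hc)                     # draw the dendrogram
```

This works because `dist` has numbers to compare, which is exactly what my categorized text data lacks.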
I'm also unsure how to handle the missing values, such as the rows above that have no sub_category_level2.