I have a data frame with mixed data types (integer, character, and logical) which I'm trying to cluster with daisy.

I'm using:

gower_dist <- daisy(relchoice, metric = "gower")

and getting:

Error in daisy(relchoice, metric = "gower") : 
invalid type character for column numbers 3, 4, 5, 7, 8, 10, 13, 14, 15, 16, 21, 29, 31, 32

Would love some help with this.

Gilad Brandes
  • I'm also having a similar issue and found this post while trying to reproduce this blog post: https://towardsdatascience.com/clustering-on-mixed-type-data-8bbd0a2569c3 – Tomas Oct 18 '18 at 13:08

1 Answer

I was able to fix this problem by converting the categorical (character) columns to factors, since daisy rejects columns of type character. For example:

df$job <- as.factor(df$job)
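
When many columns are affected, as in the error in the question, converting them one at a time gets tedious. Here is a minimal sketch that converts every character column in one pass before calling daisy. The toy data frame and its column names are hypothetical stand-ins for relchoice, not the asker's real data:

```r
library(cluster)

# Hypothetical stand-in for relchoice: one integer and two character columns
relchoice <- data.frame(
  age    = c(25L, 40L, 33L, 58L),
  job    = c("admin", "technician", "admin", "retired"),
  region = c("north", "south", "south", "east"),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each of them to a factor
is_chr <- vapply(relchoice, is.character, logical(1))
relchoice[is_chr] <- lapply(relchoice[is_chr], as.factor)

# daisy now sees numeric and factor columns only
gower_dist <- daisy(relchoice, metric = "gower")
```

Running str(relchoice) afterwards is a quick way to confirm that no character columns remain. The error in the question names only character columns, so this sketch leaves any other column types untouched.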
Tomas