
I am trying to compute a dissimilarity matrix from a data.frame using daisy() from the cluster package (on CRAN) in R. My dataset has 13109 observations of 9 categorical variables.

I got two kinds of warnings: NAs being introduced by coercion, and no non-missing arguments to min/max. Why am I getting them?

I do not have any NA values in the data.frame. Here's information on my dataset:

> str(df4)
'data.frame':   13109 obs. of  9 variables:
 $ Age               : chr  "55-64" "55-64" "55-64" "55-64" ...
 $ Gender            : chr  "Female" "Female" "Male" "Male" ...
 $ HouseholdIncome   : chr  "50k-75k" "150k-175k" "150k-175k" "150k-175k" ...
 $ MaritalStatus     : chr  "Single" "Married" "Married" "Married" ...
 $ PresenceofChildren: chr  "No" "Yes" "Yes" "Yes" ...
 $ HomeOwnerStatus   : chr  "Own" "Rent" "Rent" "Rent" ...
 $ HomeMarketValue   : chr  "350k-500k" "500k-1mm" "500k-1mm" "500k-1mm" ...
 $ Occupation        : chr  "White Collar Worker" "Professional" "Professional" "Professional" ...
 $ Education         : chr  "Completed High School" "Completed College" "Completed College" "Completed College" ...

Here's evidence that values were coerced to NA: I tried running the PAM clustering function on the result, but got an error saying that NA values are not allowed.

> library(cluster)
> # Create dissimilarity matrix
> # Gower coefficient for finding distance between mixed variables
> daisy4 <- daisy(df4, metric = "gower", type = list(ordratio = 1:9))

> warnings()
Warning messages:
1: In data.matrix(x) : NAs introduced by coercion
2: In data.matrix(x) : NAs introduced by coercion
3: In data.matrix(x) : NAs introduced by coercion
4: In data.matrix(x) : NAs introduced by coercion
5: In data.matrix(x) : NAs introduced by coercion
6: In data.matrix(x) : NAs introduced by coercion
7: In data.matrix(x) : NAs introduced by coercion
8: In data.matrix(x) : NAs introduced by coercion
9: In data.matrix(x) : NAs introduced by coercion
10: In min(x) : no non-missing arguments to min; returning Inf
11: In max(x) : no non-missing arguments to max; returning -Inf
12: In min(x) : no non-missing arguments to min; returning Inf
13: In max(x) : no non-missing arguments to max; returning -Inf
14: In min(x) : no non-missing arguments to min; returning Inf
15: In max(x) : no non-missing arguments to max; returning -Inf
16: In min(x) : no non-missing arguments to min; returning Inf
17: In max(x) : no non-missing arguments to max; returning -Inf
18: In min(x) : no non-missing arguments to min; returning Inf
19: In max(x) : no non-missing arguments to max; returning -Inf
20: In min(x) : no non-missing arguments to min; returning Inf
21: In max(x) : no non-missing arguments to max; returning -Inf
22: In min(x) : no non-missing arguments to min; returning Inf
23: In max(x) : no non-missing arguments to max; returning -Inf
24: In min(x) : no non-missing arguments to min; returning Inf
25: In max(x) : no non-missing arguments to max; returning -Inf
26: In min(x) : no non-missing arguments to min; returning Inf
27: In max(x) : no non-missing arguments to max; returning -Inf
28: In min(x) : no non-missing arguments to min; returning Inf
29: In max(x) : no non-missing arguments to max; returning -Inf

> k4answers <- pam(daisy4, 3, diss = TRUE)
Error in pam(daisy4, 3, diss = TRUE) : 
  NA values in the dissimilarity matrix not allowed.

Please let me know if I can provide more information.

EDIT: I solved my problem. I had read the .csv file in with stringsAsFactors=FALSE, so every column was a character vector instead of a factor. That's why it worked with the other dataset. Here's where I went wrong:

#Load Data
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv", 
                   na.strings = "", stringsAsFactors=FALSE, head = TRUE)

Solution:

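#Load Data
# stringsAsFactors = TRUE was the default before R 4.0.0; set it explicitly
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv",
                   na.strings = "", stringsAsFactors = TRUE, header = TRUE)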
Scott Davis
  • You appear to be ignoring the help page that says daisy works on numeric matrices or dataframes. It apparently will handle factor variables but not character. – IRTFM Sep 22 '14 at 04:08
  • @BondedDust thank you for pointing that out. I forgot to include how the data was read in. I converted the data to characters previously. I took it out and `daisy` worked! – Scott Davis Sep 22 '14 at 04:24
  • @ScottDavis If you've solved your issue, please post your solution as an answer. Please include a brief but full explanation about what the problem was and what solution you found – Barranka Sep 22 '14 at 04:44
  • Or delete the question – Rich Scriven Sep 22 '14 at 05:19
  • @Barranka I posted the solution. – Scott Davis Sep 23 '14 at 04:25
  • @ScottDavis :) I recommend you post the solution *as an answer* (yes, you can answer your own question), and then accept your answer... it helps other users who may find a similar situation find your question, know that it has an answer and that answer worked for you. – Barranka Sep 23 '14 at 04:26
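
The point made in the comments (factors work, character columns do not) can be illustrated with base R alone. This is only a sketch of the likely mechanism behind the warnings shown in the question, not a trace of what `daisy` does internally; the vector `x` is made-up data in the same style as the `Age` column.

```r
# Hypothetical values resembling the Age column in the question
x <- c("55-64", "35-44", "55-64")

# A character string that is not a number cannot be coerced to numeric:
# as.numeric() returns NA and warns "NAs introduced by coercion"
num_from_chr <- suppressWarnings(as.numeric(x))

# With every value NA, min()/max() have nothing left to work with and
# warn "no non-missing arguments to min; returning Inf"
range_lo <- suppressWarnings(min(num_from_chr, na.rm = TRUE))

# A factor, by contrast, carries integer codes for its levels
num_from_fct <- as.integer(factor(x))

print(num_from_chr)  # NA NA NA
print(range_lo)      # Inf
print(num_from_fct)  # 2 1 2 (levels sort as "35-44", "55-64")
```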

1 Answer

Read the data in as factor variables instead of characters.

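#Load Data
# stringsAsFactors = TRUE was the default before R 4.0.0; set it explicitly
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv",
                   na.strings = "", stringsAsFactors = TRUE, header = TRUE)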
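
To check which column types a given `read.csv` call produces without touching the real file, the same call can be run on inline text. The CSV content below is invented for illustration. Note that from R 4.0.0 onward the default for `stringsAsFactors` is `FALSE`, so passing it explicitly is the safer choice.

```r
# Hypothetical two-column CSV, passed inline through the text argument
csv_text <- "Age,Gender\n55-64,Female\n35-44,Male\n"

# Columns stay character vectors
as_chr <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

# Columns become factors
as_fct <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = TRUE)

print(sapply(as_chr, class))  # both "character"
print(sapply(as_fct, class))  # both "factor"
```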

This is the call I had before, which caused the error:

#Load Data
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv", 
                   na.strings = "", stringsAsFactors=FALSE, head = TRUE)
Scott Davis
  • You can convert your already imported data to factors; simply use `Store4$variable <- factor(Store4$variable)`. [Read the documentation for the `factor()` function](https://stat.ethz.ch/R-manual/R-devel/library/base/html/factor.html) – Barranka Sep 25 '14 at 05:03
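
Following the suggestion in the comment above, here is a minimal sketch of converting every character column of an already imported data frame to a factor in one pass, then running `daisy` and `pam`. The data frame `df` and its values are made up and merely stand in for `df4`. The `type = list(ordratio = ...)` argument from the question is left out, so the factors are handled as nominal variables.

```r
library(cluster)

# Toy data standing in for df4 (hypothetical values, not the original data)
df <- data.frame(
  Age    = c("55-64", "55-64", "35-44", "25-34", "35-44", "25-34"),
  Gender = c("Female", "Male", "Male", "Female", "Female", "Male"),
  Owner  = c("Own", "Rent", "Rent", "Own", "Own", "Rent"),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each one to a factor
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

# Gower dissimilarities on the factor columns, then PAM with 2 clusters
d   <- daisy(df, metric = "gower")
fit <- pam(d, k = 2, diss = TRUE)

str(df)
print(fit$clustering)
```

Calling `str(df)` after the conversion should list every column as `Factor`, which is a quick way to confirm the data is in a form `daisy` handles.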