I'm using the Mclust function (from the mclust package) to perform Gaussian mixture clustering. The data set is composed of 21000+ rows and 10 columns.

I got the following error:

Error in svd(shape.o, nu = 0) : infinite or missing values in 'x'

What is strange is that: 1) I've checked for NaN, Inf, and the like, and there are none; 2) if I run the model with 9 variables it works well, but when I add one more variable I get the error. I've tried several different additional variables and got the same error each time.

Do you have any idea what is going wrong? Much appreciated.

EDIT on Variables

> str(data_scaled[data_subset, model_variables])
'data.frame':   21304 obs. of  12 variables:
 $ PROD_ALL_OR_NOTHING_PERC: num  -0.064 -0.064 -0.064 -0.064 0.141 ...
 $ PROD_CASH_3_PERC        : num  -0.212 -0.212 -0.212 1.303 0.686 ...
 $ PROD_CASH_4_PERC        : num  -0.18 -0.18 -0.18 1.09 8.75 ...
 $ PROD_EINSTANTS_PERC     : num  -0.502 0.68 2.329 -0.582 -0.582 ...
 $ PROD_FANTASY_5_PERC     : num  -0.6517 -0.5562 -0.4928 0.0267 -0.6517 ...
 $ PROD_GEORGIA_5_PERC     : num  -0.0563 -0.0563 -0.0563 -0.0563 6.3148 ...
 $ PROD_KENO_PERC          : num  2.208 1.125 -0.664 0.624 -0.664 ...
 $ PROD_MEGA_MILLION_PERC  : num  -0.687 -0.687 -0.687 -0.523 -0.687 ...
 $ PROD_POWERBALL_PERC     : num  -0.886 -0.886 -0.514 -0.682 -0.886 ...
 $ AVG_WAGER               : num  -0.136 -0.422 -0.416 -0.467 -0.582 ...
 $ DEPOSIT_AMOUNT          : num  0.3984 0.0928 -0.1745 0.8043 1.2674 ...
 $ DEPOSIT_NUM             : num  0.485 0.955 -0.22 1.659 3.773 ...

> summary(data_scaled[data_subset, model_variables])
 PROD_ALL_OR_NOTHING_PERC PROD_CASH_3_PERC  PROD_CASH_4_PERC  PROD_EINSTANTS_PERC PROD_FANTASY_5_PERC
 Min.   :-0.06402         Min.   :-0.2122   Min.   :-0.1801   Min.   :-0.5819     Min.   :-0.6517    
 1st Qu.:-0.06402         1st Qu.:-0.2122   1st Qu.:-0.1801   1st Qu.:-0.5819     1st Qu.:-0.6517    
 Median :-0.06402         Median :-0.2122   Median :-0.1801   Median :-0.5819     Median :-0.5146    
 Mean   : 0.00000         Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000     Mean   : 0.0000    
 3rd Qu.:-0.06402         3rd Qu.:-0.2122   3rd Qu.:-0.1801   3rd Qu.: 0.1934     3rd Qu.: 0.3021    
 Max.   :33.08348         Max.   : 7.3222   Max.   :11.5193   Max.   : 2.8354     Max.   : 3.6404    
 PROD_GEORGIA_5_PERC PROD_KENO_PERC    PROD_MEGA_MILLION_PERC PROD_POWERBALL_PERC   AVG_WAGER       
 Min.   :-0.05627    Min.   :-0.6644   Min.   :-0.6873        Min.   :-0.8861     Min.   :-0.62837  
 1st Qu.:-0.05627    1st Qu.:-0.6644   1st Qu.:-0.6873        1st Qu.:-0.8861     1st Qu.:-0.45222  
 Median :-0.05627    Median :-0.6644   Median :-0.4302        Median :-0.4078     Median :-0.29270  
 Mean   : 0.00000    Mean   : 0.0000   Mean   : 0.0000        Mean   : 0.0000     Mean   : 0.00000  
 3rd Qu.:-0.05627    3rd Qu.: 0.4445   3rd Qu.: 0.3513        3rd Qu.: 0.6892     3rd Qu.: 0.07956  
 Max.   :60.46933    Max.   : 2.2766   Max.   : 4.3167        Max.   : 2.4615     Max.   :31.21876  
 DEPOSIT_AMOUNT     DEPOSIT_NUM     
 Min.   :-0.1746   Min.   :-0.2198  
 1st Qu.:-0.1746   1st Qu.:-0.2198  
 Median :-0.1746   Median :-0.2198  
 Mean   : 0.0000   Mean   : 0.0000  
 3rd Qu.:-0.1746   3rd Qu.:-0.2198  
 Max.   :36.2089   Max.   :23.5029  
lilloraffa
  • can you edit your post to add `str(data)` or `head(data)` ? – Mamoun Benghezal Apr 20 '15 at 14:15
  • Your question is answered here: http://stackoverflow.com/questions/21423375/r-svd-function-infinite-or-missing-values-in-x – Seth Jul 18 '15 at 02:53
  • The response in Seth's linked answer deals with constant value columns; the summary in this question indicates that this is a different problem. I suspect MClust is running an internal subset that _results_ in a constant valued column, probably due to the source data having values that are _mostly_ zero. I have a similar problem and would like to avoid dropping the rare column as it is important. – Chipmonkey Nov 12 '15 at 18:28
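
The suspicion in the last comment can be checked before fitting. The following is a minimal base R sketch, not anything provided by mclust: `column_diagnostics` is a hypothetical helper and `toy` is made-up data standing in for the real data set. It reports, per column, the number of non-finite values, the number of distinct values, and the share of rows taken by the most frequent value. It then shows that a subset of a "mostly constant" column can have exactly zero variance, which is the kind of degenerate input that could plausibly upset a covariance decomposition.

```r
# Hypothetical helper (not part of mclust): summarise how degenerate each column is.
column_diagnostics <- function(df) {
  data.frame(
    variable   = names(df),
    # count of NA, NaN, Inf and -Inf values
    non_finite = vapply(df, function(x) sum(!is.finite(x)), numeric(1)),
    # number of distinct values in the column
    n_unique   = vapply(df, function(x) length(unique(x)), numeric(1)),
    # fraction of rows holding the single most frequent value
    top_share  = vapply(df, function(x) max(table(x)) / length(x), numeric(1)),
    row.names  = NULL
  )
}

# Toy data (an assumption, not the asker's data): one column is almost always 0.
set.seed(1)
toy <- data.frame(
  dense = rnorm(1000),
  rare  = c(rep(0, 995), runif(5, 1, 10))
)

report <- column_diagnostics(toy)
print(report)

# The full column has positive variance, but a subset of it can be constant.
full_var   <- var(toy$rare)
subset_var <- var(toy$rare[1:500])
print(c(full = full_var, subset = subset_var))
```

A column with a `top_share` close to 1 passes a NaN/Inf check and is not constant overall, yet any group of rows drawn only from its dominant value has zero variance. Whether this is what actually triggers the `svd` error here is an assumption that the output alone does not confirm.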

1 Answer

R seems to dislike numbers that are too close to zero. I found that multiplying the input data by 10 or more avoided the error:

BICCtrSD = mclustBIC(Ipsative)

fitting ...
  |=========                                        |  18%Error in svd(shape.o, nu = 0) : infinite or missing values in 'x'

BICCtrSD = mclustBIC(Ipsative*10)

fitting ...
  |=================================================| 100%

But don't forget that the data were rescaled when you interpret the outcome statistics.

This may be more of a hack than a solution per se.
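
To illustrate the caveat about outcome statistics, here is a small base R sketch of how estimates computed on rescaled data map back to the original units. The factor `k` and the random matrix `x` are assumptions for illustration (`x` merely stands in for `Ipsative`), and plain sample means and covariances are used in place of a fitted mixture model. Means scale with `k` and covariances with `k^2`, so they are divided by those factors to undo the transformation.

```r
set.seed(42)

k <- 10                             # scaling factor used in the workaround
x <- matrix(rnorm(200), ncol = 2)   # toy data standing in for the real input
x_scaled <- x * k                   # what would be passed to the clustering call

# Statistics estimated on the rescaled data
mean_scaled <- colMeans(x_scaled)
cov_scaled  <- cov(x_scaled)

# Back-transform to the original units:
# means are linear in k, covariances are quadratic in k
mean_orig <- mean_scaled / k
cov_orig  <- cov_scaled / k^2

# Both comparisons should agree up to floating-point tolerance
print(all.equal(mean_orig, colMeans(x)))
print(all.equal(cov_orig, cov(x)))
```

The same rule would apply to component means and covariances from a model fitted on the rescaled data. Cluster assignments themselves need no back-transformation.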

cal
  • Explanation required. Please add it to prevent further downvoting and to get you out of the "low quality review corner". – ZF007 Jun 19 '19 at 23:15