
First, let's take a look at mydata:

head(mydata,10)

   LONGITUDE LATITUDE
1   121.7779  39.0476
2   121.5210  38.8771
3   121.6259  38.9224
4   121.5907  38.8980
5   121.5865  38.8816
6   121.5808  38.9121
7   121.5806  38.8843
8   121.5907  38.8992
9   121.7586  39.0380
10  121.6061  38.9035

dim(mydata)
[1] 716213      2

 summary(mydata)
 LONGITUDE        LATITUDE    
 Min.   :121.1   Min.   :38.72  
 1st Qu.:121.6   1st Qu.:38.91  
 Median :121.6   Median :38.93  
 Mean   :121.6   Mean   :38.95  
 3rd Qu.:121.6   3rd Qu.:38.99  
 Max.   :122.2   Max.   :39.40 

The whole of mydata takes up less than 20 MB.

Now I want to cluster mydata. I use Mclust() from the mclust package, which fits Gaussian mixture models with the EM (expectation-maximization) algorithm.

fit_em <- Mclust(mydata)

To my surprise, after running this code I waited for over an hour, only to get an error. The full message is:

Error: cannot allocate vector of size 1910.9 Gb
In addition: Warning messages:
1: In hcVVV(data = c(121.7779, 121.521, 121.6259, 121.5907, 121.5865,  :
  NAs introduced by coercion to integer range
2: In double(ld) :
  Reached total allocation of 8191Mb: see help(memory.size)
3: In double(ld) :
  Reached total allocation of 8191Mb: see help(memory.size)
4: In double(ld) :
  Reached total allocation of 8191Mb: see help(memory.size)
5: In double(ld) :
  Reached total allocation of 8191Mb: see help(memory.size)

What is going wrong with mydata or my code, and what should I do if I want to cluster mydata?

Alex Riley
Ling Zhang

1 Answer


The way Mclust is implemented, it will by default use memory that is quadratic in the number of rows.

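Do the math: 716213 * 716213 * 8 bytes per double is roughly 4 TB. The hierarchical initialization (the hcVVV call in your warnings) stores one double per pair of points, 716213 * 716212 / 2 of them, which is exactly the 1910.9 Gb in the error message.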
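The arithmetic is easy to check in R itself. This is only a sketch, assuming the initialization keeps one 8-byte double per pair of rows, which is what the size in the error message suggests; the variable names are mine.

```r
# Number of rows reported by dim(mydata) in the question.
n <- 716213

# One 8-byte double per unordered pair of rows (assumption, see above).
n_pairs   <- n * (n - 1) / 2
gib_pairs <- n_pairs * 8 / 2^30

# For comparison: a full n x n matrix of doubles.
gib_full <- n * n * 8 / 2^30

print(round(gib_pairs, 1))  # 1910.9, the size R failed to allocate
print(round(gib_full, 1))
```

Either way, the requirement grows with the square of the row count, so no realistic amount of RAM fixes it; the initialization has to change.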

This is not necessary; it is just the default behavior. You can use the initialization argument of Mclust() to choose a less expensive initialization, for example one computed on a subset of the rows. For clustering large geo data sets, you should also have a look at ELKI. If I'm not mistaken, its EM implementation should only need linear memory.
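A cheaper initialization might look like the sketch below. It runs the hierarchical step on a random subset of rows through the subset component of the initialization list, and EM then runs on all rows. The simulated mydata is a small stand-in for the real table, and the subset size of 200 and G = 1:5 are arbitrary choices to keep the example fast, not recommendations.

```r
library(mclust)

# Small simulated stand-in for the real data (assumption: two numeric
# columns named as in the question).
set.seed(1)
mydata <- data.frame(
  LONGITUDE = c(rnorm(1000, 121.6, 0.05), rnorm(1000, 121.8, 0.05)),
  LATITUDE  = c(rnorm(1000, 38.90, 0.03), rnorm(1000, 39.05, 0.03))
)

# Rows used for the hierarchical initialization only; its memory use is
# quadratic in the subset size, not in nrow(mydata).
init_rows <- sample(nrow(mydata), size = 200)

# EM is still fitted to every row of mydata.
fit_em <- Mclust(mydata, G = 1:5,
                 initialization = list(subset = init_rows))

summary(fit_em)
```

On the full 716213 rows the same call applies, with a larger subset if memory allows; the result can depend on which rows are sampled, so it is worth trying a few seeds.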

Has QUIT--Anony-Mousse