
I am trying to run a clustering exercise in R. The function I used is apcluster(), and the script is:

library(apcluster)  # provides negDistMat() and apcluster()
s1        <- negDistMat(df, r=2, method="euclidean")
apcluster <- apcluster(s1)

My data set has around 0.1 million rows. When I ran the script, I got the following error:

Error in simpleDist(x[, sapply(x, is.numeric)], sel, method = method, : negative length vectors are not allowed

When I searched online, I found that the "negative length vectors" error occurs because of the memory limit of my RAM. Is there any workaround that would let me run apcluster() on my data set of 0.1 million rows with the available RAM, or am I missing something that I need to take care of when running apcluster in R?

I have a machine with 8 GB of RAM.

gung - Reinstate Monica
On data sets of this size, you don't want to use algorithms that need O(n²) memory and supposedly even O(n³) time. Instead, try the `dbscan` package, for example. – Has QUIT--Anony-Mousse Jul 07 '17 at 07:04
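Following that suggestion, here is a minimal sketch of DBSCAN with the `dbscan` package. It assumes the package is installed and that the data are purely numeric; the random matrix `x` is a hypothetical stand-in for `df`, and `eps = 0.3` / `minPts = 5` are illustrative values, not tuned ones. For Euclidean distance the package uses a kd-tree for neighbour search, so it does not build a full n x n distance matrix.

```r
library(dbscan)

set.seed(1)
# Hypothetical stand-in for the real data frame `df` (numeric columns only)
x <- matrix(rnorm(2000 * 2), ncol = 2)

# eps and minPts are illustrative; kNNdistplot(x, k = 5) can help pick eps
res <- dbscan(x, eps = 0.3, minPts = 5)

# Cluster sizes; label 0 marks noise points
table(res$cluster)
```

In practice `eps` is usually read off the "knee" of the k-nearest-neighbour distance plot rather than guessed.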

1 Answer


The standard version of affinity propagation implemented in the apcluster() method will never run successfully on data of that size. First, the similarity matrix (s1 in your code sample) will have 100K x 100K = 10^10 (10G) entries; stored as 8-byte doubles, that is about 80 GB, roughly ten times your 8 GB of RAM. Second, computation times will be excessive. I suggest you use apclusterL() (leveraged affinity propagation) instead.
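A minimal sketch of the suggested switch, assuming the data are purely numeric. The random matrix `x` is a hypothetical stand-in for `df`, and `frac = 0.1` / `sweeps = 5` are illustrative settings. Instead of a full similarity matrix, apclusterL() is given a similarity function (here `negDistMat(r = 2)`, called without data so that it returns a function) and computes similarities only against a random fraction of the samples, repeated over several sweeps.

```r
library(apcluster)

# Rough size of the dense n x n similarity matrix for n = 100,000 doubles
n <- 1e5
n^2 * 8 / 1024^3   # about 74.5 GiB (80 GB), far above 8 GB of RAM

set.seed(1)
# Hypothetical stand-in for the real data frame `df`
x <- matrix(rnorm(2000 * 2), ncol = 2)

# Leveraged AP: similarities to a random 10% of samples, over 5 sweeps
apres <- apclusterL(s = negDistMat(r = 2), x = x, frac = 0.1, sweeps = 5)

length(apres@exemplars)   # number of clusters found
```

Smaller `frac` values cut memory further at the cost of a coarser approximation; the preference `p` can also be passed to control how many clusters are returned.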

UBod