We used 100,000 kits. The Spark version is 1.6.1 and the Scala version is 2.1.0. How can I fix the memory errors and get good results?
The various DBSCAN add-ons for Spark are all problematic. See this survey:

Neukirchen, Helmut. "Survey and Performance Evaluation of DBSCAN Spatial Clustering Implementations for Big Data and High-Performance Computing Paradigms." (2016).

From a JVM language like Scala, it should be easy to call, e.g., ELKI directly and get quite good performance.
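For reference, this is what DBSCAN computes on a single machine. The sketch below is a minimal, self-contained Java implementation for illustration only, not ELKI's API (ELKI's index-accelerated version is what you would actually call); the class and helper names here are hypothetical:

```java
import java.util.*;

// Minimal single-machine DBSCAN on 2-D points with Euclidean distance.
// Labels: 0 = noise, cluster ids start at 1. Illustration only, not ELKI.
public class MiniDbscan {
    static final int UNVISITED = -1, NOISE = 0;

    static int[] dbscan(double[][] pts, double eps, int minPts) {
        int[] label = new int[pts.length];
        Arrays.fill(label, UNVISITED);
        int cluster = 0;
        for (int i = 0; i < pts.length; i++) {
            if (label[i] != UNVISITED) continue;
            List<Integer> neighbors = regionQuery(pts, i, eps);
            if (neighbors.size() < minPts) { label[i] = NOISE; continue; }
            cluster++;                       // i is a core point: start a cluster
            label[i] = cluster;
            ArrayDeque<Integer> seeds = new ArrayDeque<>(neighbors);
            while (!seeds.isEmpty()) {
                int q = seeds.poll();
                if (label[q] == NOISE) label[q] = cluster;   // border point
                if (label[q] != UNVISITED) continue;
                label[q] = cluster;
                List<Integer> qn = regionQuery(pts, q, eps);
                if (qn.size() >= minPts) seeds.addAll(qn);   // core point: expand
            }
        }
        return label;
    }

    // Linear scan; a real implementation uses a spatial index (as ELKI does).
    static List<Integer> regionQuery(double[][] pts, int i, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < pts.length; j++) {
            double dx = pts[i][0] - pts[j][0], dy = pts[i][1] - pts[j][1];
            if (Math.sqrt(dx * dx + dy * dy) <= eps) out.add(j);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] pts = {
            {0, 0}, {0, 1}, {1, 0},        // cluster 1
            {10, 10}, {10, 11}, {11, 10},  // cluster 2
            {50, 50}                       // noise
        };
        // prints [1, 1, 1, 2, 2, 2, 0]
        System.out.println(Arrays.toString(dbscan(pts, 2.0, 3)));
    }
}
```

The naive `regionQuery` is O(n) per point, O(n²) overall, which is exactly why an indexed implementation such as ELKI's matters at 100,000 points.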

Has QUIT--Anony-Mousse