  • I want to cluster a big data set (more than 1M records).
  • I want to use DBSCAN or HDBSCAN for this clustering task.

When I try to use either of those algorithms, I get a memory error.

  • Is there a way to fit a big data set in parts (e.g. loop over the data and refit every 1,000 records)?
  • If not, is there a better way to cluster a big data set without upgrading the machine's memory?
Boom

1 Answer


If the number of features in your dataset is not too large (below 20-25), you can consider using BIRCH. It's an incremental method designed for large datasets: it processes instances one at a time (or in batches), summarizing them into a compact Clustering Feature (CF) tree, so it only needs a small amount of data in memory at any point rather than the whole dataset.
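A minimal sketch of this chunked approach, assuming scikit-learn's `Birch` implementation; the chunk size, `threshold`, and the synthetic data are illustrative placeholders for reading your real records from disk in batches:

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)

# n_clusters=None keeps the raw CF-tree subclusters; pass an int (or a
# clusterer) to get a final global clustering step instead.
birch = Birch(n_clusters=None, threshold=0.5)

# Feed the data in chunks so only one chunk is in memory at a time.
# In practice each chunk would be read from disk / a database.
for _ in range(10):
    chunk = rng.normal(size=(1000, 5))  # placeholder for a real batch
    birch.partial_fit(chunk)            # incrementally updates the CF-tree

# New points can then be assigned to the learned subclusters.
labels = birch.predict(rng.normal(size=(100, 5)))
print(labels.shape)
```

This directly addresses the "fit in parts" idea from the question: `partial_fit` is called once per batch, so memory stays bounded by the chunk size plus the CF-tree, not the full 1M+ records.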

Benjamin