I have a dataset of 590000 records after preprocessing and i wanted to find clusters out of it and it contains string data (for now assume i have only one column with 590000 unique values in dataset). Also i am using custom defined distance measure and needed to calculate the distance matrix of size 590000*590000. Using some partition logic i created the distance matrix but cannot merge those partitions into one big distance matrix due to memory constarints. Does anyone have any sort of idea to resolve it ?? I picked DBSCAN for it. Is there any way to use deep learning methodologies?? any other ideas
Asked
Active
Viewed 49 times
1 Answers
0
Use a manageable sample first.
Because I doubt the results will be good enough to warrant any effort on scaling a method that does not work anyway.

Has QUIT--Anony-Mousse
- 76,138
- 12
- 138
- 194