
I have a dataset of 590,000 records after preprocessing, and I want to find clusters in it. The data is string data (for now, assume I have a single column with 590,000 unique values). I am using a custom-defined distance measure, so I need to calculate a distance matrix of size 590,000 × 590,000. Using some partitioning logic I computed the distance matrix in pieces, but I cannot merge those partitions into one big matrix due to memory constraints. Does anyone have an idea of how to resolve this? I picked DBSCAN for the clustering. Is there any way to use deep learning methods instead? Any other ideas?
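For scale, a quick back-of-the-envelope calculation (assuming 8-byte floats, which is what NumPy uses by default) shows why the full matrix cannot be materialized in memory:

```python
# Rough memory footprint of a dense 590,000 x 590,000 distance matrix.
n = 590_000
bytes_per_entry = 8  # float64
total_bytes = n * n * bytes_per_entry

print(f"{total_bytes / 1024**4:.1f} TiB")  # ≈ 2.5 TiB for the full matrix
# Even storing only the upper triangle halves this, still far beyond RAM.
```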

Vas

1 Answer


Use a manageable sample first.

I doubt the results will be good enough to warrant any effort on scaling a method that does not work on the sample anyway.

Has QUIT--Anony-Mousse