I am trying to build a distance matrix so I can run the DBSCAN algorithm for clustering. The final matrix has 174,000 × 174,000 entries, all floating-point values between 0 and 1 (that is about 30 billion entries, so roughly 242 GB at float64, and still about 121 GB at float32). I have all 174,000 individual rows saved to disk as lists of ints, but when I try to consolidate them into a single array I keep running out of memory.
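Roughly what I am doing is sketched below (`load_row` is just a stand-in here for however one saved list comes back from disk; the real values are ints that I later scale into [0, 1]):

```python
import numpy as np

N = 174_000  # number of points -> N x N distance matrix

def load_row(i):
    # placeholder for reading one saved list of ints from disk
    return np.random.randint(0, 100, size=N, dtype=np.int8)

# this is where memory runs out: stacking every row materializes the
# whole N x N matrix in RAM (~30 GB even as int8), and converting to
# floats in [0, 1] multiplies that by 4-8x
dist = np.array([load_row(i) for i in range(N)]) / 100.0
```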
Is there a way to compress or store the data that can cope with a data set this large? I have tried HDF5, but that also seems to struggle.
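For reference, this is roughly the kind of thing I tried with h5py, writing the matrix in row blocks rather than all at once (a sketch: the file name, block size, compression setting, and the random placeholder rows are just for illustration):

```python
import h5py
import numpy as np

N = 174_000
BLOCK = 256  # rows written per batch; tuned by trial and error

with h5py.File("distances.h5", "w") as f:
    dset = f.create_dataset(
        "dist",
        shape=(N, N),
        dtype="f4",         # float32 to halve the footprint vs float64
        chunks=(BLOCK, N),  # chunked layout so row blocks stream to disk
        compression="gzip",
    )
    for start in range(0, N, BLOCK):
        stop = min(start + BLOCK, N)
        # placeholder: in the real code these rows come from the saved lists
        rows = np.random.randint(0, 100, size=(stop - start, N)).astype("f4") / 100.0
        dset[start:stop] = rows
```

Writing this way does get the data onto disk, but working with the resulting file (and feeding it to DBSCAN) is where things seem to struggle.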