I want to cluster 1.5 million chemical compounds. This means having a 1.5 million x 1.5 million distance matrix...
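For a sense of scale, here is a rough back-of-the-envelope estimate of the storage such a matrix would need. It assumes each distance is stored as an 8-byte float, which is my assumption; a smaller dtype would shrink the numbers proportionally. The "condensed" figure counts only the pairs above the diagonal, since a distance matrix is symmetric.

```python
# Rough storage estimate for a pairwise distance matrix.
n = 1_500_000          # number of compounds
bytes_per_value = 8    # assumption: float64 distances

# Full square matrix: n * n entries.
full_bytes = n * n * bytes_per_value

# Condensed form: only the n * (n - 1) / 2 pairs above the diagonal.
condensed_bytes = n * (n - 1) // 2 * bytes_per_value

TB = 10**12  # decimal terabyte
print(f"full matrix:      {full_bytes / TB:.1f} TB")       # 18.0 TB
print(f"condensed matrix: {condensed_bytes / TB:.1f} TB")  # 9.0 TB
```

So even the condensed form is far beyond what fits in RAM on a single machine, which is why I am looking at on-disk storage.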
I think I can generate such a big table using PyTables, but once I have it, how do I cluster it?
I guess I can't just pass a PyTables object to one of the scikit-learn clustering methods...
Are there any Python-based frameworks that would take my huge table and do something useful (like clustering) with it? Perhaps in a distributed manner?