In my current approach, I have:
from scipy.sparse import csr_matrix
from sklearn.cluster import AgglomerativeClustering
import pandas as pd
s = pd.DataFrame([[0.8, 0. , 3. ],
                  [1. , 1. , 2. ],
                  [0.3, 3. , 4. ]], columns=['dist', 'v1', 'v2'])
N = 5  # number of elements; the largest index in v1/v2 is 4
sparseD = csr_matrix((1 - s['dist'], (s['v1'].astype(int), s['v2'].astype(int))), shape=(N, N))
agg = AgglomerativeClustering(n_clusters=None, affinity='precomputed', linkage='complete', distance_threshold=.25)
agg.fit_predict(sparseD)
The last line raises
TypeError: cannot take a sparse matrix.
If I cast the data to a dense array with toarray(), the code works and produces the expected output, but it uses a lot of memory and is slow at the real data size of 61K x 61K.
I am wondering whether there is another library (or scikit-learn API) that can do the same linkage clustering on a precomputed, sparse distance matrix: if there is no entry for a given (element1, element2) pair, the two would simply never be linked, and everything else would behave the same.
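For what it's worth, if single linkage were an acceptable substitute for complete linkage, the threshold cut can be computed directly on the sparse data: cutting a single-linkage dendrogram at a threshold gives exactly the connected components of the graph that keeps only the edges below that threshold, and pairs with no entry are never linked, as desired. This equivalence does not hold for complete linkage. A sketch with a made-up edge list:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical sparse edge list: (i, j, distance) triples; absent
# pairs can never end up in the same cluster.
rows = np.array([0, 1, 3])
cols = np.array([3, 2, 4])
dist = np.array([0.2, 0.0, 0.7])
N = 5

threshold = 0.25
keep = dist < threshold                      # drop edges at/above the cut
G = csr_matrix((np.ones(keep.sum()), (rows[keep], cols[keep])),
               shape=(N, N))

# Single-linkage clusters at this threshold = connected components
# of the thresholded graph; this never builds a dense N x N matrix.
n_clusters, labels = connected_components(G, directed=False)
```

This runs in roughly O(edges) time and memory, so 61K elements with a sparse edge list is cheap; whether it is usable depends on whether the single-linkage semantics fit the problem.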