
I am trying to calculate silhouette_score or silhouette_samples on a sparse matrix, but I get the following error:

ValueError: diag requires an array of at least two dimensions

The sample code is as follows:

edges = [
(1, 2, 0.9),
(1, 3, 0.7),
(1, 4, 0.1),
(1, 5, 0),
(1, 6, 0),
(2, 3, 0.8),
(2, 4, 0.2),
(2, 5, 0),
(2, 6, 0.3),
(3, 4, 0.3),
(3, 5, 0.2),
(3, 6, 0.25),
(4, 5, 0.8),
(4, 6, 0.6),
(5, 6, 0.9),
(7, 8, 1.0)]

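import networkx as nx
from sklearn.metrics import silhouette_score, silhouette_samples

gg = nx.Graph()

for u, v, w in edges:
    gg.add_edge(u, v, weight=w)

# Sparse adjacency matrix with the diagonal explicitly set to 0
adj = nx.adjacency_matrix(gg)
adj.setdiag(0)

# Hypothetical cluster labels, one per node 1..8 (not given in the question)
labels = [0, 0, 0, 1, 1, 1, 2, 2]

print(silhouette_score(adj, metric='precomputed', labels=labels))
silhouette_samples(adj, metric='precomputed', labels=labels)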
Daniel F
Prateek Jain

1 Answer


This is a bug in scikit-learn, and you should report it. Here is the relevant code from silhouette_samples:

X, labels = check_X_y(X, labels, accept_sparse=['csc', 'csr'])

# Check for non-zero diagonal entries in precomputed distance matrix
if metric == 'precomputed':
    atol = np.finfo(X.dtype).eps * 100
    if np.any(np.abs(np.diagonal(X)) > atol):
        raise ValueError(
            'The precomputed distance matrix contains non-zero '
            'elements on the diagonal. Use np.fill_diagonal(X, 0).'
        )
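To see why this check fails, the sketch below reproduces it outside scikit-learn on a small hypothetical CSR distance matrix. NumPy does not know how to convert a scipy.sparse matrix, so it wraps the whole object in a 0-d object array, and np.diagonal then raises the same ValueError. The function diagonal_is_zero is a hypothetical, sparse-aware rewrite of the check that uses the matrix's own .diagonal() method; it is a sketch of what a fix could look like, not scikit-learn's actual code.

```python
import numpy as np
import scipy.sparse as sp

# Small symmetric distance matrix with a zero diagonal, stored as CSR.
X = sp.csr_matrix(np.array([[0.0, 0.5, 0.9],
                            [0.5, 0.0, 0.3],
                            [0.9, 0.3, 0.0]]))

# NumPy cannot convert a sparse matrix, so it wraps the whole object in a
# 0-d object array instead of producing a 2-D array.
print(np.asarray(X).ndim)

# np.diagonal therefore sees zero dimensions and raises a ValueError.
try:
    np.diagonal(X)
except ValueError as exc:
    print("np.diagonal failed:", exc)


def diagonal_is_zero(X, atol=None):
    """Hypothetical sparse-aware version of the precomputed-diagonal check.

    Uses the sparse matrix's own .diagonal() method when X is sparse and
    np.diagonal otherwise. Sketch only, not scikit-learn's implementation.
    """
    if atol is None:
        atol = np.finfo(X.dtype).eps * 100
    diag = X.diagonal() if sp.issparse(X) else np.diagonal(X)
    return not np.any(np.abs(diag) > atol)


print(diagonal_is_zero(X))            # sparse input
print(diagonal_is_zero(X.toarray()))  # dense input
```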

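Although the input check explicitly accepts CSC/CSR matrices, when metric is 'precomputed' the code passes X straight to np.diagonal, which does not work on sparse matrices. NumPy wraps the sparse matrix in a 0-d object array, so np.diagonal raises "diag requires an array of at least two dimensions".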
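Until the bug is fixed, one possible workaround, assuming the matrix is small enough to hold as a dense array, is to densify it with .toarray() before calling either function. The sketch below uses a small hypothetical distance matrix and hypothetical labels instead of the question's graph, so that it runs without networkx.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.metrics import silhouette_samples, silhouette_score

# Hypothetical sparse distance matrix for 4 points: points 0 and 1 are close
# to each other, as are points 2 and 3. The diagonal is zero.
D = sp.csr_matrix(np.array([[0.0, 0.1, 0.9, 0.8],
                            [0.1, 0.0, 0.8, 0.9],
                            [0.9, 0.8, 0.0, 0.2],
                            [0.8, 0.9, 0.2, 0.0]]))
labels = np.array([0, 0, 1, 1])  # hypothetical cluster labels

# Work around the sparse-input bug by converting to a dense ndarray first.
D_dense = D.toarray()
score = silhouette_score(D_dense, labels, metric="precomputed")
per_sample = silhouette_samples(D_dense, labels, metric="precomputed")
print(score)
print(per_sample)
```

One caution that is independent of the bug: with metric='precomputed', every entry is read as a distance. The adjacency weights in the question look like similarities (0.9 for a strong edge), and entries that are not stored in a sparse matrix become 0, i.e. zero distance, once densified. Converting the weights to distances first (for example 1 - w, if the weights lie in [0, 1], with missing edges set to the largest distance) may be needed for the score to be meaningful.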

CJR