How to apply clustering to sentence embeddings?

Question

I would like to create a summary containing the major points of the original document. To do this, I computed sentence embeddings with the Universal Sentence Encoder (https://tfhub.dev/google/universal-sentence-encoder/2). Next, I would like to apply clustering to these vectors.

I've tried with the library sklearn:

import numpy as np
from sklearn.cluster import KMeans

n_clusters = np.ceil(len(encoded)**0.5)
kmeans = KMeans(n_clusters=n_clusters)
kmeans = kmeans.fit(encoded)

But I get an error message:

'numpy.float64' object cannot be interpreted as an integer

Thank you @AjayPandya, but now I get another error message: "only size-1 arrays can be converted to Python scalars" — Eva Rolin, Jul 24 '19 at 12:03
You can use something like kmeans.astype(int); for more, read this answer: https://stackoverflow.com/a/36680545/3514144 :) — Ajay Pandya, Jul 24 '19 at 12:08

score 1 · Answer 1 · answered Oct 08 '19 at 13:08

The problem is caused by this line:

n_clusters = np.ceil(len(encoded)**0.5)

KMeans expects an integer for n_clusters, but np.ceil returns a numpy.float64, so convert the result explicitly:

n_clusters = int(np.ceil(len(encoded)**0.5))
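
For reference, here is a minimal end-to-end sketch of the fix. The random array is only a stand-in for the real embeddings (its shape of 30 sentences by 512 dimensions is an assumption), and n_init and random_state are set just to make the run reproducible:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the sentence embeddings (assumed shape: 30 sentences x 512 dims).
rng = np.random.default_rng(0)
encoded = rng.normal(size=(30, 512))

# np.ceil returns a numpy.float64, so cast it to a plain Python int.
n_clusters = int(np.ceil(len(encoded) ** 0.5))

# n_init and random_state are illustrative choices, not required for the fix.
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
kmeans = kmeans.fit(encoded)

# One cluster label per sentence.
labels = kmeans.labels_
print(n_clusters, labels.shape)
```

With the cast in place, fit runs without the type error and labels holds one cluster index per sentence.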

d9ngle
