I'm trying to use UMAP for dimensionality reduction on some embeddings. However, I encounter the following error when my dataset has more than 5k rows:
ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types numpy.dtype[float32]
Below is my code:
import numpy as np
import pandas as pd
import umap
import hdbscan

embeddings = my_embedder.encode(
    data_df.normalization.values, show_progress_bar=False
)

umap_embeddings = umap.UMAP(
    n_neighbors=np.min([5, data_df.shape[0]]),
    n_components=3,
    metric='cosine',
    random_state=17
).fit_transform(embeddings)
Library versions:
numpy: 1.24.4
umap-learn: 0.5.3
pandas: 1.5.3
hdbscan: 0.8.33
numba: 0.55.1
I also tried downgrading numpy to 1.20.3, but that did not help either.
I am using poetry for dependency management.
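Since Poetry resolves the dependencies, it may be worth confirming that the versions listed above are the ones actually installed in the environment. The following is only a minimal sketch: `installed_versions` is a hypothetical helper name, and `pynndescent` is added to the list as an assumption because umap-learn depends on it for approximate nearest-neighbor search.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # pynndescent is not in my version list above; it is included here
    # as an assumption, since umap-learn pulls it in as a dependency.
    names = ["numpy", "umap-learn", "pandas", "hdbscan", "numba", "pynndescent"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Saved under a hypothetical name such as `check_versions.py`, this can be run inside the Poetry environment with `poetry run python check_versions.py`.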