I'm trying to use UMAP for dimensionality reduction on some embeddings. However, I encounter the following error when my dataset has more than 5k rows:

ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types numpy.dtype[float32]

Below is my code:

import numpy as np
import pandas as pd
import umap
import hdbscan

embeddings = my_embedder.encode(
    data_df.normalization.values, show_progress_bar=False
)

umap_embeddings = umap.UMAP(
    n_neighbors=np.min([5, data_df.shape[0]]),
    n_components=3,
    metric='cosine',
    random_state=17
).fit_transform(embeddings)

Library versions:

numpy: 1.24.4 
umap-learn: 0.5.3 
pandas: 1.5.3 
hdbscan: 0.8.33
numba: 0.55.1

I also tried downgrading numpy to 1.20.3, but that didn't work either.

I am using poetry for dependency management.

  • Why did you try downgrading numpy? Did you find something in a web search suggesting this? Keep in mind that most numpy readers of this post know nothing about `umap`. A full error message might help, as well as links to previous questions. Sounds like you are using a relatively unmaintained library. My first search result is https://github.com/lmcinnes/pynndescent/issues/163 – hpaulj Aug 12 '23 at 14:45
  • @hpaulj That's the only error I get. Some other links suggested downgrading numpy. And yes, I did follow the link above, but it didn't work. I visited the same link again and modified pynndescent's `correct_alternative_cosine` function inside site-packages. That didn't work at first, but once I also downgraded **numpy to 1.22.x** and numba as mentioned in the post, it worked fine, thankfully. However, this is only a temporary solution, since every time I rebuild my docker/singularity instance I will need to make the same changes again. – Harvindar Singh Garcha Aug 12 '23 at 20:25
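
Since the workaround in the comment above depends on particular numpy and numba versions surviving a container rebuild, one option is to fail fast when the installed versions drift. The following is only a minimal sketch: the helper name `find_version_mismatches` is hypothetical, and the only pin taken from this thread is numpy 1.22.x. The numba pin is not stated here, so it is left out. It does not fix the ufunc error itself; it only reports when the environment no longer matches the combination that worked.

```python
from importlib import metadata


def find_version_mismatches(required, get_version=metadata.version):
    """Return {package: problem} for packages that do not match the required version prefix.

    `required` maps a distribution name to a version prefix such as "1.22".
    `get_version` is injectable so the check can be exercised without the packages installed.
    """
    problems = {}
    for package, prefix in required.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems[package] = "not installed"
            continue
        # Compare whole version components: "1.22" accepts 1.22.4 but not 1.220.0.
        if installed != prefix and not installed.startswith(prefix + "."):
            problems[package] = f"found {installed}, expected {prefix}.x"
    return problems


if __name__ == "__main__":
    # Assumption: numpy 1.22.x is the version reported to work in the comment above.
    # Add an entry for numba with whichever version worked in your environment.
    required = {"numpy": "1.22"}
    print(find_version_mismatches(required))
```

Running this at container start-up (or as a first step in the script that calls `umap.UMAP`) would show immediately whether a rebuild pulled in a different numpy. An empty dict means every listed package matches its prefix.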
