
I am using UMAP through umap-learn. My training dataset has 2,000,000 cells with 16 dimensions, and my testing dataset has another 1,000,000 cells with 16 dimensions. However, when I add another 100,000 cells to the testing dataset (so it becomes 1,100,000 cells), the layout of the UMAP changes completely. Is there any way to keep the UMAP layout fixed when adding only a small fraction of cells to the dataset?

I have tried the following code:

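import numpy as np
import umap  # provided by the umap-learn package
data = np.vstack([training_array, testing_array]).astype(np.float64)  # stack all cells into one matrix
embedding = umap.UMAP(random_state=42).fit_transform(data)  # refits the layout on the combined data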
  • What does "layout" mean? Do you mean the global structure of the lower dimensional data changes when you add more data? The local structure? That's the point of dimension reduction algorithms. – erip Mar 26 '22 at 12:55
  • Yes, I would like to preserve the global structure when adding more data. – rachelyyyqq Mar 28 '22 at 03:21
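
One approach that might keep the layout stable is to fit UMAP on the training cells only and then project the testing cells into that fitted embedding, instead of calling fit_transform on the stacked array each time. The following is a minimal sketch under that assumption, not a verified fix for this dataset. The helper name embed_with_fixed_layout is hypothetical, and it accepts any reducer object that offers fit_transform and transform, which umap.UMAP does.

```python
import numpy as np


def embed_with_fixed_layout(reducer, training_array, testing_array):
    """Fit the reducer on the training data only, then project the test data.

    `reducer` is assumed to expose scikit-learn style `fit_transform` and
    `transform` methods (umap.UMAP does). Hypothetical helper for illustration.
    """
    train = np.asarray(training_array, dtype=np.float64)
    test = np.asarray(testing_array, dtype=np.float64)

    # The layout is learned from the training cells alone, so the size of
    # the test set does not influence where the training cells end up.
    train_embedding = reducer.fit_transform(train)

    # Test cells are placed into the already-fitted layout.
    test_embedding = reducer.transform(test)
    return train_embedding, test_embedding
```

With umap-learn this would be called as embed_with_fixed_layout(umap.UMAP(random_state=42), training_array, testing_array). Because the fit only sees training_array, adding 100,000 cells to testing_array should leave the training layout unchanged. Whether the original 1,000,000 test cells land on exactly the same coordinates may depend on the UMAP version and seed handling, so that is worth checking on a small subset first.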
