
I am using the KPCA function of the kernlab package for dimensionality reduction, calling the API from Python through rpy2. The problem is that I get different output for the same data when I run my Python script on a different number of CPU cores. I use the Linux command "taskset" or "numactl" to run the script from the terminal. For example, for 2 runs:

taskset -c 1-3 python run.py
taskset -c 1-5 python run.py

The outputs of the two runs above are completely different, although each one is reproducible on its own: if I run with 3 cores as in the 1st command 10 times, the output is the same all 10 times, and likewise for the 2nd command with 5 cores. Why are their outputs different from each other? This is an issue because it is impacting my classification performance.

Edit: I also noticed exactly the same behaviour when using scikit-learn's KPCA. Is there something common and fundamental about KPCA that I am missing?
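
One check that might help narrow this down is to save the projected data from each run and compare the two after aligning the sign of every component, since eigenvectors are only defined up to sign. The sketch below assumes sign flips are a possible cause, which has not been confirmed here. `align_signs` and `max_abs_diff` are hypothetical helper names, and the arrays are made-up stand-ins for the two outputs, not real results.

```python
import numpy as np


def align_signs(reference, other):
    """Flip columns of `other` so each one points the same way as in `reference`."""
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    # Sign of the column-wise dot product between the two projections.
    signs = np.sign(np.sum(reference * other, axis=0))
    # Treat an exactly zero dot product as +1 so no column is zeroed out.
    signs[signs == 0] = 1.0
    return other * signs


def max_abs_diff(reference, other):
    """Largest element-wise difference that remains after sign alignment."""
    reference = np.asarray(reference, dtype=float)
    return float(np.max(np.abs(reference - align_signs(reference, other))))


# Made-up projections standing in for the outputs of the two taskset runs.
run_3_cores = np.array([[0.5, -1.0], [-0.25, 2.0], [1.5, 0.5]])
run_5_cores = run_3_cores * np.array([1.0, -1.0])  # second component flipped

print(max_abs_diff(run_3_cores, run_5_cores))
```

If the remaining difference is close to zero, the two runs differ only by component sign. If it stays large, sign flips alone do not explain the discrepancy.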

Please help.

pranay25
  • why the `r` tag? – Wimpel Jul 21 '21 at 13:36
  • `kpca` is a CRAN package I think. It looks like an R question to me, the thin Python wrapper might not matter at all, but @pranay25 should try to make an MCE by demonstrating the same with a demo dataset using R alone. – krassowski Jul 21 '21 at 14:31
  • @Wimpel I am trying to use an R function here, "kpca", which belongs to the R package "kernlab". The only Python involved is that I am calling it from a Python environment using Python code. The rpy2 library lets us call R libraries from Python. The behaviour I am seeing is specific to the R function kpca, which is why I tagged it r. – pranay25 Jul 23 '21 at 12:27
  • @krassowski Yes, it's an R question. I am sorry, have I used the wrong tag? I believe I used an R one; let me know if I used the wrong tag and I will repost it with the right one. Sure, I will try to give an MCE. Thanks. – pranay25 Jul 23 '21 at 12:29
