
I have an R function that takes input data containing missing values, imputes those values with Random Forest imputation (the `rfImpute` function from the randomForest package), and then runs a Random Forest importance calculation to rank the variables by relative importance (using `ranger` from the ranger package). The function uses the seed 2018.
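
A minimal sketch of the kind of pipeline described above, not the actual function. The wrapper name `impute_and_rank`, its arguments, the choice of impurity importance, and the use of `iris` as stand-in data are all assumptions made for illustration.

```r
library(randomForest)  # provides rfImpute()
library(ranger)        # provides ranger()

# Hypothetical wrapper: the name and arguments are illustrative only.
impute_and_rank <- function(data, response, seed = 2018) {
  predictors <- setdiff(names(data), response)

  # rfImpute() draws from R's global RNG, so seed it first.
  set.seed(seed)
  imputed <- rfImpute(x = data[, predictors, drop = FALSE],
                      y = data[[response]])
  # The response comes back as the first column; restore its name.
  names(imputed)[1] <- response

  # ranger() takes its own seed argument.
  # importance = "impurity" is an assumption; the original mode is unknown.
  fit <- ranger(dependent.variable.name = response,
                data = imputed,
                importance = "impurity",
                seed = seed)

  sort(fit$variable.importance, decreasing = TRUE)
}

# Stand-in data: iris with some values removed at random.
set.seed(1)
demo <- iris
demo[sample(nrow(demo), 15), "Sepal.Width"] <- NA

imp_a <- impute_and_rank(demo, "Species")
imp_b <- impute_and_rank(demo, "Species")
print(imp_a)
```

Running the same call in both environments and comparing the two named vectors with `all.equal()` would show whether the difference already appears on non-confidential data.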

When I run the function in R with set.seed(2018), I get one set of results. When I run the exact same function on the exact same input data with the exact same seed in PL/R (through Navicat), the results are different.

I am having a really hard time understanding what could be causing this issue as everything is the exact same between the two (except one is R and the other is PL/R). For some input datasets, the results are equivalent but for others they are not. What could the problem be?

Note: I am not able to provide a simple example since my data is confidential.

Grint
  • Do the results within R and within PL/R stay the same from run to run? We had RF results that changed with each run despite setting the same seed every time. It turned out that the order of the input data (which came from an external source) varied, and the split into train and test data was based on a random selection of indices from 1 to NROW(data). So maybe you are running into something similar? – TinglTanglBob Sep 18 '18 at 14:53
  • @TinglTanglBob the results stay the same with each run... I am ordering the data once it gets inputted, so I don't think that's the problem? – Grint Sep 18 '18 at 15:01
  • Do you get the same random numbers when you call `set.seed(2018); runif(10)` within PL/R and local R? – Ralf Stubner Sep 18 '18 at 16:05
  • When retrieving the data set, is the order of the records fixed (and the same)? – joop Sep 18 '18 at 17:20
  • @RalfStubner yep, I get the same numbers... so confused by this – Grint Sep 19 '18 at 15:21
  • @joop I am ordering the data inside the R Script, so not sure how that could be the cause? – Grint Sep 19 '18 at 15:21
  • In that case you will have to provide a minimal example. Instead of your confidential data you can produce a similar random dataset. – Ralf Stubner Sep 19 '18 at 18:51
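
Following the suggestions in the comments, here is a rough sketch of how a random stand-in dataset could be generated, together with a small "fingerprint" of each environment to compare between local R and PL/R. The helper names `make_fake_data` and `env_fingerprint`, the column layout, and the list of things checked are assumptions, not a known diagnosis of the problem.

```r
# Hypothetical stand-in for the confidential data: random predictors
# with some values knocked out at random.
make_fake_data <- function(n = 200, seed = 2018) {
  set.seed(seed)
  d <- data.frame(
    y  = factor(sample(c("a", "b"), n, replace = TRUE)),
    x1 = rnorm(n),
    x2 = runif(n),
    x3 = factor(sample(letters[1:4], n, replace = TRUE))
  )
  d$x1[sample(n, 20)] <- NA
  d$x3[sample(n, 10)] <- NA
  d
}

# Version of an installed package, or NA if it is not installed.
pkg_version <- function(p) {
  tryCatch(as.character(packageVersion(p)),
           error = function(e) NA_character_)
}

# Collect settings that could plausibly differ between the two sessions.
env_fingerprint <- function(data) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))
  write.csv(data, tmp, row.names = FALSE)
  list(
    r_version   = R.version.string,
    rng_kind    = RNGkind(),
    collation   = Sys.getlocale("LC_COLLATE"),  # affects sorting of strings
    packages    = vapply(c("randomForest", "ranger"), pkg_version,
                         character(1)),
    col_classes = vapply(data, function(col) class(col)[1], character(1)),
    # Checksum of the data as written to CSV; only comparable if both
    # sessions write the file the same way (e.g. same line endings).
    data_md5    = unname(tools::md5sum(tmp)),
    first_draws = { set.seed(2018); runif(3) }
  )
}

fake <- make_fake_data()
str(env_fingerprint(fake))
```

Printing the fingerprint in both sessions, once for the generated data and once for the real input after it has been ordered, gives a concrete list of items to compare side by side.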

0 Answers