I have checked two relevant references: "Extracting Class Probabilities from SparkR ML Classification Functions" and "sparkR 1.6: How to predict probability when modeling with glm (binomial family)".

I'm wondering whether there is any way to do this without converting the SparkDataFrame back to an R data.frame via either as.data.frame or collect, because that seems impossible when there are millions of rows.

Nemo
  • Converting a 10^7 x 10 matrix to a data frame takes 0.638 seconds on my laptop (8 GB RAM). – Omry Atia Dec 18 '18 at 05:51
  • It's really weird that converting a 30,000,000 x 10 SparkR DataFrame to an R data frame seems to take forever, even though there is 20 GB of executor memory. Yet it has no difficulty converting a subsample of it, e.g. 500 x 10. I'm really desperate. – Nemo Dec 19 '18 at 13:35

0 Answers