
Unlike a previous question on this topic, my case is different, which is why I'm asking. I have an already cleaned dataset of 120 000 observations of 25 variables, and I am supposed to analyze all of it with logistic regression and random forest. However, I get the error "cannot allocate vector of size 98 GB", whereas my friend doesn't.

The summary says most of it. I even tried reducing the number of observations to 50 000 and the number of variables to 15 (using 5 of them in the regression), and it still failed. However, when I sent the script with the shortened dataset to a friend, she could run it. This is odd because I have a 64-bit system with 8 GB of RAM, while she has only 4 GB. So it appears that the problem lies with my machine.
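For comparison, here is a minimal sketch of the diagnostics I can run on both machines before the script itself; sessionInfo(), gc() and object.size() are base R, and memory.limit() is Windows-only:

sessionInfo()           # R version, 32- vs 64-bit platform, loaded packages
gc()                    # force garbage collection and report memory in use
memory.limit()          # Windows only: current memory limit in MB
object.size(pd_data)    # in-memory size of the data frame, once it is loaded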

# read the data (semicolon-separated, decimal comma)
pd_data <- read.csv2("pd_data_v2.csv")

# 70/30 train/test split
split <- rsample::initial_split(pd_data, prop = 0.7)
train <- rsample::training(split)
test  <- rsample::testing(split)

# logistic regression (note: currently fitted on the full data, not on train)
log_model <- glm(default ~ profit_margin + EBITDA_margin + payment_reminders,
                 data = pd_data, family = "binomial")
log_model

The result should be a logistic model where I can see the coefficients, measure its accuracy, and make adjustments.
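A rough sketch of how that evaluation could look, assuming the test split from above and a 0/1 default column (the 0.5 cutoff is arbitrary):

summary(log_model)                            # coefficients with standard errors and p-values
pred_prob  <- predict(log_model, newdata = test, type = "response")
pred_class <- ifelse(pred_prob > 0.5, 1, 0)   # arbitrary 0.5 cutoff
mean(pred_class == test$default)              # simple accuracy on the test set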

  • Can you add the full error message to your question? You might want to write a script that generates a file with fake data with which you can reproduce the error. – Bulat Oct 14 '19 at 21:15
  • Maybe try a fresh session of R and/or restarting your computer as a first step. Make sure any packages you're using are up to date too. – Mako212 Oct 14 '19 at 21:21
  • Generally speaking, there is a new package that claims to take care of the problem: https://github.com/xiaodaigh/disk.frame However, in your particular case I don't see what causes the problem, since you say it works on other machines. Maybe share your `sessionInfo()` with us so you can get some more specific advice. Also, a sample of your data could help (use `dput()`). – JBGruber Oct 14 '19 at 21:23
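Following Bulat's suggestion, a minimal sketch that generates fake data of roughly the right shape, which could be used to check whether the error reproduces; the column names mirror the formula above, and the distributions are arbitrary:

set.seed(42)
n <- 120000
fake_data <- data.frame(
  default           = rbinom(n, 1, 0.1),   # binary response
  profit_margin     = rnorm(n),
  EBITDA_margin     = rnorm(n),
  payment_reminders = rpois(n, 2)
)
glm(default ~ profit_margin + EBITDA_margin + payment_reminders,
    data = fake_data, family = "binomial")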

0 Answers