
I am trying to run many linear regressions and diagnostics over them, and to speed things up I am using the doParallel package in the R programming language.

I have come across an interesting issue, though. Although I have seen a performance improvement, as expected, the CPU usage is not consistent.

For example, if I do a run of my code, CPU utilization may be 30-40% for all cores.

If I run my code again, CPU utilization may go up to 90% without me having changed anything in the meantime.

In both cases, I am not running anything else at the same time.

Is there an explanation for why the cores would be used at 30% one time and at 90% another time, without me changing anything?

I am running Windows XP with 4GB of RAM, and have an Intel(R) Xeon(R) CPU X5650 @ 3.67GHz.

My code looks something like:

results <- foreach(i = seq(1:regressions), .combine = 'rbind',
                   .export = c('lmResults', 'lmCSig', 'lmCVar')) %dopar% {
  model <- lm(as.formula(as.character(dat[i])), data = df)
  lmResults(model)
}
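For context, doParallel needs a backend registered before %dopar% actually runs in parallel; the setup looks roughly like this (the worker count is only illustrative):

library(doParallel)

cl <- makeCluster(4)      # illustrative worker count
registerDoParallel(cl)

# ... the foreach() loop above runs here ...

stopCluster(cl)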
stratar
  • When the CPU is reportedly more active, does it finish faster or does it take about the same amount of time in either case? I know there are some things I do in parallel where it doesn't go to 100% CPU utilization but those are usually not super intensive activities. It might be something related to virtual memory but that's just a guess. – Dean MacGregor Jul 23 '15 at 17:11
  • Thanks for the comment. In both cases, I am running the same calculations (i.e. same dataset, same number of regressions, same variables... everything the same). And yes, when the CPU is at 90% levels my calculations finish much, much faster (amazingly fast... to be honest). – stratar Jul 23 '15 at 17:13
  • That supports my idea that Windows is moving your data from physical memory to virtual memory, so disk I/O becomes your bottleneck and the CPU doesn't get very busy. That is a total guess though. Update your OS and get more RAM ;) – Dean MacGregor Jul 23 '15 at 18:03
  • Dean, this is a brilliant guess! It seems that this is indeed the problem. When I free up as much RAM as I can, the CPU gets very busy and finishes the computations very quickly, obviously because there is no need for virtual memory at that point. When my usage of RAM was high and I tried to run the same computations, the CPU was working at ~30%. It sounds so obvious now, but I couldn't think of it in the first place. Unfortunately I am stuck with XP in a corporate environment, so there is not much I can do at the moment. – stratar Jul 24 '15 at 07:18

1 Answer


Your system has become memory constrained, either within R itself or from other programs taking up memory. When that happens, Windows uses the hard drive as virtual memory. This allows more operations to proceed, but at the cost of being much, much slower. I looked for a way to lock R's memory into physical RAM rather than virtual memory, but realized it wouldn't help: when you launch a parallel task, R spawns new worker processes, so even if you could pin your main R instance's memory into physical RAM and push your other programs into virtual memory, the new worker processes are the ones that end up paged out to disk.
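If more RAM isn't an option, one workaround is to register fewer workers so that the combined footprint of the worker processes stays inside physical RAM. A rough sketch (the worker count is illustrative, and memory.size()/memory.limit() are Windows-only):

library(doParallel)

# On Windows, check how close the master R session is to its RAM ceiling
# before launching workers.
memory.size()    # MB currently in use by this R session
memory.limit()   # MB this R session is allowed to use

# Each worker is a separate R process with its own copy of the exported
# data, so total memory use grows with the number of workers. Registering
# fewer workers keeps the footprint in physical RAM and reduces paging.
registerDoParallel(cores = 2)   # illustrative: fewer workers than available cores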

It is interesting that you're running Windows XP in a corporate environment, as Microsoft ended support for XP on April 8, 2014. That said, you would benefit more from additional RAM than from a newer OS version, except of course that Windows XP has a 4GB limit, so you can't add more RAM without also upgrading the OS.

Dean MacGregor