
Assume we have a data frame holding the parameters of many linear models, one model per row: slope m, y-intercept b, and an upper integration limit up_lim.

  library(tibble)

  tmp_df <- tibble(m = rnorm(1000, mean = 1, sd = 1),
                   b = rnorm(1000, mean = 3, sd = 0.5),
                   up_lim = rnorm(1000, mean = 11, sd = 4))

My goal is to integrate a simple linear model over x, row by row, from 0 to up_lim:

integrand <- function(x) { m * x + b }

The result should be stored in a new column in tmp_df. I did some searching online and I am aware that integrate is not vectorised over its limits or parameters, but I could not translate any of the discussions or solutions I found to my case. My best solution was a loop. It works for a few hundred integrations but crashes my 12-core MacBook, even with multi-core support, when I feed it my full data set (> 1 million rows):

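  library(foreach)
  library(doParallel)

  n <- nrow(tmp_df)

  # numCores was not defined in the original snippet; detecting it here is an assumption
  numCores <- parallel::detectCores()
  registerDoParallel(numCores)

  # .combine = c returns a plain numeric vector (rbind would build an n x 1 matrix)
  tmp_df$Fs_linear <-
    foreach(i = 1:n, .combine = c) %dopar% {
      integrate(
        function(x) { tmp_df$m[i] * x + tmp_df$b[i] },
        lower = 0,
        upper = tmp_df$up_lim[i])$value
    }

  stopImplicitCluster()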

Is there an elegant/resource-efficient way to accomplish this? I would be incredibly thankful for any pointers.

WieselMB
  • Well, this is an interesting test of resource allocation in R ... but as for getting to an actual result, my advice is to figure out the integral once, for general arguments, and then plug numbers from the df into that to get `tmp_df$Fs_linear`. – Robert Dodier Jun 11 '21 at 00:10
  • Thanks Robert. I did some optimization and I got it to work somehow. I think the issue was me using `system.time` wrapped around the `foreach` loop. After I removed it, it worked. However, my question about doing this outside a loop remains so I will leave this post up. – WieselMB Jun 11 '21 at 00:39
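
Robert's suggestion could be sketched roughly as follows. For a linear integrand the integral from 0 to up_lim has the closed form m * up_lim^2 / 2 + b * up_lim, so the whole column can be computed with one vectorised expression and no call to integrate at all. The helper name linear_area, the seed, and the use of a base data.frame instead of a tibble are assumptions made only to keep the example self-contained; the same expression should work unchanged on a tibble column.

```r
# Closed form of the integral of m * x + b over x from 0 to up_lim:
#   m * up_lim^2 / 2 + b * up_lim
# linear_area() is a hypothetical helper name, not part of any package.
linear_area <- function(m, b, up_lim) {
  m * up_lim^2 / 2 + b * up_lim
}

set.seed(1)  # assumed seed, only so the example is reproducible
n <- 1e6     # roughly the size of the full data set mentioned in the question

# Base data.frame used here to avoid extra dependencies (assumption).
tmp_df <- data.frame(m      = rnorm(n, mean = 1,  sd = 1),
                     b      = rnorm(n, mean = 3,  sd = 0.5),
                     up_lim = rnorm(n, mean = 11, sd = 4))

# One vectorised expression over all rows: no loop, no cluster.
tmp_df$Fs_linear <- linear_area(tmp_df$m, tmp_df$b, tmp_df$up_lim)

# Spot-check the first few rows against numerical integration.
idx <- 1:5
check <- mapply(
  function(m, b, u) integrate(function(x) m * x + b, lower = 0, upper = u)$value,
  tmp_df$m[idx], tmp_df$b[idx], tmp_df$up_lim[idx]
)
agrees <- isTRUE(all.equal(unname(check), tmp_df$Fs_linear[idx]))
```

This only applies when the integrand has a known antiderivative. For an integrand without one, a row-wise call to integrate (for example through mapply, as in the spot check) would still be needed.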
