I am trying to perform a grid search to find coefficients that maximize the correlation between a linear combination of x's and y. My function is applied to a data.frame in which each column holds the thetas to test in one iteration.
corr_grid_search <- function(thetas, modeling_df) {
  # thetas = as.vector(thetas)
  # Scale each penalty column by its theta
  coeff1 = modeling_df$penalty1 / thetas[1]
  coeff2 = modeling_df$penalty2 / thetas[2]
  coeff3 = modeling_df$penalty3 / thetas[3]
  coeff4 = modeling_df$penalty4 / thetas[4]
  coeff5 = modeling_df$penalty5 / thetas[5]
  coeff6 = modeling_df$penalty6 / thetas[6]
  coeff7 = modeling_df$penalty7 / thetas[7]
  coeff8 = modeling_df$penalty8 / thetas[8]
  coeff9 = modeling_df$penalty9 / thetas[9]
  coeff10 = modeling_df$penalty10 / thetas[10]
  df = data.frame(coeff1, coeff2, coeff3, coeff4, coeff5, coeff6, coeff7, coeff8, coeff9, coeff10)
  # Divide each x column by its coefficient
  pp_1 = modeling_df$x1 / df$coeff1
  pp_2 = modeling_df$x2 / df$coeff2
  pp_3 = modeling_df$x3 / df$coeff3
  pp_4 = modeling_df$x4 / df$coeff4
  pp_5 = modeling_df$x5 / df$coeff5
  pp_6 = modeling_df$x6 / df$coeff6
  pp_7 = modeling_df$x7 / df$coeff7
  pp_8 = modeling_df$x8 / df$coeff8
  pp_9 = modeling_df$x9 / df$coeff9
  pp_10 = modeling_df$x10 / df$coeff10
  # Reciprocals of the coefficients; infinite values (zero coefficients) become NA
  recip = 1/df[, c('coeff1', 'coeff2', 'coeff3',
                   'coeff4', 'coeff5', 'coeff6',
                   'coeff7', 'coeff8', 'coeff9', 'coeff10')]
  recip = as.data.frame(lapply(recip, function(x) replace(x, is.infinite(x), NA)))
  df = data.frame(pp_1, pp_2, pp_3, pp_4, pp_5, pp_6, pp_7,
                  pp_8, pp_9, pp_10)
  # Weighted combination of the x's, ignoring NAs in both sums
  weighted_x = rowSums(df, na.rm=TRUE) /
    rowSums(recip, na.rm=TRUE)
  # Correlation with y over the rows where weighted_x is defined
  cor(weighted_x[!is.na(weighted_x)],
      modeling_df[!is.na(weighted_x),]$y)
}
I have it running with lapply() like so:
lapply(blah, corr_grid_search, modeling_df)
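For reproducing the call above, here is a minimal sketch of inputs with the shapes the function expects. The row count, grid size, seed, and distributions are made-up assumptions, not the real data; only the column names (penalty1..penalty10, x1..x10, y) and the one-column-per-iteration layout of blah come from the function and the description.

```r
set.seed(42)          # arbitrary seed, only for reproducibility
n_rows  <- 1000       # assumed number of observations
n_grid  <- 50         # assumed number of theta combinations to test
n_terms <- 10         # the function uses penalty1..penalty10 and x1..x10

# Positive penalty columns and arbitrary x columns, plus a response y.
penalties <- as.data.frame(matrix(runif(n_rows * n_terms, 0.5, 2), nrow = n_rows))
names(penalties) <- paste0("penalty", seq_len(n_terms))
xs <- as.data.frame(matrix(rnorm(n_rows * n_terms), nrow = n_rows))
names(xs) <- paste0("x", seq_len(n_terms))
modeling_df <- cbind(penalties, xs, y = rnorm(n_rows))

# One column per candidate: each column holds the 10 thetas for one iteration,
# so lapply() over the data.frame passes one theta vector at a time.
blah <- as.data.frame(matrix(runif(n_terms * n_grid, 0.1, 1), nrow = n_terms))
```

With these two objects and corr_grid_search defined, the lapply() call runs as written and returns one correlation per column of blah.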
But I am trying to parallelize it and am having trouble. The two approaches I have tried use the parallel and future.apply packages, but neither has worked:
library(future.apply)
plan(multisession)
cors = future_lapply(blah, corr_grid_search, modeling_df)
library(parallel)
cl = makeCluster(32)
clusterExport(cl=cl, varlist=c("modeling_df"))
cors = parLapply(cl, blah, corr_grid_search, modeling_df)
Something is going wrong with both of them: they take horrendously long, 2-3 orders of magnitude slower than the plain lapply() version. What am I doing wrong here?
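To put numbers on the slowdown, one option is to time cluster start-up separately from the parallel work and compare both against the sequential run. This is only a sketch: toy_task, toy_df, and toy_grid are hypothetical stand-ins (not corr_grid_search or the real data), and the worker count of 2 is an arbitrary assumption.

```r
library(parallel)

# Hypothetical stand-in for one grid-search iteration.
toy_task <- function(thetas, df) cor(df$x / thetas[1], df$y)

set.seed(1)
toy_df   <- data.frame(x = rnorm(1000), y = rnorm(1000))
toy_grid <- as.data.frame(matrix(runif(10 * 200, 0.1, 1), nrow = 10))

# Sequential baseline.
t_seq <- system.time(res_seq <- lapply(toy_grid, toy_task, toy_df))

# Parallel version: cluster start-up is timed separately from the work itself.
t_start <- system.time(cl <- makeCluster(2))
t_par   <- system.time(res_par <- parLapply(cl, toy_grid, toy_task, toy_df))
stopCluster(cl)

# Elapsed seconds for each stage.
print(c(sequential = t_seq[["elapsed"]],
        startup    = t_start[["elapsed"]],
        parallel   = t_par[["elapsed"]]))
```

Swapping in corr_grid_search, blah, and modeling_df for the stand-ins would show whether the extra time is spent starting workers or inside the parLapply() call itself.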