I think I've completely misunderstood how foreach parallel operations work.
In the following example, is foreach running 7 independent threads of foo(DF[i,]) for different values of i, which leapfrog each other to get the next available row? Is it splitting the computation of foo(DF[i,]) for a single value of i between 7 threads? Or is it replicating the same operation of foo(DF[i,]) for the same value of i 7 separate times?
Additionally: is it possible to accomplish something like the first scenario, where several independent threads collectively iterate over the rows (or chunks of rows) of a data frame in a sort of parallel-serialized approach? Or is the only option to subset ahead of time and assign each subset to a separate thread?
library(doParallel)
registerDoParallel(7)  ## 4 physical cores, 8 logical cores
foreach(
  i = seq_len(nrow(DF)),
  .packages = c("data.table", "tidyverse")
) %dopar%
  foo(DF[i, ])
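
A minimal sketch of one way to check empirically which scenario applies. It assumes a toy DF and a placeholder foo (both hypothetical stand-ins, not the real data or function) and a worker count of 2 chosen only for the demo. Each iteration records its value of i together with Sys.getpid(), the ID of the process that ran it.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)            # hypothetical worker count, kept small for the demo
registerDoParallel(cl)

DF  <- data.frame(x = 1:8)      # toy stand-in for the real data frame
foo <- function(row) row$x^2    # hypothetical placeholder for the real foo()

# Each iteration returns its index, the PID of the process that ran it,
# and the result of foo() for that row; rbind stacks them into one data frame.
res <- foreach(i = seq_len(nrow(DF)), .combine = rbind) %dopar% {
  data.frame(i = i, pid = Sys.getpid(), value = foo(DF[i, , drop = FALSE]))
}

stopCluster(cl)

print(res)        # one row per iteration
table(res$pid)    # number of iterations handled by each worker process
```

If each value of i appears exactly once in res and the rows are spread over more than one PID, that would point to the first scenario (different rows handled by different workers) rather than a single row being split up or repeated.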