I recently began experimenting with R as a language for genetic programming, and I have slowly but surely been learning how R works and what its best coding practices are. However, I have hit a roadblock. Here is my situation: I have a dataset with roughly 700 rows, and each row has 400 or so columns. I have everything set up so that a function with as many parameters as there are columns is passed as an argument to an evaluation (fitness scoring) function. I want to go row by row through the dataset and pass the value in each column of a row to the function being evaluated. The first problem was figuring out how to pass the values to the function as separate arguments. By "separately" I mean that the function expects 400 parameters, not a single vector of length 400. To do this I used the following:
do.call(f, as.list(parameters))  # f is the function being evaluated
Here, parameters is a vector made up of the values in one row of the dataset with a month variable (1-12) appended. This works fine: I used a for loop to iterate over the 700 rows in the dataset, with an inner loop over the 12 months, and used the call above to accumulate a vector of outputs. The problem is that this is painfully slow, around 24-28 seconds per function, and 100-500 functions are sent into this evaluation in every generation of evolution. Clearly, this is not the way to go. Next, I tried nested sapply calls, as below.
outputs <- sapply(1:12, function(m) sapply(rows, function(p) do.call(f, as.list(c(p, m)))))
This applies the months (1-12) in the outer call and the rows of the dataset (1-700) in the inner call. It took just as long. Any ideas for a faster approach would be appreciated.
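For reference, here is a scaled-down sketch of the two approaches described above. The data, the sizes, and the stand-in function are all made up for illustration. It assumes the function takes one argument per column plus a trailing month argument, and that rows is a list of row vectors.

```r
# Toy stand-in for the real setup: every name and size here is hypothetical.
set.seed(1)
n_rows <- 7   # the real dataset has roughly 700 rows
n_cols <- 4   # the real dataset has roughly 400 columns

dataset <- matrix(runif(n_rows * n_cols), nrow = n_rows, ncol = n_cols)

# Stand-in for an evolved function: one formal argument per column, then the month.
f <- function(x1, x2, x3, x4, month) x1 + x2 * x3 - x4 + month

# Approach 1: nested for loops, one output per (row, month) pair.
loop_outputs <- matrix(NA_real_, nrow = n_rows, ncol = 12)
for (i in seq_len(n_rows)) {
  for (m in 1:12) {
    parameters <- c(dataset[i, ], m)                       # row values with the month appended
    loop_outputs[i, m] <- do.call(f, as.list(parameters))  # each value becomes its own argument
  }
}

# Approach 2: nested sapply over a list of row vectors.
rows <- lapply(seq_len(n_rows), function(i) dataset[i, ])
sapply_outputs <- sapply(1:12, function(m)
  sapply(rows, function(p) do.call(f, as.list(c(p, m)))))

# Both approaches produce the same n_rows x 12 matrix of outputs.
stopifnot(isTRUE(all.equal(loop_outputs, sapply_outputs)))
```

Wrapping either approach in system.time() gives the per-function timing. At this toy size both finish almost instantly, so the slowdown only shows up at the real dimensions.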