I have a large dataframe (10,000,000+ rows) that I would like to process. I'm also fairly new to R, and want to better understand how to work with large datasets like this.
I have a formula that I want to apply to each row in the data frame. But I've found from experience that "for" loops and "apply" don't scale well to really large datasets. I've been trying to wrap my head around Split-Apply-Combine, but I can't quite follow how to use it when I want to apply a function row by row.
Here's an example data frame that has 1,000,000 rows. I'd like to apply a function that takes each row and performs a simple multiplication on two columns to give an output (I realize I could do this much more easily, but I want to practice Split-Apply-Combine).
# make a data frame: two groups of 500,000 rows each
df <- data.frame(a = rep(c("group1", "group2"), each = 500000),
                 b = 1:1000000,
                 c = 1000001:2000000)
What I want to do: for each row, I want to take the value in column "b" and multiply it by the value in column "c".
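To make the goal concrete, here is a minimal base R sketch of how Split-Apply-Combine might look for this multiplication. It is only an illustration under a few assumptions: the data frame is split on column "a", the result column name "product" is made up for this example, and "b" is converted to double first because both columns are integers and the larger products would exceed R's 32-bit integer range (which yields NA with a warning).

```r
# Rebuild the example data frame so the sketch is self-contained
df <- data.frame(a = rep(c("group1", "group2"), each = 500000),
                 b = 1:1000000,
                 c = 1000001:2000000)

# Split: one data frame per level of column "a"
pieces <- split(df, df$a)

# Apply: multiply b by c inside each piece.
# as.numeric() avoids integer overflow for the larger products.
products <- lapply(pieces, function(piece) as.numeric(piece$b) * piece$c)

# Combine: put the per-group results back in the original row order.
# "product" is a hypothetical column name chosen for this example.
df$product <- unsplit(products, df$a)

# For comparison, the fully vectorized version needs no split at all
direct <- as.numeric(df$b) * df$c
same_result <- identical(df$product, direct)
```

Note that the function passed to lapply() still works on whole columns within each group rather than visiting rows one at a time, which is likely why this pattern tends to cope better with large data than a row-wise loop.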