I have written the loop below, which gives me the results I need, but it takes a very long time to run. I need to process a large amount of data (400,000+ observations now, ideally 25,000,000+), so I would like to know whether there is any way to speed up the calculation.
My data frame is called crsp.comp3. Here is a snippet of it:
Permno Observation C.xsgaq C.xsgaq.depr
10026 1 45.145 44.393
10026 2 45.145 43.653
10026 3 45.145 42.925
10026 4 96.730 92.935
10026 5 96.730 91.386
10026 6 96.730 89.863
10026 7 145.511 136.333
10026 8 145.511 134.061
10026 9 145.511 131.827
10026 10 190.986 174.347
Currently, I calculate the numbers in the 'C.xsgaq.depr' column as:
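for (i in 1:nrow(crsp.comp3)) {
  if (crsp.comp3[i, 2] == 1) {
    # First observation of a series: depreciate the value itself
    crsp.comp3[i, 4] <- crsp.comp3[i, 3] * (1 - (0.2 / 12))
  } else {
    # Later observations: previous result plus the change in C.xsgaq, then depreciate
    crsp.comp3[i, 4] <- (crsp.comp3[i - 1, 4] +
      (crsp.comp3[i, 3] - crsp.comp3[i - 1, 3])) * (1 - (0.2 / 12))
  }
}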
Rows where Observation is 1 must be calculated with the first branch of the loop, and all rows where Observation is not 1 must be calculated with the second branch, which depends on the previous row's result. My goal is to make this run faster. I have heard that vectorizing the calculation might help; is that possible here?
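As far as I can tell, the recurrence can be unrolled: if the depreciation factor is keep = 1 - 0.2/12 and k is the position of a row within its series, each result is a sum of the changes in C.xsgaq, with the change at position j weighted by keep^(k - j + 1). Below is a minimal base-R sketch of that idea using cumsum() and ave(). The names keep, grp, k, prev and delta are mine, and it assumes that every series starts with a row where Observation is 1 and that the rows are already sorted.

```r
# Sample rows copied from the snippet above.
crsp.comp3 <- data.frame(
  Observation = 1:10,
  C.xsgaq     = c(45.145, 45.145, 45.145, 96.730, 96.730, 96.730,
                  145.511, 145.511, 145.511, 190.986)
)

keep <- 1 - 0.2 / 12            # monthly depreciation factor from the loop
obs  <- crsp.comp3$Observation
x    <- crsp.comp3$C.xsgaq

# A new series starts at every row where Observation == 1.
grp <- cumsum(obs == 1)
# Position of each row within its series (1, 2, 3, ...).
k <- ave(seq_along(x), grp, FUN = seq_along)

# Change relative to the previous row; the first row of a series
# contributes its full value, matching the first branch of the loop.
prev <- c(0, x[-length(x)])
prev[obs == 1] <- 0
delta <- x - prev

# Unrolled recurrence: d_k = keep^k * cumsum(delta_j * keep^-(j - 1)) per series.
crsp.comp3$C.xsgaq.depr <-
  keep^k * ave(delta * keep^(-(k - 1)), grp, FUN = cumsum)

print(round(crsp.comp3$C.xsgaq.depr, 3))
```

One caveat I am aware of: keep^(-(k - 1)) grows with the length of a series, so this form could lose precision or overflow for extremely long series. In that case, running the original recurrence in a loop over plain numeric vectors (instead of indexing the data frame on every iteration) may be the safer option.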
Thank you