I have a large dataset (around 5 million observations). Each observation records the revenue from a specific event for one type of sub-event, denoted by "Type". A small reproducible example of the data is below:
Event_ID = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)
Type=c("A","B","C","D","E","A","B","C","D","E","A","B","C","D")
Revenue1=c(24,9,51,7,22,15,86,66,0,57,44,93,34,37)
Revenue2=c(16,93,96,44,67,73,12,65,81,22,39,94,41,30)
z = data.frame(Event_ID,Type,Revenue1,Revenue2)
I would like to use GPU cores to run a function that I wrote (I have never attempted GPU processing, so I am at a complete loss as to how to begin). The actual function takes a very long time to run. A very simple version of it is shown below:
Total_Revenue = function(data){
  full_list = list()
  event_list = unique(data[, 'Event_ID'])
  # (a) process the data one event at a time
  for (event in event_list){
    new_data = list()
    event_data = data[which(data$Event_ID == event), ]
    # (b) compute the total revenue row by row
    for (i in seq_len(nrow(event_data))){
      event_data[i, 'Total_Rev'] = event_data[i, 'Revenue1'] + event_data[i, 'Revenue2']
      new_data = rbind(new_data, event_data[i, ])
    }
    # (c) append this event's rows to the result
    full_list = rbind(full_list, new_data)
  }
  return(full_list)
}
Total = Total_Revenue(data=z)
print(Total)
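Before moving to a GPU, it may help to measure where the time actually goes. The sketch below copies Total_Revenue() from above and times it on synthetic data built by a hypothetical make_events() helper (five types per event and random revenues, in the same shape as z). Because rbind() copies the accumulated data frame on every call, the run time of this loop typically grows faster than linearly with the number of rows; that overhead sits in R's data-frame handling rather than in the arithmetic itself.

```r
# Copy of Total_Revenue() from the question (row-by-row loop with rbind)
Total_Revenue <- function(data) {
  full_list <- list()
  event_list <- unique(data[, "Event_ID"])
  for (event in event_list) {
    new_data <- list()
    event_data <- data[which(data$Event_ID == event), ]
    for (i in seq_len(nrow(event_data))) {
      event_data[i, "Total_Rev"] <- event_data[i, "Revenue1"] + event_data[i, "Revenue2"]
      new_data <- rbind(new_data, event_data[i, ])
    }
    full_list <- rbind(full_list, new_data)
  }
  full_list
}

# Hypothetical helper: n_events events with five types each and random revenues
make_events <- function(n_events) {
  n <- 5 * n_events
  data.frame(
    Event_ID = rep(seq_len(n_events), each = 5),
    Type     = rep(c("A", "B", "C", "D", "E"), times = n_events),
    Revenue1 = sample(0:100, n, replace = TRUE),
    Revenue2 = sample(0:100, n, replace = TRUE)
  )
}

set.seed(1)
for (n_events in c(50, 100, 200)) {
  d <- make_events(n_events)
  # system.time() evaluates the assignment in this frame, so res is kept
  elapsed <- system.time(res <- Total_Revenue(d))[["elapsed"]]
  # Sanity check: the loop gives the same column as one vectorized addition
  stopifnot(isTRUE(all.equal(res$Total_Rev, d$Revenue1 + d$Revenue2)))
  cat(nrow(d), "rows:", elapsed, "seconds\n")
}
```

Doubling the number of events and comparing the printed times gives a rough idea of how the real function will scale toward 5 million rows.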
This simplified function proceeds as follows:
a) Split the dataset into subsets so that each subset contains only one unique event.
b) Within each subset, loop through the observations and compute Revenue1 + Revenue2 for each one.
c) Store the subsets and, at the end, return the combined dataset.
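Steps (a) to (c) can be sketched in base R without the row-by-row loop or the repeated rbind(): split() does step (a), a vectorized addition handles step (b) for a whole subset at once, and a single do.call(rbind, ...) does step (c). This is only a sketch against the toy data; total_by_event() is a hypothetical name, and it assumes the real per-event calculation can also be expressed in terms of whole columns rather than single rows.

```r
# Toy data from the question
z <- data.frame(
  Event_ID = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3),
  Type     = c("A","B","C","D","E","A","B","C","D","E","A","B","C","D"),
  Revenue1 = c(24,9,51,7,22,15,86,66,0,57,44,93,34,37),
  Revenue2 = c(16,93,96,44,67,73,12,65,81,22,39,94,41,30)
)

# Hypothetical helper: steps (a)-(c) without a per-row loop
total_by_event <- function(data) {
  # (a) one data frame per unique Event_ID
  pieces <- split(data, data$Event_ID)
  # (b) add the two revenue columns for all rows of a subset at once
  pieces <- lapply(pieces, function(event_data) {
    event_data$Total_Rev <- event_data$Revenue1 + event_data$Revenue2
    event_data
  })
  # (c) bind all subsets together once, at the end
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

Total2 <- total_by_event(z)
print(Total2)
```

For this particular calculation the split is not even needed: z$Total_Rev <- z$Revenue1 + z$Revenue2 produces the same column in one vectorized step over all rows.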
Since I have no prior experience, I looked at some of the R packages. I found the gpuR package and installed it. However, I am having difficulty understanding how to apply it here. Also, my coding background is very weak; I have taught myself what I know over the past year.
Any help or leads will be highly appreciated. I am open to using other packages as well. Please let me know if I have missed anything.
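Because each event is processed independently in step (a), one alternative to the GPU is to spread the events across CPU cores with base R's parallel package. The sketch below assumes the real per-event work can be wrapped in a single function; process_event() is a hypothetical stand-in that only does the toy addition. mclapply() relies on forking, which Windows does not support, so the sketch falls back to one core there (makeCluster() with parLapply() is the usual route on Windows).

```r
library(parallel)

# Toy data from the question
z <- data.frame(
  Event_ID = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3),
  Type     = c("A","B","C","D","E","A","B","C","D","E","A","B","C","D"),
  Revenue1 = c(24,9,51,7,22,15,86,66,0,57,44,93,34,37),
  Revenue2 = c(16,93,96,44,67,73,12,65,81,22,39,94,41,30)
)

# Hypothetical stand-in for the real (slow) per-event calculation
process_event <- function(event_data) {
  event_data$Total_Rev <- event_data$Revenue1 + event_data$Revenue2
  event_data
}

# mclapply() forks worker processes, which is not available on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

pieces  <- split(z, z$Event_ID)                       # one subset per event
results <- mclapply(pieces, process_event, mc.cores = n_cores)
Total3  <- do.call(rbind, results)                    # combine once at the end
rownames(Total3) <- NULL
print(Total3)
```

Starting and communicating with worker processes has its own cost, so this is only likely to pay off when the work done per event is substantial.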
P.S. I also took a snapshot of my system using the following command:
str(gpuInfo())
I am attaching the output for your reference:
P.P.S. Please note that my actual function is longer and more complicated, and it takes a long time to run, which is why I want to use GPU processing here.