In SparkR I have a DataFrame data that contains the columns id, amount_spent and amount_won.
For example, for id=1 we have
head(filter(data, data$id==1))
and the output is
1 30 10
1 40 100
1 22 80
1 14 2
For now I want to know whether a given id has more wins than losses. The amounts themselves can be ignored.
In plain R I can get this to run, but it is slow. Say we have 100 ids. This is what I have done in R:
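w <- numeric(100)
for (j in 1:100) {
  # Collect the rows for one fixed id into a local R data.frame
  q <- collect(filter(data, data$id == j))
  # Per-row difference: positive is a win, negative is a loss
  d <- as.numeric(q$amount_won) - as.numeric(q$amount_spent)
  # 1 means more wins than losses for this id, 0 otherwise
  if (sum(d > 0) > sum(d < 0)) {
    w[j] <- 1
  } else {
    w[j] <- 0
  }
}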
Now w holds a 1 or a 0 for each id. I would like to do this in a faster way in SparkR.
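One direction that might avoid the per-id collect is to score every row inside Spark and aggregate per id, collecting only once at the end. The following is a minimal sketch, not a tested solution. It assumes that a "win" is a row where amount_won exceeds amount_spent, that ties count for neither side, and that the amount columns can be cast to double. The sample rows are just the id=1 rows shown above, and the names delta, scored, outcome, net, per_id and result are made up for illustration.

```r
library(SparkR)
sparkR.session()

# Hypothetical sample: only the id = 1 rows from the question
data <- createDataFrame(data.frame(
  id = c(1, 1, 1, 1),
  amount_spent = c(30, 40, 22, 14),
  amount_won = c(10, 100, 80, 2)
))

# signum gives +1 for a winning row, -1 for a losing row, 0 for a tie
delta <- cast(data$amount_won, "double") - cast(data$amount_spent, "double")
scored <- withColumn(data, "outcome", signum(delta))

# Sum the outcomes per id: a positive total means more wins than losses
per_id <- agg(groupBy(scored, "id"), net = sum(scored$outcome))
result <- select(per_id, per_id$id,
                 alias(cast(per_id$net > 0, "integer"), "w"))

# A single collect at the end, one row per id
w <- collect(arrange(result, "id"))
```

For the sample rows, id 1 has two winning and two losing rows, so the summed outcome is 0 and it is not flagged as having more wins than losses. The grouping and summing run inside Spark, so only the small per-id result is brought back to the local R session.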