How to subtract elements in a DataFrame

Question

In SparkR I have a DataFrame data that contains id, amount_spent and amount_won.

For example for id=1 we have

head(filter(data, data$id==1))

and output is

What I want to know is whether a given id has more wins than losses. The amounts themselves can be ignored.

In R I can get it to run, but it is slow. Say we have 100 ids. In R I have done this:

w=c()
for(j in 1:100){
# Making it local for a fixed id 
q=collect(filter(data, data$id==j))
# Checking the difference. 1 means wins and 0 means losses
if( as.numeric(q$amount_won) - as.numeric(q$amount_spent)>0 ){
w[j]=1
} else {
w[j]=0
}
}

Now w simply gives me 1s and 0s for all the ids. In SparkR I want to do this in a faster way.

Wannes Rosiers · Accepted Answer · 2015-09-08T14:46:15.933

1

I am not sure whether this is exactly what you want, so feel free to ask for adjustments.

df <- data.frame(id = c(1,1,1,1),
                 amount_spent = c(30,40,22,14),
                 amount_won = c(10,100,80,2))

DF <- createDataFrame(sqlContext, df)
DF <- withColumn(DF, "won", DF$amount_won > DF$amount_spent)
DF$won <- cast(DF$won, "integer")

grouped <- groupBy(DF, DF$id)
aggregated <- agg(grouped, total_won = sum(DF$won), total_games = n(DF$won))

result <- withColumn(aggregated, "percentage_won" , aggregated$total_won/aggregated$total_games)

collect(result)

I have added a column to DF that indicates whether, on that row, the id won more than it spent. For each id, the result contains the number of games played, the number of games won, and the fraction of games won (the percentage_won column).
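To get back the 1/0 vector w from the question (1 when an id has more wins than losses), one more column on top of the aggregation may be enough. The following is only a sketch under some assumptions: it uses the SparkR 1.x setup calls (sparkR.init, sparkRSQLContext.init) to match the createDataFrame(sqlContext, ...) call above, the rows for id 2 are made up for illustration, the column name more_wins is my own, and a row where amount_won equals amount_spent counts as a loss.

```r
library(SparkR)

# Assumption: SparkR 1.x style setup, matching createDataFrame(sqlContext, ...) above.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQLContext.init(sc)

# Illustrative data: the rows for id 1 are from the answer, the rows for id 2 are made up.
df <- data.frame(id = c(1, 1, 1, 1, 2, 2, 2),
                 amount_spent = c(30, 40, 22, 14, 5, 5, 5),
                 amount_won = c(10, 100, 80, 2, 10, 10, 1))
DF <- createDataFrame(sqlContext, df)

# 1 if the row is a win (strictly more won than spent), 0 otherwise.
DF <- withColumn(DF, "won", cast(DF$amount_won > DF$amount_spent, "integer"))

# Number of wins and number of games per id.
aggregated <- agg(groupBy(DF, DF$id),
                  total_won = sum(DF$won),
                  total_games = n(DF$won))

# "More wins than losses" is the same as 2 * wins > games.
# more_wins is a hypothetical column name chosen for this sketch.
flagged <- withColumn(aggregated, "more_wins",
                      cast(aggregated$total_won * 2 > aggregated$total_games, "integer"))

# Only the small per-id summary is collected, sorted by id.
result <- collect(arrange(flagged, flagged$id))
w <- result$more_wins
```

With this data, id 1 has two wins out of four games and id 2 has two wins out of three, so w should come out as 0 for id 1 and 1 for id 2. The comparison runs inside Spark for all ids at once, and only one row per id is brought back to the driver, instead of one collect per id as in the loop from the question.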

edited Sep 08 '15 at 14:46

answered Sep 08 '15 at 14:21

Wannes Rosiers


Yes that makes sense. How should one count the total number of won=FALSE and TRUE as well ? – Ole Petersen Sep 08 '15 at 14:35
Counted the total number of games, total of won games and percentage of won games. – Wannes Rosiers Sep 08 '15 at 14:46
