
I have a DataFrame in SparkR called `data`. It contains the columns `user`, `amount_spent` and `amount_won`. I want to calculate balance = amount_spent - amount_won for user 1. First I filter for that user:

```r
y <- filter(data, data$user == 1)
```

Next, I calculate the sums:

```r
yn <- agg(groupBy(y, "user"), amount_spent = "sum", amount_won = "sum")
```

Then I calculate the balance for user 1:

```r
ynn <- withColumn(yn, "balance", yn[[3]] - yn[[2]])
```

This all gives me the correct result. However, I want to get `balance` out of `ynn`, which is a DataFrame, as an integer. How can I do that? Also, if I want to do this for 100 users, I assume I need to repeat the same steps 100 times?
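A minimal sketch of one way to get the balance out as a plain R integer, assuming Spark 2.x or later (for `sparkR.session()`) and a small made-up data set in place of the real `data`. `collect()` copies a SparkR DataFrame to the driver as an ordinary R `data.frame`, after which the column can be read with `$` and converted with `as.integer()`. The summed columns are given the hypothetical names `spent` and `won` so that `balance` is computed by column name rather than by position (`yn[[2]]`, `yn[[3]]`), which avoids depending on the column order of the aggregated result.

```r
library(SparkR)

# Assumes a local Spark 2.x+ installation; starts (or reuses) a SparkR session
sparkR.session()

# Hypothetical sample data standing in for the question's `data`
data <- createDataFrame(data.frame(
  user         = c(1L, 1L, 2L),
  amount_spent = c(10, 5, 20),
  amount_won   = c(3, 0, 5)
))

# Same steps as in the question, but the sums get explicit (hypothetical) names
y   <- filter(data, data$user == 1)
yn  <- agg(groupBy(y, "user"),
           spent = sum(y$amount_spent),
           won   = sum(y$amount_won))
ynn <- withColumn(yn, "balance", yn$spent - yn$won)

# collect() brings the one-row result back to R as an ordinary data.frame
local_ynn <- collect(ynn)

# Read the single balance value and convert it to an R integer
balance_user1 <- as.integer(local_ynn$balance)
print(balance_user1)
```

Collecting is reasonable here only because the aggregated result is tiny (one row); calling `collect()` on the raw, unaggregated `data` would pull the whole data set into R.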

csgillespie
Ole Petersen

1 Answer


I may be missing something, but why not do:

```r
## The data set is now `data`, not `y`
yn = agg(groupBy(data, "user"), amount_spent = "sum", amount_won = "sum")
```

When you now calculate the balance, you get it for every user at once:

```r
ynn = withColumn(yn, "balance", yn[[3]] - yn[[2]])
```
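A sketch of the same idea for all users in one pass, under the same assumptions as before (Spark 2.x or later, made-up sample data, hypothetical column names `spent` and `won`). A single `groupBy`/`agg` produces one row per user, so 100 users need one aggregation rather than 100. The result is sorted with `arrange()` before `collect()` because the row order after a `groupBy` is not guaranteed.

```r
library(SparkR)

# Assumes a local Spark 2.x+ installation
sparkR.session()

# Hypothetical sample data with several users
data <- createDataFrame(data.frame(
  user         = c(1L, 1L, 2L, 2L, 3L),
  amount_spent = c(10, 5, 20, 1, 7),
  amount_won   = c(3, 0, 5, 2, 7)
))

# One aggregation over all users; the sums get explicit (hypothetical) names
yn  <- agg(groupBy(data, "user"),
           spent = sum(data$amount_spent),
           won   = sum(data$amount_won))
ynn <- withColumn(yn, "balance", yn$spent - yn$won)

# Sort by user, then bring the small per-user table back to R
balances <- collect(arrange(ynn, "user"))
print(balances)

# Look up one user's balance locally as an R integer
user2_balance <- as.integer(balances$balance[balances$user == 2])
```

With 100 users the collected `data.frame` has 100 rows, so looking up individual balances locally is cheap. Alternatively, `collect(filter(ynn, ynn$user == 1))` fetches just one user's row.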
csgillespie