
I am currently able to rapidly calculate the mean of a dataset of several million entries using the following code:

# Mean positive score over tweets whose posScore exceeds 1
PosAvg = mean(curTweets$posScore[curTweets$posScore > 1])

# Tweets whose positive score outweighs the magnitude of their negative score
uniqPosTweets = curTweets[curTweets$posScore > abs(curTweets$negScore), ]
UniqPosAvg = mean(uniqPosTweets$posScore)

However, I want to weight these values while keeping the efficiency I have, by doing this in the same style as above.

curTweets$posScore and curTweets$negScore can each take a value of 1, 2, 3, 4, or 5.

Let's say I want to give the following weights: 6, 7, 8, 9, 10 respectively. I'm using these numbers just to differentiate them from the potential values of posScore; the actual weights are calculated in my algorithm.

Is there a way to do this? I can't figure out how I would apply the weights while maintaining this efficiency. Am I stuck having to loop through each entry and calculate its contribution individually?

Thank you!

Jibril

1 Answer

foo <- seq(5)                    # candidate values 1 through 5
weights <- c(1, 1, 1, 1, 100)    # one weight per value
vectorized_weighted_mean <- sum(foo * weights) / sum(weights)
drammock
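
Applying the same vectorized pattern to the question's data, a minimal sketch might look like the following. The column names come from the question; the lookup by score assumes posScore holds integer values 1 through 5, and the weights 6 through 10 are the question's placeholder numbers.

# Hypothetical weight lookup: element i is the weight for posScore == i
scoreWeights <- c(6, 7, 8, 9, 10)

# Vectorized lookup of a weight for every row, no explicit loop required
w <- scoreWeights[curTweets$posScore]

# Weighted version of the original PosAvg, restricted to the same subset
keep <- curTweets$posScore > 1
WeightedPosAvg <- sum(curTweets$posScore[keep] * w[keep]) / sum(w[keep])

# Equivalently, base R's weighted.mean() computes the same ratio
WeightedPosAvg <- weighted.mean(curTweets$posScore[keep], w[keep])

The lookup scoreWeights[curTweets$posScore] stays fully vectorized, so the weighted mean should cost roughly the same as the unweighted one over the millions of rows.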