Replacing for-loops with apply to improve performance (with weighted.mean)

Question

I am an R newbie, so hopefully this is a solvable problem for some of you. I have a data frame containing more than a million data points. My goal is to compute a weighted mean with a shifting starting point.

To illustrate, consider this data frame:

A <- data.frame(matrix(c(1,2,3,2,2,1),3,2))

where X1 is the data and X2 is the sampling weight.

I want to compute the weighted mean of X1 over rows 1:3, then over rows 2:3, and finally over rows 3:3.

With a loop I simply wrote:

B <- rep(NA,3) #empty result vector
for(i in 1:3){
  B[i] <- weighted.mean(x=A$X1[i:3],w=A$X2[i:3]) #shifting the starting point of the data and weights further to the end
}

With my real data this is impossible to compute because for each iteration the data.frame is altered and the computing takes hours with no result.

Is there a way to implement a varying starting point with an apply command, so that the performance increases?

regards, Ruben

I don't understand why your data frame has to be altered. If your real data is different in some important way from your example, how are we supposed to construct a solution that works on your real data? – joran Mar 07 '12 at 20:28
Sorry, that probably came out wrong. The data frame is not altered, but because of the shifting start point, each iteration computes the weighted mean for a new subsection of the original data frame. – Ruben Mar 07 '12 at 21:22

Tommy · Accepted Answer · 2012-03-07T21:59:14.333

3

Building upon @joran's answer to produce the correct result:

with(A, rev(cumsum(rev(X1*X2)) / cumsum(rev(X2))))
# [1] 1.800000 2.333333 3.000000

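Also note that this is much faster than the sapply/lapply approach: it makes a single vectorized pass over the data, whereas computing each weighted mean separately re-sums the remaining rows for every starting point (quadratic work overall).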
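
To see why the reversed cumulative sums work: the weighted mean over rows i..n is sum(X1[i:n]*X2[i:n]) / sum(X2[i:n]), and both the numerator and the denominator are suffix sums, which rev(cumsum(rev(v))) produces for every i at once. Below is a minimal sketch that checks this against the loop from the question on simulated data. The helper name suffix_wmean, the seed, and the size n are illustrative assumptions, not part of the original post.

```r
# Hypothetical helper: weighted mean of x[i:n] with weights w[i:n], for every i
suffix_wmean <- function(x, w) {
  rev(cumsum(rev(x * w)) / cumsum(rev(w)))
}

set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 1000                        # illustrative size, far smaller than the real data
A <- data.frame(X1 = rnorm(n), X2 = runif(n, 0.5, 2))  # strictly positive weights

# Reference result from the loop in the question
B <- rep(NA_real_, n)
for (i in seq_len(n)) {
  B[i] <- weighted.mean(x = A$X1[i:n], w = A$X2[i:n])
}

# Compare the two results up to floating-point tolerance
all.equal(B, suffix_wmean(A$X1, A$X2))
```

This sketch assumes there are no NA values and that every suffix of the weights has a nonzero sum; real data with missing values would need to be cleaned first.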

edited Mar 07 '12 at 21:59

answered Mar 07 '12 at 21:53

Tommy


wow, thanks. I was in the middle of writing something about "reverse cumsum" but that's exactly it. – Ruben Mar 07 '12 at 22:03

score 1 · Answer 2 · answered Mar 07 '12 at 20:41

You can use lapply to create your subsets, and sapply to loop over these, but I'd wager there would be a quicker way.

sapply(lapply(1:3,":",3),function(x) with(A[x,],weighted.mean(X1,X2)))
# [1] 1.800000 2.333333 3.000000
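
To unpack the one-liner: lapply(1:3, ":", 3) calls `:`(i, 3) for each i, giving the index vectors 1:3, 2:3 and 3:3, and sapply then computes one weighted mean per index vector. Here is a small sketch of the same idea for a data frame with n rows. The data frame A is rebuilt from the question's example, and the names n, idx and res are illustrative.

```r
A <- data.frame(matrix(c(1, 2, 3, 2, 2, 1), 3, 2))  # example data from the question
n <- nrow(A)

# One index vector per starting point: i:n for i = 1, ..., n
idx <- lapply(seq_len(n), ":", n)
str(idx)

# One weighted mean per index vector; vapply enforces a numeric scalar per element
res <- vapply(idx, function(rows) weighted.mean(A$X1[rows], A$X2[rows]), numeric(1))
res
```

Each element of idx is summed from scratch, so this variant still does the same amount of arithmetic as the original for-loop; it mainly tidies the code rather than changing how the work scales.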

answered Mar 07 '12 at 20:41

James


Thanks a lot for the answer! I knew there had to be some sort of apply variation that would work. I am trying to wrap my head around it and will implement it. It sure seems to work. – Ruben Mar 07 '12 at 21:47
