I have a data frame which looks like this
a b c d
1 1 1 0
1 1 1 200
1 1 1 300
1 1 2 0
1 1 2 600
1 2 3 0
1 2 3 100
1 2 3 200
1 3 1 0
I have a data frame which looks like this
a b c d
1 1 1 250
1 1 2 600
1 2 3 150
1 3 1 0
I am currently doing it {
n=nrow(subset(Wallmart, a==i & b==j & c==k ))
sum=subset(Wallmart, a==i & b==j & c==k )
#sum
sum1=append(sum1,sum(sum$d)/(n-1))
}
I would like to add the 'd' coloumn and take the average by counting the number of rows without counting 0. For example the first row is (200+300)/2 = 250. Currently I am building a list that stores the 'd' coloumn but ideally I want it in the format above. For example first row would look like
a b c d
1 1 1 250
This is a very inefficient way to do this work. The code takes a long time to run in a loop. so any help is appreciated that makes it run faster. The original data frame has about a million rows.