I have a list of sales of different articles over time in the following format:

col <- c("A", "B", "C")
A <- c(1, 0, 0)
B <- c(0, 1, 0)
C <- c(0, 0, 1)
# Build the data frame before renaming its columns
df <- data.frame(col, A, B, C)
colnames(df) <- c('article', 'w1', 'w2', 'w3')
df
 article w1 w2 w3
 A       1  0  0
 B       0  1  0
 C       0  0  1

What I need is to create a new column holding the mean of each row, computed only from the first positive value in the row onward. That means that if a row looks like:

A 0 1 0

The algorithm has to take into account only the last two values (1 and 0) and place the value (1+0)/2 = 0.5 in the new column. The final result has to look like this:

 article w1 w2 w3 Mean
 A       1  0  0  0.33
 B       0  1  0   0.5
 C       0  0  1     1

Can anyone please tell me how to do this?

Thanks a lot

Pavel M
  • 1
    Is it always binary data set? Also, it is better to have it in a matrix format if these are only numbers. – David Arenburg Mar 31 '16 at 16:46
  • 1
    If, indeed, you only have 0/1, you could use the row-sums and the index of first 1 in each row; `m = as.matrix(df[-1L]); rowSums(m) / (ncol(m) - max.col(m, "first") + 1L)` – alexis_laz Mar 31 '16 at 17:56
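A minimal sketch of the idea in the comment above, assuming strictly 0/1 data with at least one 1 per row. The matrix `m` is rebuilt here from the example values so the snippet runs on its own; `first_one` and `n_used` are illustrative names, not part of the original comment.

```r
# Example data from the question, rebuilt as a numeric matrix (weeks only).
m <- matrix(c(1, 0, 0,
              0, 1, 0,
              0, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("w1", "w2", "w3")))

# For 0/1 data the row maximum is 1, so max.col(..., "first") gives the
# column index of the first 1 in each row.
first_one <- max.col(m, ties.method = "first")

# Number of columns from the first 1 to the end of the row.
n_used <- ncol(m) - first_one + 1L

# Row sum divided by the number of columns actually used.
rowSums(m) / n_used
```

With values other than 0 and 1 this shortcut no longer finds the first positive entry, because `max.col` looks for the row maximum rather than the first value above zero.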

2 Answers

3

which(x > 0) returns the indices of all elements where x > 0, so min(which(x > 0)) gives the index of the first positive element.

df$Mean = apply(df[-1], 1, function(x) mean(x[min(which(x > 0)):length(x)]))

> df
  article w1 w2 w3      Mean
1       A  1  0  0 0.3333333
2       B  0  1  0 0.5000000
3       C  0  0  1 1.0000000
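The same approach can be sketched as a named helper, which also makes the edge cases explicit. This is only an illustration: `mean_from_first_positive` and `week_cols` are hypothetical names, rows D and E are added test data, and returning `NA` for a row with no positive value is an assumption about the desired behaviour.

```r
# Hypothetical helper: mean of x from its first positive value onward.
mean_from_first_positive <- function(x) {
  pos <- which(x > 0)            # indices of ALL positive elements
  if (length(pos) == 0L) {
    return(NA_real_)             # assumption: no positive value gives NA
  }
  mean(x[pos[1L]:length(x)])     # pos[1L] is the first positive index
}

# Question data plus an all-zero row (D) and a row with two positives (E).
df <- data.frame(article = c("A", "B", "C", "D", "E"),
                 w1 = c(1, 0, 0, 0, 0),
                 w2 = c(0, 1, 0, 0, 2),
                 w3 = c(0, 0, 1, 0, 1))

# Select the week columns by name so a later rerun ignores the Mean column.
week_cols <- c("w1", "w2", "w3")
df$Mean <- apply(df[week_cols], 1, mean_from_first_positive)
df$Mean
```

Without the length check, an all-zero row would make `min(which(x > 0))` return `Inf` with a warning and the subsequent `Inf:length(x)` would fail.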
Señor O
  • 4
    I think it should be noted that `which` returns all of the indexes where x is greater than zero. So perhaps you should add a `[1]` to the end of the `which` call to make `x[which(x > 0)[1]:length(x)]`. In the example provided, it doesn't cause an issue, but if there is more than one element greater than zero, the subset is needed to get only the first item. – giraffehere Mar 31 '16 at 16:48
  • @giraffehere Good point, for some reason I thought it returned just the first one. – Señor O Mar 31 '16 at 17:01
  • 3
    @giraffehere Actually, it also ends up working no matter what, because only the first element of a set would get used for `(set):length(x)`. But that's bad practice. – Señor O Mar 31 '16 at 17:09
  • 1
    You're right, I believe R would throw a warning (but still run), however. – giraffehere Apr 04 '16 at 14:24
1

Here is another option

library(matrixStats)
df$Mean <- rowMeans((NA^(!rowCumsums(as.matrix(df[-1])))) * df[-1],
                    na.rm = TRUE)
df$Mean
#[1] 0.3333333 0.5000000 1.0000000
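To see why the masking trick works, here is a step-by-step sketch in base R. It swaps `matrixStats::rowCumsums()` for `t(apply(m, 1, cumsum))` so that no package is needed, and it assumes non-negative values (with negative values a cumulative sum could return to zero after the first positive entry). The names `m`, `cs` and `mask` are illustrative.

```r
# Week columns from the question as a numeric matrix.
m <- as.matrix(data.frame(w1 = c(1, 0, 0),
                          w2 = c(0, 1, 0),
                          w3 = c(0, 0, 1)))

# Base R stand-in for rowCumsums(): cumulative sum along each row.
# apply() returns the per-row results as columns, hence the t().
cs <- t(apply(m, 1, cumsum))

# cs == 0 marks the leading zeros before the first positive value.
# NA^TRUE is NA^1 = NA, while NA^FALSE is NA^0 = 1.
mask <- NA^(cs == 0)

# Multiplying by the mask turns the leading zeros into NA and keeps the rest,
# so rowMeans() with na.rm = TRUE averages only from the first positive value.
rowMeans(mask * m, na.rm = TRUE)
```

A row with no positive value ends up entirely `NA`, so `rowMeans(..., na.rm = TRUE)` returns `NaN` for it.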
akrun