Calculate means of a row with a condition in column selection in R

Question

I have a table of sales of different articles over time, in the following format:

col <- c("A", "B", "C")
A <- c(1,0,0)
B <- c(0,1,0)
C <- c(0,0,1)
df <- data.frame(col, A, B, C)
colnames(df) <- c('article','w1', 'w2', 'w3')
df
 article w1 w2 w3
 A       1  0  0
 B       0  1  0
 C       0  0  1

What I need is a new column holding the mean of each row, but counting only from the first positive occurrence in the row onward. That means that if a row looks like:

A 0 1 0

The algorithm has to take into account only the last two values (1 and 0) and place the value (1+0)/2 = 0.5 into the new column. The final result has to look like this:

 article w1 w2 w3 Mean
 A       1  0  0  0.33
 B       0  1  0   0.5
 C       0  0  1     1

Can anyone please tell me how to get this right?

Thanks a lot

Is it always binary data set? Also, it is better to have it in a matrix format if these are only numbers. — David Arenburg, Mar 31 '16 at 16:46
If, indeed, you only have 0/1, you could use the row-sums and the index of first 1 in each row; `m = as.matrix(df[-1L]); rowSums(m) / (ncol(m) - max.col(m, "first") + 1L)` — alexis_laz, Mar 31 '16 at 17:56
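The one-liner in the comment above can be unpacked step by step. This is only a sketch, assuming the data really are 0/1 and every row contains at least one 1; the names `first_one`, `n_used`, and `binary_mean` are illustrative, not from the original comment.

```r
# Example data from the question, as a numeric matrix.
m <- matrix(c(1, 0, 0,
              0, 1, 0,
              0, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("w1", "w2", "w3")))

# Column index of the first maximum in each row; for 0/1 rows that
# contain a 1, this is the position of the first 1.
first_one <- max.col(m, ties.method = "first")

# Number of columns from the first 1 to the end of the row.
n_used <- ncol(m) - first_one + 1L

# Row sum divided by the number of columns actually counted.
binary_mean <- rowSums(m) / n_used
```

With non-binary values, `max.col` would point at the largest value rather than the first positive one, so this shortcut would no longer apply.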

Señor O · Answer 1 · 2016-03-31T17:01:21.223

3

which(x > 0) returns the indices of all elements where x > 0, so min(which(x > 0)) gives the index of the first such element.

df$Mean = apply(df[-1], 1, function(x) mean(x[min(which(x > 0)):length(x)]))

> df
  article w1 w2 w3      Mean
1       A  1  0  0 0.3333333
2       B  0  1  0 0.5000000
3       C  0  0  1 1.0000000
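One case the example data does not exercise is a row with no positive value at all. The sketch below is one possible way to guard against it, assuming such rows should get `NA`; the helper name `mean_from_first_positive` and the extra row `D` are hypothetical and not part of the original answer.

```r
# Hypothetical helper: mean of x from its first positive value to the end.
mean_from_first_positive <- function(x) {
  first <- which(x > 0)[1]              # NA when no value is positive
  if (is.na(first)) return(NA_real_)    # assumption: such rows get NA
  mean(x[first:length(x)])
}

# The question's data plus a made-up all-zero row "D".
df <- data.frame(article = c("A", "B", "C", "D"),
                 w1 = c(1, 0, 0, 0),
                 w2 = c(0, 1, 0, 0),
                 w3 = c(0, 0, 1, 0))

# Apply the helper to every row of the numeric columns.
df$Mean <- apply(df[-1], 1, mean_from_first_positive)
```

Whether `NA`, 0, or something else is the right value for an all-zero row depends on how the result is used downstream.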

edited Mar 31 '16 at 17:01

answered Mar 31 '16 at 16:45

Señor O


4

I think it should be noted that `which` returns all of the indexes where x is greater than zero. So perhaps you should add a `[1]` to the end of the `which` call to make `x[which(x > 0)[1]:length(x)]`. In the example provided, it doesn't cause an issue, but if there is more than one element greater than zero, the subset is needed to get only the first item. – giraffehere Mar 31 '16 at 16:48
@giraffehere Good point, for some reason I thought it returned just the first one. – Señor O Mar 31 '16 at 17:01
3

@giraffehere It actually also ends up working no matter what, because only the first element of a set would get used for `(set):length(x)`. But that's bad practice. – Señor O Mar 31 '16 at 17:09
1

You're right, I believe R would throw a warning (but still run), however. – giraffehere Apr 04 '16 at 14:24
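The point discussed in these comments can be checked on a small vector with more than one positive value. This is a minimal illustration; the vector `x` and the variable names are made up for the example.

```r
x <- c(0, 2, 0, 3)

all_pos   <- which(x > 0)        # every index holding a positive value
first_pos <- which(x > 0)[1]     # only the first such index
same_pos  <- min(which(x > 0))   # same index, obtained via min()

# Values from the first positive element to the end of the vector.
tail_vals <- x[first_pos:length(x)]
```

Taking the first index explicitly, with either `[1]` or `min()`, passes a single number to `:` and so does not rely on R discarding the extra elements.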

score 1 · Accepted Answer · answered Mar 31 '16 at 17:49


Here is another option:

library(matrixStats)
df$Mean <- rowMeans((NA^(!rowCumsums(as.matrix(df[-1])))) * df[-1],
                    na.rm = TRUE)
df$Mean
#[1] 0.3333333 0.5000000 1.0000000
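To see why the `NA^(...)` expression works, the steps can be spelled out one at a time. The sketch below swaps in base R (`t(apply(m, 1, cumsum))`) for `matrixStats::rowCumsums` so that it runs without extra packages, and it assumes the values are non-negative, so that a running sum of 0 means no positive value has appeared yet. The intermediate names are illustrative only.

```r
m <- as.matrix(data.frame(w1 = c(1, 0, 0),
                          w2 = c(0, 1, 0),
                          w3 = c(0, 0, 1)))

# Row-wise cumulative sums (base R stand-in for rowCumsums).
cs <- t(apply(m, 1, cumsum))

# TRUE in the cells that come before the first positive value of each row.
before_first <- cs == 0

# NA^TRUE is NA and NA^FALSE is NA^0, which is 1, so this yields NA
# before the first positive value and 1 from that point onward.
mask <- NA^before_first

# Multiplying blanks out the leading cells; na.rm = TRUE then skips them.
res <- rowMeans(mask * m, na.rm = TRUE)
```

The design choice here is to mark unwanted cells as `NA` and let `rowMeans` ignore them, which avoids an explicit loop over rows.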


akrun

