0

My (sample) data look as follows:

mydata <- structure(list(x1 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), x2 = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 4L, 4L), x3 = c(1L, 3L, 5L, 1L, 3L, 5L, 1L, 4L, 5L, 2L, 
1L, 5L, 6L, 6L), week = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 30L, 50L, 
22L, 52L, 36L, 25L, 26L), newar1 = c(0L, 0L, 2L, 0L, 0L, 2L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), newvar2 = c(0L, 2L, 0L, 0L, 
2L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L)), .Names = c("x1", "x2", 
"x3", "week", "newar1", "newvar2"), class = "data.frame", row.names = c(NA, 
-14L))



x1  x2  x3  week
0   1   1   0
0   2   3   0
0   3   5   0
0   1   1   0
0   2   3   0
0   3   5   0
1   1   1   1
1   2   4   30
1   3   5   50
1   1   2   22
1   2   1   52
1   3   5   36
1   4   6   25
1   4   6   26

I would like to create 1 new variable newvar1:

  • if x1 = 0 => I would like to count the number of times in the entire dataset where x1 equals 1 (only other rows, excluding the own observation), but only count rows with the same combination of x2 and x3 and rows where the week number is larger than 24.

  • if x1 = 1 => I would like to count the number of times in the entire dataset where x1 equals 1, but only count rows with the same combination of x2 and x3 and rows where the week number minus 25 is larger than zero ((week-25)>0)).

By "sum" I mean the number of times x1 equals 1 if the conditions hold.

By "if" I mean I only want to sum x1 when the conditions following the if hold. Basically my question is: how can I only sum certain values based on conditions?

My data should look like:

x1  x2  x3  week newvar1
0   1   1   0    0       
0   2   3   0    0       
0   3   5   0    2       
0   1   1   0    0       
0   2   3   0    0       
0   3   5   0    2       
1   1   1   1    0       
1   2   4   30   0       
1   3   5   50   1       
1   1   2   22   0       
1   2   1   52   0       
1   3   5   36   0       
1   4   6   25   0       
1   4   6   26   1       

Currently I have the following code, but this does not take into account the constraints for x2=x3 and for week. Any suggestions how to do this?

mydata[,newvar1:=sum(x1), by=list(x2,x3)]
research111
  • 347
  • 5
  • 18
  • Please use dput() to provide a reproducible example – Nick Becker Aug 17 '16 at 15:39
  • Included, thank you. – research111 Aug 17 '16 at 16:02
  • 2
    I'm having a lot of trouble understanding the criteria for `newvar1` and `newvar2`. – jdobres Aug 17 '16 at 16:24
  • Could you clarify the logic needed for the creation of `newvar1` and `newvar2`? For example, why does `newvar2 = 2` in the second row? – bouncyball Aug 17 '16 at 16:28
  • Agreed this was not clear! I simplified and clarified it, basically my question is how I can only sum certain values based on multiple conditions? – research111 Aug 17 '16 at 18:51
  • Criteria are still unclear. What do you mean by "sum(x1)"? – jdobres Aug 17 '16 at 18:58
  • I would like to get the sum of all 1's of x1, or, otherwise stated, the number of times it equals 1 – research111 Aug 17 '16 at 19:02
  • I'm really confused. The third row, `x1` is 0. `x2` is 3. `x3` is 5. `week` is 0. Your `newvar` result is `2`. I can't figure out why. Here are my guesses: You code makes it sound like for this row of `newvar` you want to sum `x1` for all rows where the `x2` and `x3` values match *on that row* (`x2 == x3`) AND the week is more than 24. Because `x2` and `x3` are always different on a single row, this will always be 0. – Gregor Thomas Aug 17 '16 at 19:19
  • My next guess is that what you *mean* is that you want to sum `x1` for all rows where `x3` is equal to *this value* of `x2`. You could code this as `x3 == x2[3]` (since we're on the 3rd row) AND week > 24. There are 2 rows where `x3` is 3, but both have low week numbers and `0` values for `x1`, so this can't be right. – Gregor Thomas Aug 17 '16 at 19:21
  • My third guess was that I had it switched - that you want to sum all rows where the `x2` value matches the current row's `x3` value, but there aren't any of those (`x3[3]` is 5 and `x2` is never 5), so that's wrong as well. So I'm stumped. Can you explain, in words how you arrive at the result of `2` for the third row? – Gregor Thomas Aug 17 '16 at 19:22
  • I adapted the post. I would like to count the number of times/rows in the entire dataset where x1 equals 1, but only count rows that have the same combination of x2 and x3 and rows and where the week number is larger than 24. – research111 Aug 17 '16 at 19:33
  • Okay, this is better. I've got code that matches all the cases when `x1` is 0. Now, what about row 9? `x1` is 1, `x2` is 3, `x3` is 5, week is `50`. We should be counting rows where `x1` is 1, `x2` and `x3` match (are 3 and 5), and `week - 25 > 0`, (which is equivalent to `week > 25`). It looks like Row 9 itself fits the conditions and also Row 12 - so the answer should be 2. Bur your answer is 0. Why? – Gregor Thomas Aug 17 '16 at 20:22
  • You are right. I adapted. Sorry for all the confusion. – research111 Aug 17 '16 at 20:26
  • You adapted to 1 - to me it still looks like it should be 2. In fact, rows 8, 9, 11-14 all match themselves. Do you want to only count *other* rows? – Gregor Thomas Aug 17 '16 at 20:46
  • Correct, only other rows – research111 Aug 17 '16 at 20:48
  • 1
    @research111 then shouldn't row 14 be 0 while row 13 is 1? why isn't row 12 equal to 1? – bouncyball Aug 17 '16 at 20:52

3 Answers3

0

I think we can use a for loop to get this done:

for(i in 1:nrow(mydata)){
 if(mydata[i,1] == 0){ # x1 == 0
   mydata[i,]$newvar1 = 
    sum(mydata[-i,1] == 1 & # count where x1 == 1
        mydata[i,2] == mydata[-i,2] & # and where (x2 == x2) & (x3 == x3)
        mydata[i,3] == mydata[-i,3] & 
        mydata[-i,4] > 24) # and week > 24
 }else{ # x1 == 1
    mydata[i,]$newvar1 = 
        sum(mydata[-i,1] == 1 & # count where x1 == 1
            mydata[i,2] == mydata[-i,2] & # and where (x2 == x2) & (x3 == x3)
            mydata[i,3] == mydata[-i,3] & 
            mydata[-i,4] > 25) # and week > 25
    }
}

# mydata
#    x1 x2 x3 week newvar1
# 1   0  1  1    0       0
# 2   0  2  3    0       0
# 3   0  3  5    0       2
# 4   0  1  1    0       0
# 5   0  2  3    0       0
# 6   0  3  5    0       2
# 7   1  1  1    1       0
# 8   1  2  4   30       0
# 9   1  3  5   50       1
# 10  1  1  2   22       0
# 11  1  2  1   52       0
# 12  1  3  5   36       1
# 13  1  4  6   25       1
# 14  1  4  6   26       0

Alternatively, if, for x1 == 1 you want to compare data on all rows:

for(i in 1:nrow(mydata)){
    if(mydata[i,1] == 0){ # x1 == 0
        mydata[i,]$newvar1 = 
            sum(mydata[-i,1] == 1 & #count where x1 = 1
                    mydata[i,2] == mydata[-i,2] & # and where (x2 == x2) & (x3 == x3)
                    mydata[i,3] == mydata[-i,3] & 
                    mydata[-i,4] > 24) # and week > 24
    }else{
        mydata[i,]$newvar1 = 
            sum(mydata[,1] == 1 &
                mydata[i,2] == mydata[,2] & 
                mydata[i,3] == mydata[,3] & 
                mydata[,4] > 25)
    }
}

# mydata
#    x1 x2 x3 week newvar1
# 1   0  1  1    0       0
# 2   0  2  3    0       0
# 3   0  3  5    0       2
# 4   0  1  1    0       0
# 5   0  2  3    0       0
# 6   0  3  5    0       2
# 7   1  1  1    1       0
# 8   1  2  4   30       1
# 9   1  3  5   50       2
# 10  1  1  2   22       0
# 11  1  2  1   52       1
# 12  1  3  5   36       2
# 13  1  4  6   25       1
# 14  1  4  6   26       1
bouncyball
  • 10,631
  • 19
  • 31
0
mydata$newvar1 <- ifelse(mydata$x1==0, sapply(seq_len(nrow(mydata)), function(i) with (mydata, sum(x1[week > 25 & x2==x2[i] & x3==x3[i]]))), 0)
mydata$newvar1 <- ifelse(mydata$x1==1, sapply(seq_len(nrow(mydata)), function(i) with (mydata, sum(x1[week < week[i] & week[i]!=0 & week-week[i]<25 & x2==x2[i] & x3==x3[i]]))), mydata$newvar1)
research111
  • 347
  • 5
  • 18
0

Using dplyr:

library(dplyr)
mydata %>% group_by(x2, x3) %>%
    mutate(newvar1 = ifelse(x1 == 0,
                            sum(x1 * week > 24),
                            sum(x1 * week > 25) - (week > 25) * (x1 == 1)))
# Source: local data frame [14 x 6]
# Groups: x2, x3 [7]
# 
#       x1    x2    x3  week newvar2 newvar1
#    <int> <int> <int> <int>   <int>   <int>
# 1      0     1     1     0       0       0
# 2      0     2     3     0       2       0
# 3      0     3     5     0       0       2
# 4      0     1     1     0       0       0
# 5      0     2     3     0       2       0
# 6      0     3     5     0       0       2
# 7      1     1     1     1       0       0
# 8      1     2     4    30       0       0
# 9      1     3     5    50       1       1
# 10     1     1     2    22       0       0
# 11     1     2     1    52       0       0
# 12     1     3     5    36       0       1
# 13     1     4     6    25       0       1
# 14     1     4     6    26       0       0

The weird bit in the else condition, - (week > 25) * (x1 == 1), is to subtract 1 from rows that would otherwise match themselves.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294