1

I would like to create a new variable, Number, which sequentially generates numbers within each groupID, starting from the row where a particular condition is first met (in this case, when Percent >= 5).

groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent <- c( 3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)

Number <- ifelse (Percent < 5, 0, 1:4)

I get:

> Number
[1] 0 0 3 4 0 0 3 4 0 2 3 4

But I'd like:

    0 0 1 2 0 0 1 2 0 1 2 3

I did not include the groupID variable in the ifelse statement and used 1:4 instead, as there are always 4 rows within each groupID.

Any suggestions or clues? Thank you!

thelatemail
user3698046
  • You're saying that if an element of `Percent` is `>= 5`, then you want the corresponding element of `Number` to be `1:4`. i.e., you're trying to insert a vector with 4 elements into a single element of `Number`. I'm pretty sure what you are after is: `y <- rep(1:4, 3); y[Percent < 5] <- 0`. (Where does `ID` come into the equation? You don't refer to it at all...) – jbaums Jun 18 '14 at 00:13
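A short sketch of why the attempt in the question returns `0 0 3 4 ...`: `ifelse()` recycles its `no` argument to the length of the test and then picks values by position, so each `1:4` value stays tied to its row position within the block of four rather than restarting at the first row that meets the condition. The name `recycled` is introduced here only for illustration.

```r
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent <- c(3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)

# What ifelse() effectively does with `no = 1:4`: stretch it to 12 elements
recycled <- rep(1:4, length.out = length(Percent))
recycled
# [1] 1 2 3 4 1 2 3 4 1 2 3 4

# Positions where Percent < 5 become 0; all others keep their recycled value
Number <- ifelse(Percent < 5, 0, recycled)
Number
# [1] 0 0 3 4 0 0 3 4 0 2 3 4
```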

3 Answers

4
 ave(Percent, groupID, FUN=function(x) cumsum(x>=5))
[1] 0 0 1 2 0 0 1 2 0 1 2 3
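To see what the `ave()` call is doing, the same computation can be spelled out step by step. This is only an illustrative sketch using base R's `split()`, `lapply()` and `unsplit()`; the names `groups` and `counted` are made up for this example.

```r
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent <- c(3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)

# 1. Break Percent into one vector per groupID
groups <- split(Percent, groupID)

# 2. Within each group, x >= 5 is TRUE/FALSE; cumsum() treats TRUE as 1,
#    so the running total counts how many rows so far met the condition
counted <- lapply(groups, function(x) cumsum(x >= 5))

# 3. Put the pieces back in the original row order
Number <- unsplit(counted, groupID)
Number
# [1] 0 0 1 2 0 0 1 2 0 1 2 3
```

Note that a row below 5 that comes after a qualifying row repeats the previous count rather than getting 0, because the running total simply does not increase there.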

For the example in the comments below, here is an alternate logical test to be cumsum()-ed, which keeps counting from the first row that reaches 5 even if a later row falls below it:

ave(Percent, groupID, FUN=function(x) cumsum(seq_along(x)>= which(x >=5)[1]) )
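As a hedged illustration, here is that alternate test applied to the groups-of-5 data from the comments. The helper name `count_from_first` and the variables `groupID5` and `Percent5` are hypothetical; the `is.na()` guard is an added assumption to cover a group in which no value reaches the threshold, where `which(...)[1]` would otherwise be `NA`.

```r
groupID5 <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
Percent5 <- c(3, 4, 5, 10, 11, 3, 2, 1, 6, 8, 4, 8, 10, 4, 11)

# Hypothetical helper: number rows 1, 2, 3, ... from the first x >= threshold
count_from_first <- function(x, threshold = 5) {
  first <- which(x >= threshold)[1]          # position of first qualifying row
  if (is.na(first)) return(rep(0, length(x)))  # no qualifying row: all zeros
  cumsum(seq_along(x) >= first)
}

ave(Percent5, groupID5, FUN = count_from_first)
# [1] 0 0 1 2 3 0 0 0 1 2 0 1 2 3 4
```

In the third group the count keeps increasing past the 4, which is the difference from `cumsum(x >= 5)`.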
IRTFM
  • Thank you!!! Before making this stackoverflow question post, I was staring at this case forever: http://stackoverflow.com/questions/14294948/sequentially-numbering-many-rows-blocks-of-unequal-length But I wasn't aware of cumsum() ! – user3698046 Jun 18 '14 at 01:23
  • This works nicely, but after looking up cumsum, I found out that this would only work for groups of 4 or fewer. For example, if I have groups of 5: `groupID <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)`; `Percent <- c( 3, 4, 5, 10, 11, 3, 2, 1, 6, 8, 4, 8, 10, 4, 11)`; `ave(Percent, groupID, FUN=function(x) cumsum(x>=5))`. The result: `[1] 0 0 1 2 3 0 0 0 1 2 0 1 2 2 3` – user3698046 Jun 18 '14 at 16:20
  • This has nothing to do with limitations on `cumsum` but rather with the problem statement. Your original example suggested the number would be increasing within the individual groupID values. If that's not the case and you do want to have a sequential increase even if a later item in the sequence is below 5 then we just need to change the test that is summed. Edit your question to provide a better example and problem definition. – IRTFM Jun 18 '14 at 16:45
  • You're right; I didn't define the problem well. Thanks for your help. – user3698046 Jun 18 '14 at 16:50
2

It's ugly and throws warnings, but it gets you what you want:

ave(Percent,groupID,FUN=function(x) {x[x<5] <- 0; x[x>=5] <- 1:4; x} )
#[1] 0 0 1 2 0 0 1 2 0 1 2 3

@BondedDust's answer using cumsum is almost certainly more appropriate, though.
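The warnings come from assigning the four values `1:4` into fewer than four positions. A possible warning-free variant, offered only as a sketch, sizes the replacement to the number of qualifying rows with `seq_len()`; the name `hit` is introduced here for illustration.

```r
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent <- c(3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)

Number <- ave(Percent, groupID, FUN = function(x) {
  hit <- x >= 5                   # rows that meet the condition
  x[!hit] <- 0                    # everything else becomes 0
  x[hit] <- seq_len(sum(hit))     # number the qualifying rows 1, 2, ...
  x
})
Number
# [1] 0 0 1 2 0 0 1 2 0 1 2 3
```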

If your data was not always in ascending order in each group, you could also replace all the >=5 values like:

Percent <- c( 3, 5, 4, 10, 2, 1, 6, 8, 4, 8, 10, 11)
ave(Percent, list(groupID,Percent>=5), FUN=function(x) cumsum(x>=5))
#[1] 0 1 0 2 0 0 1 2 0 1 2 3
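To make the effect of the second grouping variable concrete, the sketch below compares the two calls on the unsorted data. The names `Percent2`, `plain` and `by_flag` are made up for this comparison.

```r
groupID  <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent2 <- c(3, 5, 4, 10, 2, 1, 6, 8, 4, 8, 10, 11)

# Grouping by groupID only: the 4 in group 1 repeats the running count
plain <- ave(Percent2, groupID, FUN = function(x) cumsum(x >= 5))
plain
# [1] 0 1 1 2 0 0 1 2 0 1 2 3

# Grouping by groupID AND the >= 5 flag: rows below 5 form their own
# subgroup, so their cumulative sum stays at 0
by_flag <- ave(Percent2, list(groupID, Percent2 >= 5),
               FUN = function(x) cumsum(x >= 5))
by_flag
# [1] 0 1 0 2 0 0 1 2 0 1 2 3
```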
thelatemail
  • Thank you, thelatemail! It works! The meaning of the square brackets within the curly brackets is unclear, but I'll look into it! – user3698046 Jun 18 '14 at 00:33
  • @user3698046: curly braces allow multi-line (or in this case, semicolon-separated, which is equivalent) expressions. Here, they allow the definition of the function to span multiple lines. The square brackets within them are subsetting `x` as usual. – jbaums Jun 18 '14 at 00:37
1

Try this:

ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent <- c( 3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)


Number <- Percent >= 5

result = lapply(seq_along(Number), function(i){
    if( length(which(! Number[1:i]) ) == 0){start = 1}
    else {start =max(which(! Number[1:i]) )}

    sum( Number[start : i])

  })

> unlist(result)
[1] 0 0 1 2 0 0 1 2 0 1 2 3
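The loop above counts how many consecutive TRUE values have occurred since the last FALSE. As a sketch of the same idea without an explicit loop, base R's `rle()` and `sequence()` can be combined; this is a swapped-in technique, not the method used in the answer, and like the loop it ignores ID entirely. The names `runs` and `Number2` are illustrative.

```r
Percent <- c(3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)
Number <- Percent >= 5

# Lengths of each run of identical TRUE/FALSE values
runs <- rle(Number)

# sequence() numbers each run 1, 2, ...; multiplying by the logical
# vector zeroes out the positions that belong to FALSE runs
Number2 <- sequence(runs$lengths) * Number
Number2
# [1] 0 0 1 2 0 0 1 2 0 1 2 3
```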
Alex
  • This is nice if-else code, which works well for a strictly consecutive sequence! However, we can't get the result I was looking for when the sequence skips, like this (notice how you can't get "1 0 2"): 0 0 1 2 0 0 1 2 0 1 0 2. The last "2" would be 1 instead. This is because the if-else code is not dependent on ID at all. Thank you anyway!!! I'm learning a lot~! – user3698046 Jun 18 '14 at 16:52