Filter dataframe for longest sequence of repeated numbers by row in R

Question

I am trying to create a "filter-by" matrix that I can use to isolate rows of data in my data frame, such that each row contains only the values that belong to the longest consecutive run of the same number, while the rest are set to zero. After searching around, I think rle is the function to use, but my attempt does not give me what I am after. Here is an example of my code and results. Suggestions and solutions would be very much appreciated. Thank you!

SAMPLE DATA:

    a<- c(1,0,1,1,1,1,0,0)
    b<- c(0,0,0,1,1,1,0,1)
    c<- c(0,0,1,1,0,0,0,1)
    d<- c(1,0,0,1,1,1,1,0)
    e<- c(1,0,0,1,0,0,1,1)
    f<- c(0,0,0,1,1,1,0,1)
    g<- c(0,0,1,1,0,0,0,1)
    test.data <- data.frame(cbind(a,b,c,d,e,f,g))

    # > test.data
    #   a b c d e f g
    # 1 1 0 0 1 1 0 0
    # 2 0 0 0 0 0 0 0
    # 3 1 0 1 0 0 0 1
    # 4 1 1 1 1 1 1 1
    # 5 1 1 0 1 0 1 0
    # 6 1 1 0 1 0 1 0
    # 7 0 0 0 1 1 0 0
    # 8 0 1 1 0 1 1 1

SAMPLE CODE FOR ATTEMPTED SOLUTION:

result <- data.frame(lapply(test.data, function(x) {
  r <- rle(x)
  r$values[r$lengths!=max(r$lengths)]==1
  r2=inverse.rle(r)
  r2
}))

RESULT I GET (looks like exact copy of what went in?):

# > result
#    a b c d e f g
# 1  1 0 0 1 1 0 0
# 2  0 0 0 0 0 0 0
# 3  1 0 1 0 0 0 1
# 4  1 1 1 1 1 1 1
# 5  1 1 0 1 0 1 0
# 6  1 1 0 1 0 1 0
# 7  0 0 0 1 1 0 0
# 8  0 1 1 0 1 1 1

THIS IS THE RESULT I WANT TO GET (T/F can be used instead of 1 and 0, if easier):

# > result
#    a b c d e f g
# 1  0 0 0 1 1 0 0
# 2  0 0 0 0 0 0 0
# 3  0 0 0 0 0 0 0
# 4  1 1 1 1 1 1 1
# 5  1 1 0 0 0 0 0
# 6  1 1 0 0 0 0 0
# 7  0 0 0 1 1 0 0
# 8  0 0 0 0 1 1 1

PLEASE ADVISE!

Andrew Gustar · Answer 1 · 2018-06-08T21:35:05.043

I think this is what you are after...

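test.data[] <- t(apply(test.data, 1, function(x) {
  y <- rle(x)
  # zero out isolated 1s (runs of length 1)
  y$values[y$lengths == 1] <- 0
  # keep only the longest remaining run(s) of 1s; max() warns (and returns -Inf) when none is left
  y$values[y$lengths != max(y$lengths[y$values == 1])] <- 0
  inverse.rle(y)
}))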

test.data
  a b c d e f g
1 0 0 0 1 1 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
4 1 1 1 1 1 1 1
5 1 1 0 0 0 0 0
6 1 1 0 0 0 0 0
7 0 0 0 1 1 0 0
8 0 0 0 0 1 1 1
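To see what the answer is manipulating, it may help to look at rle() on a single row. The following is a minimal sketch using row 8 of the sample data; the variable names x and r are only for illustration.

```r
x <- c(0, 1, 1, 0, 1, 1, 1)   # row 8 of test.data

r <- rle(x)
r$lengths   # 1 2 1 3  (how long each run is)
r$values    # 0 1 0 1  (the value repeated in each run)

# inverse.rle() expands the runs back into the original vector
identical(inverse.rle(r), x)   # TRUE
```

Setting an entry of r$values to 0 before calling inverse.rle() therefore blanks out that whole run, which is the idea the code above relies on.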

edited Jun 08 '18 at 21:35

answered Jun 08 '18 at 19:28

Thanks Andrew! That is closer to what I am after. Is there a way to keep the 1-1 in row 7? I would like to keep any sequence of 2 or more consecutive 1s; if there are several such sequences in the same row, keep only the longest one, or, if two have the same length, the first one. Please let me know. Thank you! – Mary Jun 08 '18 at 19:35
@Mary I've amended the answer to fit your logic. It throws a couple of warnings which can be ignored - the answer is right! – Andrew Gustar Jun 08 '18 at 21:35
@ Andrew - Thank you! – Mary Jun 09 '18 at 02:39
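The warnings mentioned in the comments appear to come from max() being called on an empty vector, which happens for rows that have no run of two or more 1s (rows 2 and 3 of the sample data). A minimal sketch of a guarded variant follows, assuming the same rule as the answer; keep_longest_guarded and filtered are hypothetical names, and runs that tie for longest are all kept, as in the answer's code.

```r
# Sample data from the question
test.data <- data.frame(
  a = c(1,0,1,1,1,1,0,0),
  b = c(0,0,0,1,1,1,0,1),
  c = c(0,0,1,1,0,0,0,1),
  d = c(1,0,0,1,1,1,1,0),
  e = c(1,0,0,1,0,0,1,1),
  f = c(0,0,0,1,1,1,0,1),
  g = c(0,0,1,1,0,0,0,1)
)

# Hypothetical helper (not part of the answer): same logic, but max() is
# never called on an empty vector, so no warning is raised
keep_longest_guarded <- function(x) {
  y <- rle(x)
  y$values[y$lengths == 1] <- 0                      # drop isolated 1s
  ones <- y$lengths[y$values == 1]                   # lengths of the remaining runs of 1s
  if (length(ones) == 0) return(rep(0, length(x)))   # nothing to keep in this row
  y$values[y$lengths != max(ones)] <- 0              # zero every run that is not the longest
  inverse.rle(y)
}

filtered <- test.data
filtered[] <- t(apply(test.data, 1, keep_longest_guarded))
filtered
```

On the sample data this should give the same table as the answer's output, just without the warnings.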

IceCreamToucan · Accepted Answer · 2018-06-08T20:27:43.740

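library(magrittr)  # provides %>% and the exposition pipe %$%

val <- 1  # the value whose longest run should be kept

test.data %>% 
    apply(1, function(x){
      # %$% exposes the rle components `lengths` and `values` by name
      rle(x) %$% { 
        if(all(values != val)) rep(0, length(x))
        else {
          m      <- max(lengths[values == val]) 
          # Get only longest sequences (none at all if the longest has length 1)
          values <- (lengths == m & values == val)*values*(m > 1)
          # Get only one of them: the first
          values[seq_along(values) != which(values == val)[1]] <- 0
          rep(values, lengths)
        }
    }}) %>% t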

#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,]    0    0    0    1    1    0    0
# [2,]    0    0    0    0    0    0    0
# [3,]    0    0    0    0    0    0    0
# [4,]    1    1    1    1    1    1    1
# [5,]    1    1    0    0    0    0    0
# [6,]    1    1    0    0    0    0    0
# [7,]    0    0    0    1    1    0    0
# [8,]    0    0    0    0    1    1    1

edited Jun 08 '18 at 20:27

answered Jun 08 '18 at 19:35

Thanks Ryan, this works, except if I have two of the same sequences in the same row. – Mary Jun 08 '18 at 19:58
    a <- c(1,0,1,1,1,1,1,0)
    b <- c(0,0,0,1,1,1,1,1)
    c <- c(0,0,1,1,0,0,0,1)
    d <- c(1,0,0,1,1,1,1,0)
    e <- c(1,0,0,1,0,0,1,1)
    f <- c(0,0,0,1,1,1,0,1)
    g <- c(0,0,1,1,0,0,0,1)
    test.data <- data.frame(cbind(a,b,c,d,e,f,g))
– Mary Jun 08 '18 at 19:58
in row 7 there are two occurrences of 1,1 - so how can I keep only the first? – Mary Jun 08 '18 at 19:59
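For the tie case described above, one possible approach is to rely on which.max(), which returns the index of the first maximum. This is a sketch on row 7 of the amended data, not the accepted answer's code; score, keep and first_longest are names made up for the example.

```r
# Row 7 of the amended data: two runs of 1s, both of length 2
x <- c(1, 1, 0, 1, 1, 0, 0)
r <- rle(x)

# Score each run by its length; runs of 0s and isolated 1s score 0
score <- ifelse(r$values == 1 & r$lengths > 1, r$lengths, 0L)

# which.max() picks the first maximum, so ties go to the earliest run
keep <- which.max(score)

r$values[] <- 0                            # blank out every run ...
if (score[keep] > 0) r$values[keep] <- 1   # ... then restore only the chosen one
first_longest <- inverse.rle(r)
first_longest
```

Here the first 1,1 run is kept and the second is zeroed, giving 1 1 0 0 0 0 0.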
