1

I have a list of vectors, for instance:

vec1 <- c(rep(0,5), 1, rep(0,11), rep(1,4), rep(0,6))
vec2 <- c(rep(0,11), 1, rep(0,18))
vec3 <- c(rep(0,3), rep(1,5), rep(0,21))
vec4 <- c(rep(0,23))
  
test_list <- list(vec1, vec2, vec3, vec4)

I would like to filter this list based on 2 conditions:

  1. 1 is present within the vector.
  2. 1 appears consecutively (in a row) more than 3 times.

My output should contain vec1 and vec3.

I wrote the following:

filter_ones <- test_list[sapply(test_list,function(vec) 1 %in% vec )]

And it returns vec1, vec2, and vec3.

How can I apply the second condition? I should probably use rle(), but I have no idea how. I would be grateful for any help.

ramen
  • Isn't condition (1) obsolete? Or do you mean that a single `1` has to be present in the vector BESIDES the consecutive `1`s? – sedsiv Nov 28 '21 at 20:28

2 Answers

2

We can add a second condition based on rle and combine it with the OP's first logical expression (1 %in% vec) using the short-circuiting &&, then pass the function to Filter to keep only the matching elements of the list.

rle is applied to the binary values after converting them to logical. Its result is turned into a second logical vector that is TRUE where a run is longer than the threshold 'n' (the lengths from rle) and the run's value is 1 (TRUE). Wrapping this in any returns a single TRUE/FALSE per vector.
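To make the intermediate steps concrete, here is a small sketch that walks through them for vec1 from the question. The names runs and long_run_of_ones are only illustrative variables and are not part of the answer's code.

```r
# vec1 as defined in the question
vec1 <- c(rep(0,5), 1, rep(0,11), rep(1,4), rep(0,6))
n <- 3

# run-length encode the logical version of the vector
runs <- rle(as.logical(vec1))
runs$lengths  # 5 1 11 4 6
runs$values   # FALSE TRUE FALSE TRUE FALSE

# TRUE only for runs of 1s (TRUE) that are longer than n
long_run_of_ones <- runs$lengths > n & runs$values
long_run_of_ones       # FALSE FALSE FALSE TRUE FALSE

# collapse to a single TRUE/FALSE for the whole vector
any(long_run_of_ones)  # TRUE
```

Only the fourth run (four 1s in a row) satisfies both parts of the test, so vec1 is kept.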

n <- 3
Filter(function(x) 1 %in% x && any(with(rle(as.logical(x)), 
      lengths > n & values)), test_list)

-output

[[1]]
 [1] 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0

[[2]]
 [1] 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Or using the OP's sapply

test_list[sapply(test_list,function(vec) 1 %in% vec && 
      any(with(rle(as.logical(vec)), 
      lengths > n & values)))]

-output

[[1]]
 [1] 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0

[[2]]
 [1] 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
akrun
  • This is very helpful. Why not `&` ? Because of vectorization? – TarJae Nov 28 '21 at 20:43
  • 1
    @TarJae The OP's case is just to filter the elements of the `list`, i.e. each list element requires a single TRUE/FALSE. The `any` is wrapped around the second condition to return a single TRUE/FALSE. By using `&&` we can short-circuit, i.e. if the first condition is not met (FALSE), the second one is not evaluated – akrun Nov 28 '21 at 20:44
  • @akrun, what if I had other values as well, for instance -1? With my real data that is unlikely, but not impossible. If I did not want to take the -1s into account, what should I do? – ramen Nov 28 '21 at 21:12
  • @ramen in that case, instead of `rle(as.logical(vec))` or `rle(as.logical(x))`, it would be `rle(x == 1)`, or `rle(x %in% 1)` if there are NAs as well – akrun Nov 28 '21 at 21:34
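The suggestion in the last comment could look roughly like the sketch below. The helper has_long_run and the sample data test_list2 are hypothetical names invented for this illustration. The separate `1 %in% x` check is left out here on the assumption that a run of 1s longer than n already implies a 1 is present.

```r
# Hypothetical helper: TRUE if x contains a run of 1s longer than n.
# x %in% 1 is FALSE for -1, 0 and NA, so only exact 1s count towards a run.
has_long_run <- function(x, n = 3) {
  runs <- rle(x %in% 1)
  any(runs$lengths > n & runs$values)
}

# Made-up sample data with -1 and NA values
test_list2 <- list(
  c(0, -1, -1, -1, -1, 0, 1),  # four -1s in a row, only a single 1
  c(0, 1, 1, 1, 1, NA, 0),     # four 1s in a row
  c(1, 1, NA, 1, 1)            # runs of 1s interrupted by NA
)

kept <- Filter(has_long_run, test_list2)
kept  # keeps only the second vector
```

With `rle(as.logical(x))` the first vector would also be kept, because as.logical(-1) is TRUE and the four -1s would count as a run.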
1

I collapsed each vector into a string and used grepl to keep the vectors that match both conditions.

test_list[
  # Get non-empty results
  vapply(
    # Find vectors where conditions apply
    sapply(test_list, function(x) {
      # collapse the vector into a string and run grepl on it
      # match vectors that contain a 1 plus a run of three 1s, in either order
      if(grepl("1.*111|111.*1", paste(x, collapse = ""))) x
    # vectors that do not match yield NULL
    }), Negate(is.null), NA
  )
]
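A shorter variant of the same string-matching idea is sketched below. It is not the code above but a simplification under two assumptions: the vectors contain only the single-digit values 0 and 1, and "more than 3 times" means a run of at least four 1s, which the pattern `1{4,}` expresses directly. The name keep is illustrative.

```r
# Data from the question
vec1 <- c(rep(0,5), 1, rep(0,11), rep(1,4), rep(0,6))
vec2 <- c(rep(0,11), 1, rep(0,18))
vec3 <- c(rep(0,3), rep(1,5), rep(0,21))
vec4 <- c(rep(0,23))
test_list <- list(vec1, vec2, vec3, vec4)

# One TRUE/FALSE per vector: does the collapsed string contain
# four or more consecutive 1s?
keep <- vapply(
  test_list,
  function(x) grepl("1{4,}", paste(x, collapse = "")),
  logical(1)
)

keep             # TRUE FALSE TRUE FALSE
test_list[keep]  # vec1 and vec3
```

Because the pattern itself requires at least one 1, no separate presence check is needed. Values such as -1 or 10 would break the one-character-per-element assumption, so the rle approach is safer for such data.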
sedsiv