Finding the maximum number of consecutive 1's in a string

Question

Is there an easy way to get the maximum number of consecutive 1's in a string like "000010011100011111001111111100"?

I can certainly do it with loops, but I'd like to avoid that since my actual dataset has about 500,000 records.

Thanks for your help in advance.

What have you tried (and other questions from the [Stack Overflow question checklist](http://meta.stackexchange.com/questions/156810/stack-overflow-question-checklist))? – Joshua Ulrich, Aug 01 '13 at 21:01
I have only tried loops. I have two of them: one is a counter over row numbers that starts at the first row of the dataset and goes all the way to the end, and the other counts the number of consecutive 1's. But it's very inefficient and takes a long time to run. – Sam, Aug 01 '13 at 21:04
@Thomas, you are right. I searched but didn't find anything. I should have used better keywords. – Sam, Aug 01 '13 at 21:13

score 7 · Answer 1 · edited May 23 '17 at 10:25

Using rle is slower and a bit more clumsy than using regular expressions. In Thomas' answer, you're still left to extract the max length when the values equal 1.

# make some data
set.seed(21)
N <- 1e5
s <- sample(c("0","1"), N*30, TRUE)
s <- split(s, rep(1:N, each=30))
s <- sapply(s, paste, collapse="")
# Thomas' (complete) answer
r <- function(S) {
  sapply(S, function(x) {
    rl <- rle(as.numeric(strsplit(x,"")[[1]]))
    max(rl$lengths[rl$values==1])
  })
}
# using regular expressions
g <- function(S) sapply(gregexpr("1*",S),
   function(x) max(attr(x,'match.length')))
# timing
system.time(R <- r(s))
#    user  system elapsed 
#    6.41    0.00    6.41
system.time(G <- g(s))
#    user  system elapsed 
#    1.47    0.00    1.46
all.equal(R,G)
# [1] "names for target but not for current"
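For a single string, the regular-expression approach can be sketched as follows. This is only an illustration on the question's example string; the names m and lens are chosen here and are not part of the answer above.

```r
# example string from the question
x <- "000010011100011111001111111100"

# gregexpr() returns one list element per input string; its
# 'match.length' attribute holds the length of every match of "1*"
m <- gregexpr("1*", x)[[1]]
lens <- attr(m, "match.length")

# the longest match is the longest run of 1s
max(lens)
# [1] 8

# "1*" also matches the empty string, so a string without any 1s gives 0
max(attr(gregexpr("1*", "0000")[[1]], "match.length"))
# [1] 0
```

The message from all.equal above only reflects that R carries the names of s while G has none; comparing unname(R) with G should sidestep that.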

score 6 · Answer 2 · Arun · edited Aug 01 '13 at 23:53

A much faster alternative that avoids rle is to split each string on runs of characters other than 1:

# following thelatemail's comment, changed '0+' to '[^1]+'
strsplit(x, "[^1]+", perl=TRUE)

You can then loop over the resulting list and take the maximum number of characters in each element. This is faster than the rle solution and also faster than the gregexpr solution from @Joshua. Some benchmarking:

zz <- function(x) {
    vapply(strsplit(x, "[^1]+", perl=TRUE), function(x) max(nchar(x)), 0L)
}
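
As a rough illustration of what the split produces, here is a sketch on the example string from the question. The variable name pieces is only for illustration.

```r
x <- "000010011100011111001111111100"

# split on runs of anything that is not "1"; what remains are the runs
# of 1s, plus a leading "" because the string starts with a delimiter
pieces <- strsplit(x, "[^1]+", perl = TRUE)[[1]]

# lengths of the pieces: the empty leading piece, then each run of 1s
nchar(pieces)
# [1] 0 1 3 5 8

max(nchar(pieces))
# [1] 8
```

Because of that leading empty piece, a string made up only of 0s still yields a maximum of 0 rather than an empty result.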

I just realised that @Joshua's function could also be tweaked by adding perl=TRUE and using vapply. So, I'll compare that as well.

g2 <- function(S) vapply(gregexpr("1*",S, perl=TRUE),
   function(x) max(attr(x,'match.length')), 0L)

require(microbenchmark)
microbenchmark(t1 <- zz(unname(s)), t2 <- g(unname(s)), t3 <- g2(unname(s)), times=50)
Unit: seconds
                expr      min       lq   median       uq      max neval
 t1 <- zz(unname(s)) 1.187197 1.285065 1.344371 1.497564 1.565481    50
  t2 <- g(unname(s)) 2.154038 2.307953 2.357789 2.417259 2.596787    50
 t3 <- g2(unname(s)) 1.562661 1.854143 1.914597 1.954795 2.203543    50

identical(t1, t2) # [1] TRUE
identical(t1, t3) # [1] TRUE

edited Aug 01 '13 at 23:53

answered Aug 01 '13 at 23:01

Arun


Nice. To generalise in the case where there are characters other than `0` or `1` would be to replace `"0+"` with `"[^1]"` in the `strsplit` call. *Marginally* slower, but probably safer. – thelatemail Aug 01 '13 at 23:06
Yes indeed, you're right. But I don't think it'll affect the performance. – Arun Aug 01 '13 at 23:08
about 50% slower in my testing. From 0.5s to 0.75s. – thelatemail Aug 01 '13 at 23:10
Just did the benchmark again. It takes 1.1 seconds with "0+" and 1.22 with "[^1]+". – Arun Aug 01 '13 at 23:34
How to get the max of value in the list using loop? Please advise, thanks so much @Arun – user3570187 Apr 16 '20 at 11:35

score 4 · Accepted Answer · edited Aug 01 '13 at 23:01

Use rle:

x <- "000010011100011111001111111100"
rr <- rle(strsplit(x,"")[[1]])
rr
# Run Length Encoding
#   lengths: int [1:9] 4 1 2 3 3 5 2 8 2
#   values : chr [1:9] "0" "1" "0" "1" "0" "1" "0" "1" "0"

Note: I removed the as.numeric part as it's not necessary. From here, you can get the maximum count of consecutive 1's with:

max(rr$lengths[which(rr$values == "1")])
# [1] 8
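
The code above handles one string, because [[1]] keeps only the first element of the strsplit result. Applied unchanged to a whole column, it would therefore report the same value for every row. A minimal sketch of a per-row version, assuming a hypothetical data frame df with a character column bits (both names, and the helper max_run_of_ones, are made up for illustration):

```r
# hypothetical data; 'df', 'bits' and 'max_ones' are illustrative names
df <- data.frame(
  bits = c("000010011100011111001111111100", "1101", "0000"),
  stringsAsFactors = FALSE
)

# longest run of "1" in a vector of single characters
max_run_of_ones <- function(chars) {
  rl <- rle(chars)
  ones <- rl$lengths[rl$values == "1"]
  if (length(ones) == 0L) 0L else max(ones)  # no 1s at all: report 0
}

# strsplit() returns one character vector per row; apply rle to each
df$max_ones <- vapply(strsplit(df$bits, ""), max_run_of_ones, integer(1))
df$max_ones
# [1] 8 2 0
```

The explicit check for an empty selection avoids the -Inf (and warning) that max() gives when a string contains no 1s.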

edited Aug 01 '13 at 23:01

Arun


answered Aug 01 '13 at 21:03

Thomas


@Arun - I think that should be a separate answer rather than an edit. If you do so, I can probably delete mine then. – thelatemail Aug 01 '13 at 23:00
@thelatemail, Yes, I realise that now. posted separately. Thanks. (Thomas, sorry for the mess). – Arun Aug 01 '13 at 23:02
How can I do this if I want to create a separate column? I tried doing it for a column and i am getting the same value for all rows. Any suggestions? – user3570187 Apr 16 '20 at 11:09
