    library(stringr)

    A <- c("C-C-C", "C-C", "C-C-C-C")
    B <- str_count(A, "C-C")
    df <- data.frame(A, B)

    A          B (expected)   B (actual)
    C-C-C      2              1
    C-C        1              1
    C-C-C-C    3              2

I am trying to count all the `C-C` transitions in each string, but `str_count` returns the wrong counts (see the table above). Can someone suggest how to fix this?

user3570187
  • You expect the matches to be allowed to *overlap*, which is not the case. – GKi Mar 16 '21 at 12:23
  • If overlap is allowed, isn't it simpler to count the `-`? Alternatively, use `strsplit` according to your expectation and count the valid parts. E.g. `strsplit(x = "C-C-C-C-C-C", split = "C-C", fixed = TRUE)` returns three parts: `[1] "" "-" "-"` – mhovd Mar 16 '21 at 12:27
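To see what the comments mean by overlap, here is a small base R sketch (base R only, so it runs without stringr) that prints where each match starts. A fixed search for `C-C` resumes after the consumed text, so it finds only two starts in `C-C-C-C`; the lookahead `C(?=-C)` consumes only the first `C`, so the shared `C` can start the next match. The variable names are illustrative.

```r
x <- "C-C-C-C"

# Fixed search: matching resumes after the consumed "C-C", so matches never overlap
non_overlap <- as.integer(gregexpr("C-C", x, fixed = TRUE)[[1]])
non_overlap
#[1] 1 5

# Lookahead: only the first "C" is consumed and "-C" is merely checked,
# so the shared "C" can start the next match
overlap <- as.integer(gregexpr("C(?=-C)", x, perl = TRUE)[[1]])
overlap
#[1] 1 3 5
```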

3 Answers


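You expect the matches to overlap, which is not the case: `str_count` counts non-overlapping matches, so once `C-C` has been matched, its second `C` cannot start the next match. To count overlapping matches, use a lookahead: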

str_count(A, "C(?=-C)")
#[1] 2 1 3

or count the `-` characters:

str_count(A, "-")
#[1] 2 1 3

or in base R:

lengths(gregexpr("C(?=-C)", A, perl=TRUE))
#[1] 2 1 3
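One caveat with the `gregexpr()` version: when a string has no match, `gregexpr()` returns `-1`, which `lengths()` counts as 1. A sketch that counts only positive match positions follows; the extra `"C"` element and the helper name `count_transitions()` are hypothetical, added for illustration.

```r
A <- c("C-C-C", "C-C", "C-C-C-C", "C")   # hypothetical extra "C" has no transition

# lengths() reports 1 for "C" because gregexpr() returns -1 when nothing matches
lengths(gregexpr("C(?=-C)", A, perl = TRUE))
#[1] 2 1 3 1

# Hypothetical helper: count only real match positions (> 0)
count_transitions <- function(x) {
  m <- gregexpr("C(?=-C)", x, perl = TRUE)
  vapply(m, function(pos) sum(pos > 0), integer(1))
}
count_transitions(A)
#[1] 2 1 3 0
```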
GKi

`str_count()` wraps stringi's counting functions, but it does not expose stringi's `overlap` option. You can call stringi directly instead and turn overlapping matches on:

stringi::stri_count_fixed(A, "C-C", opts_fixed = stringi::stri_opts_fixed(overlap = TRUE))
#[1] 2 1 3
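If you would rather avoid the stringi dependency, an overlapping fixed-string count can be sketched in base R by restarting the search one character after each match start. `count_overlapping()` is a hypothetical helper, not part of stringr or stringi, and this is not a claim about how stringi implements `overlap`.

```r
# Hypothetical helper: count overlapping occurrences of a fixed (literal) pattern
count_overlapping <- function(x, pattern) {
  stopifnot(nzchar(pattern))          # an empty pattern would never advance
  vapply(x, function(s) {
    n <- 0L
    start <- 1L
    repeat {
      pos <- as.integer(regexpr(pattern, substring(s, start), fixed = TRUE))
      if (pos < 0L) break             # no further match in the rest of the string
      n <- n + 1L
      start <- start + pos            # resume one character after this match start
    }
    n
  }, integer(1), USE.NAMES = FALSE)
}

A <- c("C-C-C", "C-C", "C-C-C-C")
count_overlapping(A, "C-C")
#[1] 2 1 3
```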
erocoar

Using `gsub()` with `nchar()` in base R: delete every character except `-` and count what remains:

nchar(gsub("[^-]+", "", A))
#[1] 2 1 3
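Counting the `-` characters (this answer, and `str_count(A, "-")` above) gives the number of `C-C` transitions only because every string in `A` consists solely of `C` joined by `-`. If other symbols could appear, the two counts diverge. A base R sketch with made-up inputs (`x` is hypothetical, not from the question):

```r
x <- c("C-C-C", "C-O-C", "C-C-O")   # hypothetical inputs containing a non-C symbol

# Dash count: every "-", regardless of which symbols it joins
dashes <- nchar(gsub("[^-]+", "", x))
dashes
#[1] 2 2 2

# Lookahead count: only a "C" immediately followed by "-C"
cc <- vapply(gregexpr("C(?=-C)", x, perl = TRUE),
             function(pos) sum(pos > 0), integer(1))
cc
#[1] 2 0 1
```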
akrun