I have a string:

string <- "abbccc"

I want to replace each run of the same letter with that letter followed by the number of times it occurs in the run (a single letter stays as it is). So I want to get something like this: "ab2c3"

I am using the stringi package to do this, but it doesn't work exactly the way I want. Let's say I already have a vector with the replacement parts:

library(stringi)
vector <- c("b2", "c3")
stri_replace_all_regex(string, "([a-z])\\1{1,8}", vector)

The output:

[1] "ab2b2" "ac3c3"

The output I want: [1] "ab2c3"

I also tried it this way:

stri_replace_all_regex(string, "([a-z])\\1{1,8}", vector, vectorize_all=FALSE)

but I get an error:

Error in stri_replace_all_regex(string, "([a-z])\\1{1,8}", vector, vectorize_all = FALSE) : 
  vector length not consistent with other arguments
bartektartanus
jjankowiak
  • What is the expected output for `string <- 'bbaccc'`? Is it `'b2ac3'`? – akrun Nov 29 '14 at 17:31
  • It's `"b2ac3"`. Other examples `"good" --> "go2d"`, `"uffff" --> "uf4"`. I know how to create a vector with "new" parts instead of old ones but I don't know how to replace it properly. – jjankowiak Nov 29 '14 at 17:36
  • Sorry, you're ok. The type of quotes doesn't matter of course. – jjankowiak Nov 29 '14 at 17:39

2 Answers

Not regex, but `strsplit` and `rle` with some `paste` magic:

string <- c("abbccc", "bbaccc", "uffff", "aaabccccddd")

sapply(lapply(strsplit(string, ""), rle), function(x) {
    paste(x[[2]], ifelse(x[[1]] == 1, "", x[[1]]), sep="", collapse="")
})

## [1] "ab2c3"   "b2ac3"   "uf4"     "a3bc4d3"
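
To make the steps above easier to follow, here is a small sketch that walks through a single string. It only uses base R; the variable names `chars`, `runs` and `counts` are illustrative and not part of the original answer.

```r
# Split the string into single characters
chars <- strsplit("abbccc", "")[[1]]

# rle() returns the run lengths and the value of each run
runs <- rle(chars)
runs$lengths
## [1] 1 2 3
runs$values
## [1] "a" "b" "c"

# Keep the count only for runs longer than one character
counts <- ifelse(runs$lengths == 1, "", runs$lengths)

# Glue each letter to its count and collapse into one string
paste0(runs$values, counts, collapse = "")
## [1] "ab2c3"
```

In the one-liner above, `x[[1]]` corresponds to `runs$lengths` and `x[[2]]` to `runs$values`.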
Tyler Rinker

Not a stringi solution and not a regex either, but you can do it by splitting the string and using rle:

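    string <- "abbccc"
    r <- rle(strsplit(string, "", fixed = TRUE)[[1]])
    # Append the run length only when it is greater than 1.
    # (Stripping every "1" with gsub() would also damage counts such as 10 or 11.)
    res <- paste0(r$values, ifelse(r$lengths == 1, "", r$lengths), collapse = "")
    res
    #[1] "ab2c3"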
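
If a regex-based route is still wanted, one possible sketch in base R is shown below. As far as I understand, the call in the question returns two strings because stringi recycles the length-2 replacement vector against the single input string, and with `vectorize_all = FALSE` it expects the pattern and replacement vectors to line up in length. The sketch avoids both issues by computing each replacement from its own match. The helper name `compress_runs` is hypothetical, and the pattern uses `+` instead of `{1,8}`, which is an assumption about the intended run lengths.

```r
# Hypothetical helper; the name is illustrative and not part of stringi or base R
compress_runs <- function(s) {
  # Locate every run of two or more identical lowercase letters
  m <- gregexpr("([a-z])\\1+", s, perl = TRUE)
  # For each input string, replace its runs with "<letter><run length>"
  regmatches(s, m) <- lapply(regmatches(s, m), function(runs) {
    paste0(substr(runs, 1, 1), nchar(runs))
  })
  s
}

compress_runs(c("abbccc", "bbaccc", "good", "uffff"))
## [1] "ab2c3" "b2ac3" "go2d"  "uf4"
```

Because each replacement is derived from the matched text itself, there is no need to prepare a separate vector such as `c("b2", "c3")` in advance.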
nicola