Say I need to split the string caabacb with strsplit() into individual letters, except that a letter followed by a b should stay attached to that b, giving "c" "a" "ab" "a" "cb". I tried the following line; the pattern looks fine in a regex tester but does not give that result in R. What did I do wrong?

strsplit('caabacb','(?!b)',perl=TRUE)
[[1]]
[1] "c" "a" "a" "b" "a" "c" "b"
dasf

2 Answers

You could also prefix the pattern with a positive lookbehind that matches any character, (?<=.). On its own, (?<=.) would split the string after every character without removing any characters; the negative lookahead (?!b) then excludes the split positions that are immediately followed by a b:

strsplit('caabacb', '(?<=.)(?!b)', perl = TRUE)
#> [[1]]
#> [1] "c"  "a"  "ab" "a"  "cb"
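
As far as I can tell, the original pattern fails because (?!b) already matches with zero length at the very start of the string, and strsplit() reacts to an empty match there by splitting off a single character, so the following b ends up as a piece of its own. A minimal sketch comparing the two patterns (the helper name split_keep_b is made up for illustration):

```r
x <- "caabacb"

# Original attempt: the empty match at the start peels off one character at a time
strsplit(x, "(?!b)", perl = TRUE)[[1]]
#> [1] "c" "a" "a" "b" "a" "c" "b"

# Hypothetical helper wrapping the lookbehind + lookahead pattern
split_keep_b <- function(s) strsplit(s, "(?<=.)(?!b)", perl = TRUE)

split_keep_b(x)[[1]]
#> [1] "c"  "a"  "ab" "a"  "cb"
```

The lookbehind matters because it can never match at position 0, so every match sits after at least one character and strsplit() cuts there instead of taking its single-character shortcut.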
Joris C.

strsplit() probably needs an actual delimiter to split on. You could insert one, e.g. a ";", with gsub() and then split on that.

strsplit(gsub("(?!^.|b|\\b)", ";", "caabacb", perl=TRUE), ";", perl=TRUE)
# [[1]]
# [1] "c"  "a"  "ab" "a"  "cb"
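
To see what the gsub() step does, it may help to look at the intermediate string. A small sketch, assuming ";" never occurs in the input (the variable name marked is only for illustration):

```r
x <- "caabacb"

# Insert ";" at every position that is not the start of the string,
# not directly before a "b", and not at a word boundary (here: the end)
marked <- gsub("(?!^.|b|\\b)", ";", x, perl = TRUE)
marked
#> [1] "c;a;ab;a;cb"

# The delimiter is now a literal character, so a fixed split is enough
strsplit(marked, ";", fixed = TRUE)[[1]]
#> [1] "c"  "a"  "ab" "a"  "cb"
```

Since \\b is defined in terms of word characters, inputs containing spaces or punctuation would likely need a different pattern.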
jay.sf