I want to split a string after every "a", provided that the "a" is not followed by "b".
string <- "abcgualoo87ahhabta"
The output should be
[1] "abcgua"
[2] "loo87a"
[3] "hhabta"
You can split your string on the pattern "a followed by a character other than b" with the regex a(?=[^b]) in strsplit:
split_str <- strsplit("abcgualoo87ahhabta", "a(?=[^b])", perl=TRUE)[[1]]
split_str
#[1] "abcgu" "loo87" "hhabta"
Explanation of the split pattern: a lookahead ((?=...)) is used, and the look-ahead pattern is any character except b ([^b]; the ^ sign indicates negation). Because [^b] has to match an actual character, the final "a" of the string is not a split point. For the lookahead to be interpreted, the perl parameter must be set to TRUE.
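To see which occurrences of "a" the lookahead actually selects, the match positions can be listed with gregexpr. This is only an illustrative sketch: the names x, m and m2 are not from the answer, and the comparison with the negative lookahead a(?!b) is added here for contrast.

```r
x <- "abcgualoo87ahhabta"

# Start positions of every "a" that is followed by a character other than "b".
# perl = TRUE enables the lookahead syntax.
m <- gregexpr("a(?=[^b])", x, perl = TRUE)[[1]]
as.integer(m)
# [1]  6 12

# A negative lookahead, a(?!b), also matches the final "a" (position 18),
# because it only requires that no "b" follows, not that some character follows.
m2 <- gregexpr("a(?!b)", x, perl = TRUE)[[1]]
as.integer(m2)
# [1]  6 12 18
```

The "a" characters at positions 1 and 15 are skipped by both patterns because each is followed by "b".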
Then you can add the removed "a" back at the end of each piece, except the last one:
split_str <- paste0(split_str, c(rep("a", length(split_str)-1), ""))
#[1] "abcgua" "loo87a" "hhabta"
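If the two steps are needed in more than one place, they could be wrapped in a small function. The sketch below assumes a single input string; the name split_keep_a is hypothetical and not part of the original answer.

```r
# Hypothetical helper: split after each "a" that is followed by a character
# other than "b", and keep the "a" at the end of each piece.
split_keep_a <- function(x) {
  parts <- strsplit(x, "a(?=[^b])", perl = TRUE)[[1]]
  n <- length(parts)
  # Nothing to re-attach when no split happened.
  if (n <= 1) return(parts)
  # Re-attach the removed "a" to every piece except the last one.
  paste0(parts, c(rep("a", n - 1), ""))
}

split_keep_a("abcgualoo87ahhabta")
# [1] "abcgua" "loo87a" "hhabta"
```

The early return matters because rep("a", n - 1) would fail for an input that yields no pieces at all.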
A nice one-step alternative provided by @nicola in the comments:
split_str <- strsplit("abcgualoo87ahhabta","(?<=a)(?!b)", perl=TRUE)[[1]]
#[1] "abcgua" "loo87a" "hhabta"
string <- "abcgualoo87ahhabta"
unlist(strsplit(gsub("a([^b])", "a \\1", string), split=" "))
# [1] "abcgua" "loo87a" "hhabta"
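The gsub approach relies on the input containing no spaces, since a space is used as the temporary separator. A possible variant, sketched here under the assumption that the control character "\x1f" never occurs in the data, uses that character instead. The variable names sep, marked and res are illustrative only.

```r
string <- "abc gualoo87ahh abta"   # input that itself contains spaces
sep <- "\x1f"                      # assumed not to occur in the input

# Insert the separator after each "a" that is followed by a non-"b" character.
marked <- gsub("a([^b])", paste0("a", sep, "\\1"), string)

# Split on the literal separator; fixed = TRUE avoids regex interpretation.
res <- unlist(strsplit(marked, split = sep, fixed = TRUE))
res
# [1] "abc gua" "loo87a"  "hh abta"
```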