Does anyone know a way to split a string at a certain substring, but only if there is no space before it? Maybe even using the strsplit function? Here is an example:

input_str = "For example. Production of something a Product.ProIts cool"

I want to split the string at the "Pro" in ".ProIts cool", but not at the other "Pro" in Production or Product. The separator "Pro" is not always preceded by a period, but an ordinary word beginning with "Pro..." should always be preceded by a space. I also have several different separators. Here is my current code, which works fine as long as no separator appears more than once in the text:

arr_seperators = c("String1", "Pro", "Contra")
n = 3
output = rep(0, n)
for (i in 1:n) {
  output[i] = strsplit(input_str, arr_seperators[i])[[1]][2]
  for (j in 1:n) {
    output[i] = strsplit(output[i], arr_seperators[j])[[1]][1]
  }
}
print(output)
Chris

2 Answers

strsplit("For example. Production of something a Product.ProIts cool", 
         "(?<!\\s)Pro", perl = TRUE)
# [[1]]
# [1] "For example. Production of something a Product." "Its cool"                                       

The (?<!\\s) is a regex lookaround (here, a negative lookbehind), which is supported when using Perl-compatible regexes (perl = TRUE).

(?<=...) is positive lookbehind; (?<!...) means negative lookbehind, aka not preceded by; and \\s is "whitespace". The premise of lookaround in general is to match when there is something before/after your pattern but not to consume that preceding/following text within the captured substring.
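Since the question mentions several separators, here is a minimal sketch of how the negative lookbehind could be combined with an alternation. The helper name split_unspaced and the sample string are made up for illustration, and the sketch assumes the separators contain no regex metacharacters.

```r
# Hypothetical helper: split at any separator that is not directly
# preceded by whitespace.
# Assumption: the separators contain no regex metacharacters (nothing is escaped).
split_unspaced <- function(x, separators) {
  pattern <- paste0("(?<!\\s)(", paste(separators, collapse = "|"), ")")
  strsplit(x, pattern, perl = TRUE)[[1]]
}

# Made-up sample text: "Product" follows a space, so its "Pro" is left alone
sample_str <- "IntroString1first partProsecond part with a Product.Contrathird part"
parts <- split_unspaced(sample_str, c("String1", "Pro", "Contra"))
print(parts)
```

The separators themselves are dropped from the result, as with the single-separator call above.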

We can also use positive lookbehind with (?<=\\S) for non-whitespace.
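The two forms are not fully interchangeable, though: (?<!\\s) also matches at the very start of the string, where nothing precedes the pattern, while (?<=\\S) requires an actual non-whitespace character in front of it. A small sketch with a made-up input:

```r
# Made-up input that starts with the separator
x <- "ProIts cool"

# Negative lookbehind: nothing precedes "Pro", so it is "not preceded by whitespace"
neg <- strsplit(x, "(?<!\\s)Pro", perl = TRUE)[[1]]

# Positive lookbehind: needs a non-whitespace character before "Pro", so no split here
pos <- strsplit(x, "(?<=\\S)Pro", perl = TRUE)[[1]]

print(neg)
print(pos)
```

Which one fits depends on whether a separator at the very beginning of the text should count.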

r2evans
  • Thanks for the clarity, clipboard-confusion can be ... interesting. I'm not perfectly certain if this matches the OP's needs, as the sample code doesn't produce (to me) meaningful output and expected output is not explicitly provided. We'll see, thanks. @Onyambu – r2evans Jun 19 '20 at 18:34
  • Just noticed the question title and the explanation are contradictory. What you have provided satisfies the title given. – Onyambu Jun 19 '20 at 18:36
  • Thank you, this helped me a lot, especially the explanation. It seems to work, and I will test it this evening using my whole dataset. – Chris Jun 19 '20 at 19:12

Maybe you are looking for something like this? If not, please add the desired output.

# split just before the delimiter ".Pro" and keep it in the second piece
base::strsplit("For example. Production of something a Product.ProIts cool",
               split = "(?<=.)(?=\\.Pro)",
               perl = TRUE)

[[1]]
[1] "For example. Production of something a Product" ".ProIts cool" 
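
If the period is not always present, the same zero-width approach could be adapted to the "no space before" rule. This is only a sketch under that assumption: the split point is any position preceded by a non-whitespace character and followed by "Pro", and the separator stays at the start of the second piece.

```r
input_str <- "For example. Production of something a Product.ProIts cool"

# Zero-width split point: preceded by a non-whitespace character, followed by "Pro".
# The lookbehind also stops strsplit from matching again at the start of the remainder.
kept <- strsplit(input_str, "(?<=\\S)(?=Pro)", perl = TRUE)[[1]]
print(kept)
```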
Wimpel