Assuming that I have a table of strings:

df <- tibble::tribble(
  ~alternatives,
  " 23.32 | x232 code | This is a description| 43.11 | a341 code | some other description | optimised | v333 code | still another description"
)

I would like to split the string at the positions preceding each numeric value or keyword, e.g. before 23.32, before 43.11, and before the word "optimised".

I expect each cell to end up holding a vector like this:

c("23.32 | x232 code | This is a description|", "43.11 | a341 code | some other description |", "optimised | v333 code | still another description")

What regex pattern would split before these specific patterns? The number of pipe characters between them may differ, so I cannot rely on the pipes. I am vaguely aware of look-ahead and similar constructs. The code below does not return what I expect, but I believe the solution looks something like it:

library(dplyr)    # mutate() and %>%
library(stringr)  # str_split()

df2 <- 
  df %>% 
  mutate(alternatives = 
           str_split(alternatives, 
                     pattern = "(?<=[a-zA-Z])\\s*(?=[0-9])"))

What would be the solution?

Jacek Kotowski
  • Try `"(?<=\\|)\\s*(?=\\d|optimised\\b)"` – Wiktor Stribiżew Apr 27 '21 at 15:41
  • You need to know exactly what numbers you can get. For example, can they include an exponent? Also, why do you split before "optimized" when there's no number there? Or is it "split before numbers, and also before the word 'optimized'"? – dash2 Apr 27 '21 at 15:44
  • @dash2 The parts of a vector start either with a numeral like 12.34 (it could be a price) or with a keyword "optimized" – Jacek Kotowski Apr 27 '21 at 19:18

1 Answer

You may try splitting on the following regex pattern:

(?<=\S)\s+(?=(?:\d+\.\d+|optimised)\b)

Updated script:

df2 <- df %>% 
    mutate(alternatives = 
        str_split(alternatives, 
                  pattern = "(?<=\\S)\\s+(?=(?:\\d+\\.\\d+|optimised)\\b)"))
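
To see what the pattern does, here is a minimal sketch on the single string from the question. It is an illustration under stated assumptions, not the only way to do it: it uses base R `strsplit()` with `perl = TRUE` in place of `stringr::str_split()` so that it runs without extra packages, and the variable names `x`, `pattern`, and `parts` are made up for the example. The `trimws()` call is an addition to strip the leading space that the input string carries.

```r
# Sketch in base R; strsplit(perl = TRUE) stands in for stringr::str_split().
x <- " 23.32 | x232 code | This is a description| 43.11 | a341 code | some other description | optimised | v333 code | still another description"

# (?<=\\S)  the whitespace must follow a non-space character, so the
#           leading space before 23.32 is never treated as a split point
# \\s+      the run of whitespace that the split consumes
# (?=...)   what follows must be a decimal number or the word "optimised",
#           ending at a word boundary
pattern <- "(?<=\\S)\\s+(?=(?:\\d+\\.\\d+|optimised)\\b)"

# Split, take the single result vector, and trim stray outer whitespace.
parts <- trimws(strsplit(x, pattern, perl = TRUE)[[1]])
print(parts)
```

Note that `\\d+\\.\\d+` only matches numbers with a decimal point. If whole-number prices such as `12` can occur, the number part would need to be loosened (for example to `\\d+(?:\\.\\d+)?`), at the risk of also splitting before a bare number inside a description.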
Tim Biegeleisen
  • Excellent, thanks. What book do you recommend to learn that? – Jacek Kotowski Apr 27 '21 at 16:13
  • Regex is really almost like a language, you need to use it in order to improve at a visceral level. That will happen naturally assuming regex is part of your tech stack. If it's not, then maybe practice it on your own. – Tim Biegeleisen Apr 27 '21 at 16:22