-1

I have strings such as the following:

2 - 5-< 2
6 - 10-< 2
6 - 10-2 - 5
> 15-2 - 5

I want to split each string at the point where the - is neither preceded nor followed by a blank space. The strings above would therefore be split as follows:

"2 - 5" "< 2"
"6 - 10" "< 2"
"6 - 10" "2 - 5"
"> 15" "2 - 5"

In RStudio I have tried sub() and strsplit(), but I have found it hard to write the right regular expression. Does anyone have a clue?

r2evans
mbistato

2 Answers

3

Use perl=TRUE with lookaround:

vec <- c("2 - 5-< 2", "6 - 10-< 2", "6 - 10-2 - 5", "> 15-2 - 5")
strsplit(vec, "(?<! )-(?! )", perl=TRUE)
# [[1]]
# [1] "2 - 5" "< 2"  
# [[2]]
# [1] "6 - 10" "< 2"   
# [[3]]
# [1] "6 - 10" "2 - 5" 
# [[4]]
# [1] "> 15"  "2 - 5"
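As a rough check of what the lookarounds select, the sketch below (base R only) locates the one dash in each string that has no space on either side, then binds the split pieces into a two-column matrix. The pattern "(?<! )-(?! )" reads as "a dash not preceded by a space and not followed by a space"; the column names left and right are hypothetical labels chosen for illustration, and the rbind step assumes every string splits into exactly two pieces, as the sample data does.

```r
vec <- c("2 - 5-< 2", "6 - 10-< 2", "6 - 10-2 - 5", "> 15-2 - 5")

# A dash qualifies only if no space sits immediately before or after it.
pattern <- "(?<! )-(?! )"

# Character position of the first qualifying dash in each string.
pos <- as.integer(regexpr(pattern, vec, perl = TRUE))
pos

# Split on that dash and bind the pieces into a character matrix.
# Assumes exactly two pieces per string; "left"/"right" are made-up names.
parts <- do.call(rbind, strsplit(vec, pattern, perl = TRUE))
colnames(parts) <- c("left", "right")
parts
```

If some inputs could contain zero or several qualifying dashes, the pieces would have unequal lengths and the rbind step would need extra handling.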
r2evans
0

I guess this is an easier-to-understand solution:

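library(stringr)
vec <- c("2 - 5-< 2", "6 - 10-< 2", "6 - 10-2 - 5", "> 15-2 - 5")
str_split(vec, "(?<=\\S)-(?=\\S)")
[[1]]
[1] "2 - 5" "< 2"  

[[2]]
[1] "6 - 10" "< 2"   

[[3]]
[1] "6 - 10" "2 - 5" 

[[4]]
[1] "> 15"  "2 - 5"

First off, no perl = TRUE is needed (although it does require another package, stringr). More importantly, (?<=\\S) and (?=\\S) are positive lookarounds, which are arguably easier to read than negative ones. The first means: there is a non-whitespace character immediately to the left; the second means: there is a non-whitespace character immediately to the right. str_split() (with the underscore) then splits on the dash - only where both conditions are met. Digit-only lookarounds, (?<=\\d) and (?=\\d), would not be enough here, because in the first two strings the dash is followed by <, not by a digit.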
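For comparison, here is a minimal base-R sketch that puts a positive-lookaround pattern next to a negative-lookaround one. Using \\S (any non-whitespace character) as the lookaround class is an assumption chosen to match the question's wording ("neither preceded nor followed by blank space"). It only needs strsplit() with perl = TRUE, so it runs without stringr.

```r
vec <- c("2 - 5-< 2", "6 - 10-< 2", "6 - 10-2 - 5", "> 15-2 - 5")

# Positive lookarounds: a non-whitespace character must sit on each side.
positive <- strsplit(vec, "(?<=\\S)-(?=\\S)", perl = TRUE)

# Negative lookarounds: no space may sit on either side.
negative <- strsplit(vec, "(?<! )-(?! )", perl = TRUE)

# Both formulations give the same pieces for this sample data.
identical(positive, negative)
```

The two patterns are not equivalent in general: a dash at the very start or end of a string satisfies the negative form (there is no space next to it) but not the positive form (there is no character next to it at all). Which one fits depends on how such edge cases should be treated.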

Chris Ruehlemann