3

How can I split a string that ends with a number (with an unknown number of digits) into two strings: the number and the rest of the string? Note that there may be other numbers earlier in the string, which should not be affected. For example:

"abc665abc12"   -> "abc665abc", "12"
"abc665abc 182" -> "abc665abc", "182"
"abc665abc0"    -> "abc665abc", "0"

Thanks!

Frank
Sasha

5 Answers

8

You can also use strsplit with a lookaround pattern (this needs perl = TRUE):

> x = c("abc665abc12", "abc665abc 182", "abc665abc0")
> strsplit(x, "(?<=[A-Za-z])\\s*(?=\\d+$)", perl = TRUE)
[[1]]
[1] "abc665abc" "12"       

[[2]]
[1] "abc665abc" "182"      

[[3]]
[1] "abc665abc" "0"  
Pierre L
Avinash Raj
6

This works:

# op's example
x = c("abc665abc12", "abc665abc 182", "abc665abc0")

library(stringi)
res = stri_match_first_regex(x, "^(.*?) ?([0-9]+)$")
res

     [,1]            [,2]        [,3] 
[1,] "abc665abc12"   "abc665abc" "12" 
[2,] "abc665abc 182" "abc665abc" "182"
[3,] "abc665abc0"    "abc665abc" "0"  

The parts you want are in columns 2 and 3, which correspond to the two capture groups (the parentheses) in the regex. Column 1 holds the full match.

Frank
  • For those seeking a way of extracting capture groups (the things in parentheses) in base, good luck. It is a real mess: http://stackoverflow.com/a/18620944/1191259 – Frank Nov 22 '15 at 04:04
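For what it's worth, base R's regexec() together with regmatches() can also return capture groups. Below is a minimal sketch, not a definitive implementation. It uses the OP's example vector; the pattern is a variant written without a lazy quantifier so it does not depend on perl = TRUE, and it assumes every string has at least one non-digit, non-space character before the trailing number.

```r
x <- c("abc665abc12", "abc665abc 182", "abc665abc0")

# group 1: everything up to the last non-digit, non-space character
# group 2: the trailing run of digits (optional spaces in between are dropped)
m <- regmatches(x, regexec("^(.*[^0-9 ]) *([0-9]+)$", x))

# each list element is c(full match, group 1, group 2); stack them into a matrix
res <- do.call(rbind, m)
res[, 2:3]
```

Columns 2 and 3 of res play the same role as in the stringi result above.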
3

When it comes to things like this, I like using strapply from the gsubfn package:

library(gsubfn)
strapply('abc665abc12', '(.*?) *(\\d+)$', c)[[1]]
# [1] "abc665abc" "12" 

If you have a character vector, it's the same concept:

strapply(x, '(.*?) *(\\d+)$', c)
hwnd
2

In base R:

cbind(x,
      gsub("[ 0-9]+$", "", x), 
      gsub("^[a-z 0-9]+[a-z ]+", "", x))

     x                                
[1,] "abc665abc12"   "abc665abc" "12" 
[2,] "abc665abc 182" "abc665abc" "182"
[3,] "abc665abc0"    "abc665abc" "0" 
jeremycg
0

A solution using plain base R regex functions, shown with two of your strings:

    x <- "abc665abc12"
    y <- "abc665abc 182"
    patterns <- "[[:digit:]]+$"
    m1 <- regexpr(patterns, x)
    m2 <- regexpr(patterns, y)

Now regmatches(x, m1) yields "12" and regmatches(y, m2) yields "182".
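The regexpr() approach above returns only the trailing number. To also get the rest of the string, one option is to strip that same trailing match with sub(). This is a sketch under the assumption that every element ends in digits: regmatches() silently drops elements that do not match, so the two vectors would otherwise differ in length.

```r
x <- c("abc665abc12", "abc665abc 182", "abc665abc0")
pattern <- "[[:digit:]]+$"

# the trailing number, extracted as in the answer above
number <- regmatches(x, regexpr(pattern, x))

# the rest: remove the trailing digits and any spaces just before them
rest <- sub(" *[[:digit:]]+$", "", x)

cbind(rest, number)
```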

Bg1850