
Say I have the following string:

pos/S881.LMG1810.QE009562.mzML

And I wish to select the beginning of that string:

pos/S881.

I can use the following regular expression to match the start of the string (^), then any character (.) any number of times (*), ending with a decimal point (\.):

^.*\.

However, this runs to the last decimal point in the string and thus gives me:

pos/S881.LMG1810.QE009562.

How do I terminate the selection at the first decimal point?
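A minimal base-R sketch that reproduces the behaviour described above, assuming the string is stored in a variable named `s` (a name chosen here for illustration). `regexpr()` locates the match and `regmatches()` pulls out the matched text; because `*` is greedy, the match runs to the last dot.

```r
s <- "pos/S881.LMG1810.QE009562.mzML"

# ".*" is greedy: it first consumes the whole string, then backtracks
# only as far as needed for "\\." to match, which is the LAST dot
greedy <- regmatches(s, regexpr("^.*\\.", s))
print(greedy)
```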

Henry Holm

5 Answers


We can use a regex lookbehind ((?<=\\.)) to match the characters that come after the first . and remove them with trimws:

trimws(str1, whitespace = "(?<=\\.).*")
[1] "pos/S881."
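The same lookbehind also works directly in base `sub()`, provided the PCRE engine is selected with `perl = TRUE`, since R's default regex engine does not support lookbehind. A sketch, assuming the example string is in `str1`:

```r
str1 <- "pos/S881.LMG1810.QE009562.mzML"

# "(?<=\\.)" asserts that the previous character is a dot; the leftmost
# position satisfying it is just after the FIRST dot, and ".*" then
# takes the rest of the string, which sub() replaces with ""
prefix <- sub("(?<=\\.).*", "", str1, perl = TRUE)
prefix
```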

Or extract, from the start (^) of the string, the characters that are not a . ([^.]+), followed by a literal dot (. is a metacharacter, so it is escaped):

library(stringr)
str_extract(str1, "^[^.]+\\.")
[1] "pos/S881."

data

str1 <- "pos/S881.LMG1810.QE009562.mzML"
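`str_extract()` returns `NA` for elements with no match. A base-R sketch of the same `^[^.]+\\.` pattern over several file names (the second and third names are hypothetical examples) needs one extra step, because `regmatches()` silently drops elements that did not match:

```r
files <- c("pos/S881.LMG1810.QE009562.mzML", "neg/S12.ABC.mzML", "README")

# regexpr() returns -1 for elements where the pattern does not match
m <- regexpr("^[^.]+\\.", files)

# regmatches() returns only the successful matches, so place them
# back into a full-length vector and leave NA elsewhere
prefix <- rep(NA_character_, length(files))
prefix[m != -1] <- regmatches(files, m)
prefix
```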
akrun

Alternatively, just use sub():

s <- 'pos/S881.LMG1810.QE009562.mzML'
sub("\\..*", ".", s)
# [1] "pos/S881."
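  • \\..* - Match a literal dot followed by 0+ characters. Since sub() replaces only the leftmost match, this removes everything from the first dot onward, and the replacement "." puts the dot back.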
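Because `sub()` leaves an element untouched when the pattern does not match, this approach behaves differently from `str_extract()` on names without a dot. A short sketch over a vector (the extra file names are hypothetical):

```r
files <- c("pos/S881.LMG1810.QE009562.mzML", "neg/S12.ABC.mzML", "README")

# sub() rewrites only the leftmost match in each element: everything
# from the first dot to the end becomes a single "."
prefix <- sub("\\..*", ".", files)

# "README" has no dot, so it is returned unchanged rather than NA
prefix
```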
JvdV

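We could use strsplit: split on the (escaped) dot and index the first piece to extract the desired part of the string:

x <- "pos/S881.LMG1810.QE009562.mzML"
strsplit(x, "\\.")[[1]][1]
[1] "pos/S881"

Note that the split discards the dot itself, so the result is "pos/S881" without the trailing ".".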
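The `[[1]][1]` indexing only looks at the first string. A sketch for a whole vector of file names (the second name is a hypothetical example) takes the first piece of every split and pastes the dot back on; `fixed = TRUE` makes `strsplit()` treat `"."` literally, so no escaping is needed:

```r
files <- c("pos/S881.LMG1810.QE009562.mzML", "neg/S12.ABC.mzML")

# one character vector of pieces per input string
parts <- strsplit(files, ".", fixed = TRUE)

# take element 1 of every piece, guaranteeing a character(1) result
first <- vapply(parts, `[`, character(1), 1)

# strsplit() drops the delimiter, so re-append the dot
prefix <- paste0(first, ".")
prefix
```

One caveat of this design: a name with no dot at all would still get a "." appended, since `strsplit()` returns it whole.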
TarJae

Accepting @akrun's answer for the quick response, but I found that adding the ? modifier makes * non-greedy in my original expression:

stringr::str_extract("pos/S881.LMG1810.QE009562.mzML", "^.*?\\.")
[1] "pos/S881."
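The same lazy pattern works in base R without stringr. A sketch using `regexpr()` with `perl = TRUE`, so the `*?` quantifier is handled by the PCRE engine:

```r
s <- "pos/S881.LMG1810.QE009562.mzML"

# "*?" is lazy: ".*?" grows one character at a time and stops as soon
# as "\\." can match, i.e. at the FIRST dot
lazy <- regmatches(s, regexpr("^.*?\\.", s, perl = TRUE))
lazy
```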
Henry Holm

Another regex approach is to use sub with the pattern "(^.*?\\.).*" and keep only the captured group (\\1), e.g.,

> sub("(^.*?\\.).*", "\\1", "pos/S881.LMG1810.QE009562.mzML")
[1] "pos/S881."
ThomasIsCoding