Select a string ending at the first instance of character in Regular Expressions

Question

Say I have the following string:

pos/S881.LMG1810.QE009562.mzML

and I wish to select the beginning of that string:

pos/S881.

I can use the following regular expression to match from the start of the string (^), then any character (.), any number of times (*), ending with a literal dot (\.):

^.*\.

However, because * is greedy, the match runs to the last dot in the string and thus gives me:

pos/S881.LMG1810.QE009562.

How do I terminate the selection at the first decimal point?

akrun · Accepted Answer · 2022-10-13T17:01:42.067

We can use a regex lookbehind ((?<=\\.)) to match the characters that follow the first . and remove them with trimws

trimws(str1, whitespace = "(?<=\\.).*")
[1] "pos/S881."

Or extract, from the start (^) of the string, one or more characters that are not a . ([^.]+), followed by a literal dot (a metacharacter, thus escaped)

library(stringr)
str_extract(str1, "^[^.]+\\.")
[1] "pos/S881."

data

str1 <- "pos/S881.LMG1810.QE009562.mzML"
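For readers who would rather avoid the stringr dependency, the same "^[^.]+\\." pattern can be applied with base R. This is only a sketch of one way to do it, not part of the answer above; the variable names m and first_part are illustrative.

```r
# Base R sketch of the "^[^.]+\\." idea, without stringr.
str1 <- "pos/S881.LMG1810.QE009562.mzML"

# regexpr() locates the first match; regmatches() extracts the matched text.
m <- regexpr("^[^.]+\\.", str1)
first_part <- regmatches(str1, m)
print(first_part)
# [1] "pos/S881."
```

One difference worth keeping in mind: with a regexpr() result, regmatches() drops elements that do not match, whereas str_extract returns NA for them.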

score 7 · Answer 2 · answered Oct 13 '22 at 17:06

Alternatively just use sub():

s <- 'pos/S881.LMG1810.QE009562.mzML'
sub("\\..*", ".", s)
# [1] "pos/S881."

\\..* - Match a literal dot followed by 0+ characters.
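Since sub() is vectorised and leaves non-matching input untouched, the pattern can be applied to a whole vector at once. A small sketch, with made-up file names as the assumption:

```r
# Hypothetical file names; the last one contains no dot at all.
paths <- c("pos/S881.LMG1810.QE009562.mzML", "neg/S12.raw", "no_dot_here")

# Replace the first dot and everything after it with a single dot.
trimmed <- sub("\\..*", ".", paths)
print(trimmed)
# [1] "pos/S881."   "neg/S12."    "no_dot_here"
```

A string without a dot comes back unchanged, so it may be worth checking for that case with grepl() if every result is expected to end in a dot.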

JvdV

score 4 · Answer 3 · answered Oct 13 '22 at 17:00

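We could use strsplit: splitting on the literal dot and indexing the first piece extracts the part of the string before the first dot. Note that the split consumes the dot itself, so the result lacks the trailing "." shown in the question.

x <- "pos/S881.LMG1810.QE009562.mzML"
strsplit(x, "\\.")[[1]][1]

[1] "pos/S881"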
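If the trailing dot is needed, or there are several strings to process, the strsplit idea could be extended roughly as follows. This is a sketch under the assumption that every input contains at least one dot; the second file name is made up.

```r
# Two example inputs; the second is hypothetical.
x <- c("pos/S881.LMG1810.QE009562.mzML", "neg/S12.raw")

# fixed = TRUE treats "." as a literal character, so no escaping is needed.
parts <- strsplit(x, ".", fixed = TRUE)

# Take the first piece of each split, then put the dot back.
first <- vapply(parts, `[`, character(1), 1)
with_dot <- paste0(first, ".")
print(with_dot)
# [1] "pos/S881." "neg/S12."
```

Note that paste0 appends a dot unconditionally, so an input with no dot would gain one it never had.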

TarJae

score 3 · Answer 4 · answered Oct 13 '22 at 16:59

I'm accepting @akrun's answer for the quick response, but I also found that adding the "?" modifier makes "*" non-greedy (lazy), so my original expression works with that one change:

stringr::str_extract("pos/S881.LMG1810.QE009562.mzML", "^.*?\\.")
[1] "pos/S881."
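To make the difference concrete, the greedy and lazy versions can be compared side by side in base R. A minimal sketch; the names greedy and lazy are only for illustration:

```r
s <- "pos/S881.LMG1810.QE009562.mzML"

# Greedy: .* grabs as much as possible, so the match ends at the LAST dot.
greedy <- regmatches(s, regexpr("^.*\\.", s))

# Lazy: .*? grabs as little as possible, so the match ends at the FIRST dot.
lazy <- regmatches(s, regexpr("^.*?\\.", s, perl = TRUE))

print(greedy)
# [1] "pos/S881.LMG1810.QE009562."
print(lazy)
# [1] "pos/S881."
```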

Henry Holm

score 3 · Answer 5 · answered Oct 14 '22 at 11:20

Another regexp approach is using sub along with the pattern "(^.*?\\.).*" , e.g.,

> sub("(^.*?\\.).*", "\\1", "pos/S881.LMG1810.QE009562.mzML")
[1] "pos/S881."

ThomasIsCoding
