I have a vector of strings of the form "letters numbers" and I want to extract the numbers with a regex, using `stringr::str_extract` with the pattern `"\\d*"`. The results are very confusing:
```r
# R 4.2.3
# install.packages('stringr')
library(stringr)
# case 1
str_extract('word 42', '\\d*')
# ""
# case 2 (?)
str_extract('42 word', '\\d*')
# "42"
# case 3
str_extract('word 42', '\\d+')
# "42"
# case 4 (?!)
str_extract('word 42', '\\d*$')
# "42"
# case 5
str_extract('42 word', '\\d*$')
# ""
```
In all cases the expected result is `"42"`.
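For reference, here is a minimal sketch of the result being asked for. It assumes the goal is simply the first run of one or more digits in each string. `str_extract()` is vectorised over its input, so `'\\d+'` (as in case 3) can be applied to the whole vector at once. The vector `x` is a hypothetical stand-in for the real data.

```r
library(stringr)

# Hypothetical stand-in for the real vector of "letters numbers" strings
x <- c("word 42", "42 word")

# '\\d+' requires at least one digit, so an empty match is not possible
nums <- str_extract(x, "\\d+")
nums
#> [1] "42" "42"
```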
I am a novice with regexes, but the pattern `'\\d*'` seems straightforward: I understand it as "match any number of consecutive numeric characters".
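One detail worth checking against that reading: in `*`, "any number" includes zero, so the pattern can succeed without consuming a single digit. Below is a quick check on a string with no digits at all. The input `"word"` is only an illustration.

```r
library(stringr)

# '*' means "zero or more", so '\\d*' can match the empty string
has_star_match <- str_detect("word", "\\d*")  # TRUE: empty match at the start
has_plus_match <- str_detect("word", "\\d+")  # FALSE: '+' needs at least one digit

# The extracted text is the empty string, just like in case 1
star_text <- str_extract("word", "\\d*")
```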
The fact that it doesn't work for case 1 but does for case 2 is counterintuitive in itself. On top of that, the roles seem to be reversed with the pattern `'\\d*$'` (cases 4 and 5).
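The following side-by-side comparison may help narrow this down. Regex engines report the leftmost position at which the whole pattern succeeds. Adding `$` forces the match to end at the end of the string. Anchoring with `^` instead forces it to start at the beginning, which should mirror cases 4 and 5. `x` is again a hypothetical two-element vector.

```r
library(stringr)

x <- c("word 42", "42 word")

# '$': the match must end at the end of the string (cases 4 and 5)
end_anchored <- str_extract(x, "\\d*$")

# '^': the match must start at the beginning of the string
start_anchored <- str_extract(x, "^\\d*")
```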
I have also experimented with other functions (`str_match` and `str_match_all`), but the results were still not clear.
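If it helps, `str_locate()` from the same package returns the start and end positions (1-based) of the first match instead of the matched text. That makes it easier to see where in each string the engine found its match. Here is a minimal sketch with `'\\d+'`, whose positions are unambiguous. The same call can be repeated with `'\\d*'` or `'\\d*$'` for comparison.

```r
library(stringr)

x <- c("word 42", "42 word")

# Integer matrix: one row per input string, columns "start" and "end"
pos <- str_locate(x, "\\d+")
pos
```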
I couldn't find anything this specific elsewhere, so I hope more experienced R/regex users can clarify what is going on under the hood.