8

I would like to use str_extract from the stringr package to extract the numbers from strings of the form "XX nights", etc.

I'm currently doing this:

library(stringr)

str_extract("17 nights$5 Days", "(\\d)+ nights")

but that returns

"17 nights"

instead of 17.

How can I extract just the number? I thought specifying the extract group with parentheses would work, but it doesn't.

Harry M

5 Answers

12

You can use a lookahead in the regular expression, (?=...), which requires the text to be present but does not include it in the match.

library(stringr)

str_extract("17 nights$5 Days", "(\\d)+(?= nights)")

(\d) - a digit
(\d)+ - one or more digits
(?= nights) - that comes in front of " nights"

The look behind (?<=) can also come in handy.
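
As a minimal sketch of the lookbehind, assuming you also wanted the number that follows the "$" in the same example string (that goal is my assumption, it is not part of the question):

```r
library(stringr)

x <- "17 nights$5 Days"

# Lookbehind: digits preceded by a literal "$" (escaped, because $ is an anchor)
days <- str_extract(x, "(?<=\\$)\\d+")

# Lookahead, as above: digits followed by " nights"
nights <- str_extract(x, "\\d+(?= nights)")

# str_extract() returns character, so convert if you need numbers
as.integer(c(nights, days))
```

Neither lookaround consumes any characters, so only the digits end up in the result.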

A good reference is the regex cheatsheet from RStudio: https://raw.githubusercontent.com/rstudio/cheatsheets/main/regex.pdf

Dave2e
5

In base R, we can use sub to extract the number that comes before "nights":

as.integer(sub("(\\d+)\\s+nights.*", "\\1","17 nights$5 Days"))
#[1] 17

Or, if the number is always the first number in the string, we can use readr::parse_number:

readr::parse_number("17 nights$5 Days")
#[1] 17
Ronak Shah
  • 1
    I just ran a check and using base R `as.integer(sub("(\\d+)\\s+nights.*", "\\1","17 nights$5 Days"))` (even with the `as.integer`) is about 3X faster than the equivalent `stringr` – Moohan Aug 12 '19 at 15:46
  • 1
    Thanks for readr::parse_number -- this is quite foolproof – wint3rschlaefer Mar 01 '20 at 11:43
4

If you want to specify a specific group for return, use str_replace(). The pattern you want to capture is wrapped in (), then in the replacement argument you refer to that group as "\\1" as it is capture group number one.

I added the ^ to indicate you want numbers only at the beginning of the string.


library(stringr)

str_replace(string = "17 nights$5 Days",
            pattern = "(^\\d+).*",
            replacement = "\\1")

giving:

[1] "17"
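
One thing to watch with this approach, shown here as a small sketch (the second input string is made up for illustration): when the pattern does not match, str_replace() hands back the input unchanged, whereas str_extract() gives NA.

```r
library(stringr)

# The second element is a hypothetical input with no leading number
x <- c("17 nights$5 Days", "no leading number")

# Same pattern as above: capture the leading digits, drop the rest
replaced <- str_replace(x, "(^\\d+).*", "\\1")

# str_extract() with just the anchored digits returns NA on a non-match
extracted <- str_extract(x, "^\\d+")

replaced
extracted
```

If your data may contain strings without a number, the NA from str_extract() is usually easier to detect than an untouched string.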

2

You can use stringr::str_match, which returns the full match and each capture group as columns of a character matrix; then select the column for the group you want (column 2 is the first group).

library(stringr)

str_match("17 nights$5 Days", "(\\d+?) nights")[, 2]
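
A rough sketch of the same idea on a vector of strings, which is where selecting a column matters (the second string is invented for illustration):

```r
library(stringr)

# The second element is a hypothetical extra input
x <- c("17 nights$5 Days", "3 nights$2 Days")

# One row per input string; column 1 is the full match, column 2 is group 1
m <- str_match(x, "(\\d+) nights")

# Take the whole capture-group column and convert it to integers
nights <- as.integer(m[, 2])
nights
```

Indexing with [, 2] keeps one result per input string, while [[2]] would pick out a single cell of the matrix.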
Moohan
0

Using rebus, if the string always starts with a number:

library(stringr)
library(rebus)

pattern = START %R% one_or_more(DGT)
str_extract("17 nights$5 Days", pattern)
#> [1] "17"

Created on 2021-05-30 by the reprex package (v2.0.0)

jpdugo17