
I have a character string containing a hyphen, and I'd like to extract the characters after the hyphen, which happen to be digits. However, the number of characters after the hyphen varies depending on an input I give: sometimes there is 1 digit, sometimes 2, 3, or even 4.

I found this Stack Overflow post with an example of how to extract exactly 4 digits after a hyphen, but if the code is changed to a different number of digits, it no longer works correctly. Here's the original post:

Extracting first four digits after a hyphen using stringr

The example code given as the solution for extracting the first 4 digits after the hyphen is this:

stringr::str_extract(
  "extract_public_2018_20190530180949469_58906_20110101-20111231Texas", 
  "(?<=-)[[:digit:]]{4}"
)

If the string is modified so that there are only 2 digits after the hyphen while the pattern asks for 3, the result is NA, because there are only 2 digits after the hyphen, not 3.

stringr::str_extract(
  "extract_public_2018_20190530180949469_58906_20110101-20", 
  "(?<=-)[[:digit:]]{3}"
)

Another issue: if the string has 3 digits after the hyphen and the pattern asks for 2, it returns only the first 2 digits, even though there are actually 3 digits after the hyphen. See below:

stringr::str_extract(
  "extract_public_2018_20190530180949469_58906_20110101-201", 
  "(?<=-)[[:digit:]]{2}"
)

For my purposes, I always have a string with numbers before a hyphen and numbers after it. Those numbers change depending on an input I specify. The strings could be, for example:

0-5, 15-30, 30-60, 90-120

All I care about is extracting the complete number after the hyphen, regardless of what length it may be. The example from the post allows extraction but requires specifying a fixed length. My analyses will produce a number after the hyphen with varying lengths.

How can the example code from the post be modified (or is there alternative code) so that I don't have to select a fixed length of digits? I'd like to have code that pulls the full number regardless of how many digits may appear after the hyphen.

EastBeast
  • 1
    Replace `{2}` with `{1,}`, which specifies at least one occurrence. If you have an upper limit of occurrences, put that after the comma. e.g. `{1,4}` matches one to four occurrences – benson23 Jun 08 '23 at 16:21
  • 1
    Excellent! Problem solved. Thank you so much! Still learning regex, and it is a head spinner. Thanks again! – EastBeast Jun 08 '23 at 16:36
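
For anyone landing here later, the quantifier change suggested in the comment above can be sketched like this. The variable names (`ranges`, `open_ended`, `capped`) are only illustrative, and the input vector is taken from the example strings in the question.

```r
library(stringr)

# Example inputs from the question (illustrative variable name)
ranges <- c("0-5", "15-30", "30-60", "90-120")

# {1,} means "one or more digits", so the match covers the whole number
open_ended <- str_extract(ranges, "(?<=-)[[:digit:]]{1,}")
open_ended
# "5" "30" "60" "120"

# {1,4} matches between one and four digits, so longer runs are cut at four
capped <- str_extract("20110101-20111231Texas", "(?<=-)[[:digit:]]{1,4}")
capped
# "2011"
```

Note that an upper bound such as `{1,4}` silently truncates longer numbers, so the open-ended form is probably the safer choice when the length is not known in advance.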

1 Answer

Another solution is to use the [[:digit:]]+ expression, which matches one or more digits following the hyphen. A single digit gets extracted, and so does any longer run of digits.

stringr::str_extract(
  c("extract_public_2018_20190530180949469_58906_20110101-20111231Texas",
    "extract_public_2018_20190530180949469_58906_20110101-20",
    "extract_public_2018_20190530180949469_58906_20110101-206548"), 
  "(?<=-)[[:digit:]]+"
)

# "20111231" "20" "206548"
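
As a rough sketch of how this might look on the range strings from the question (0-5, 15-30, and so on), the same pattern can be applied to a vector and the result converted to integers. The names `ranges`, `upper`, and `upper_base` are made up for illustration, and the base R `sub()` variant is an alternative I am assuming is acceptable if you would rather not depend on stringr; it relies on each string containing a hyphen followed by digits.

```r
library(stringr)

# Example inputs from the question (illustrative variable name)
ranges <- c("0-5", "15-30", "30-60", "90-120")

# Extract every digit after the hyphen, then convert the text to integers
upper <- as.integer(str_extract(ranges, "(?<=-)[[:digit:]]+"))
upper
# 5 30 60 120

# Base R alternative: keep only the digits that follow the last hyphen.
# Assumes every string has the form "<something>-<digits>...".
upper_base <- as.integer(sub("^.*-([0-9]+).*$", "\\1", ranges))
upper_base
# 5 30 60 120
```

Converting to integer is optional, but `str_extract()` always returns character values, so it is needed if the number is going to be used in arithmetic or comparisons.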