The readr package has a function called parse_number that extracts the first number from a string:

readr::parse_number("Hello 2022!")

[1] 2022

Is there a similar method for extracting a date from a string? readr has a function called parse_date, but it does something different:

readr::parse_date("X2018-01-11_poland")

Warning: 1 parsing failure.
row col   expected             actual
  1  -- date like  X2018-01-11_poland

[1] NA

Desired output:

# the raw string is "X2018-01-11_poland"
2018-01-11

P.S. I am not interested in doing this with a regular expression.

bird
  • Just specify the `format`, according to the [`strptime()`](https://rdrr.io/r/base/strptime.html) convention: `readr::parse_date("X2018-01-11_poland", format = "X%Y-%m-%d_poland")`. – Greg Dec 22 '21 at 14:32
  • What exactly is your aversion to using a regex for this? A well-crafted regex can be robust to most regex problems, and provide fairly decent date-like extraction results both in performance and resilience. – r2evans Dec 22 '21 at 15:05

4 Answers

The lubridate package has parse_date_time2, which is easy to use. It returns a POSIXct date-time, so wrap the result in as.Date() if you need a Date.

library(lubridate)
dstring <- "X2018-01-11_poland"
date <- parse_date_time2(dstring, orders='Ymd')
date
#[1] "2018-01-11 UTC"
Andrew Chisholm

Here is a regex-free idea:

library(readr)
x <- "X2018-01-11_poland"
parse_date(strsplit(x, '_', fixed = TRUE)[[1]][1], format = 'X%Y-%m-%d')
#[1] "2018-01-11"

However, if the "_poland" suffix is also fixed, you can put it in the format as well:

parse_date(x, format = 'X%Y-%m-%d_poland')
#[1] "2018-01-11"
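
The split-then-parse idea above handles a single string. A minimal sketch of the same idea applied to a whole character vector follows, assuming every element starts with "X" and has an underscore right after the date. The vector xs and its second element are hypothetical, and base R as.Date is swapped in for readr's parse_date so the sketch has no package dependency.

```r
# Hypothetical input vector; only the first element comes from the question.
xs <- c("X2018-01-11_poland", "X2019-03-05_germany")

# Split each element on the literal underscore and keep the first piece.
prefix <- vapply(strsplit(xs, "_", fixed = TRUE), function(p) p[1], character(1))

# Parse the pieces, declaring the leading "X" as part of the format.
dates <- as.Date(prefix, format = "X%Y-%m-%d")
print(dates)
```

Elements that do not fit the assumed layout come back as NA rather than raising an error, so it is worth checking the result with anyNA().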
Sotos

1) This uses only base R and no regular expressions. It assumes that (1) only letters and spaces precede the date, as is the case in the question (this could easily be relaxed, if necessary, by adding more characters to lets), and (2) the date is in the standard Date format, yyyy-mm-dd. chartr translates the ith character of its first argument into the ith character of its second, so here every letter is replaced by a space; tolower is applied first so that uppercase letters such as X are covered as well. as.Date then parses the result. Note that as.Date ignores junk at the end, so it is fine if additional characters not in lets follow the date.

x <- "X2018-01-11_poland"

lets <- paste(letters, collapse = "")
as.Date(chartr(lets, strrep(" ", nchar(lets)), tolower(x)))
## [1] "2018-01-11"

2) If we know that the string always starts with X and the date appears right after it, we can simply include the prefix in the as.Date format string. This also uses only base R and no regular expressions.

as.Date(x, "X%Y-%m-%d")
## [1] "2018-01-11"

3) If you are willing to compromise and use a very simple regular expression, gsub can remove every non-digit character. Here \D matches any non-digit, and the backslash must be doubled within quotes. This assumes the rest of the string contains no other digits.

as.Date(gsub("\\D", "", x), "%Y%m%d")
## [1] "2018-01-11"
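
To make the chartr step in 1) easier to follow, here is a small sketch that stores the intermediate string before parsing it. The variable names blanks, cleaned, and result are introduced only for illustration; the calls themselves are the ones used in 1).

```r
x <- "X2018-01-11_poland"

# All 26 lowercase letters, and an equally long run of spaces to map them to.
lets <- paste(letters, collapse = "")
blanks <- strrep(" ", nchar(lets))

# Lowercase first so that the uppercase "X" is covered by `lets` as well.
# Every letter becomes a space; digits, "-" and "_" are left untouched.
cleaned <- chartr(lets, blanks, tolower(x))
print(cleaned)

# Parse the cleaned string; the underscore and spaces after the date are ignored.
result <- as.Date(cleaned)
print(result)
```

Because chartr and as.Date are both vectorized, the same two calls should work unchanged on a character vector of such strings.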
G. Grothendieck

Possible alternatives using base R, or stringr and lubridate. Both assume the date always occupies characters 2 to 11 of the string:

as.Date(substr("X2018-01-11_poland", 2, 11), format = "%Y-%m-%d")
#> [1] "2018-01-11"

library(stringr)
library(lubridate)

ymd(str_sub("X2018-01-11_poland", 2, 11))
#> [1] "2018-01-11"

Created on 2021-12-22 by the reprex package (v2.0.1)

Peter