I'm reading in data from another platform where a combination of the strings listed below is used for expressing timestamps:
* = current time
t = current day (00:00)
mo = month
d = days
h = hours
m = minutes
For example, *-3d is the current time minus 3 days, and t-3h is three hours before the start of today (00:00), i.e. 21:00 yesterday.
I'd like to be able to ingest these expressions into R and get the corresponding POSIXct value. I'm trying to do this with regex in the function below, but I lose the numeric multiplier in front of each unit:
strTimeConverter <- function(z){
  ret <- stringi::stri_replace_all_regex(
    str = z,
    pattern = c('^\\*',
                '^t',
                '([[:digit:]]{1,})mo',
                '([[:digit:]]{1,})d',
                '([[:digit:]]{1,})h',
                '([[:digit:]]{1,})m'),
    replacement = c('Sys.time()',
                    'Sys.Date()',
                    '*lubridate::months(1)',
                    '*lubridate::days(1)',
                    '*lubridate::hours(1)',
                    '*lubridate::minutes(1)'),
    vectorize_all = FALSE
  )
  return(ret)
  # return(eval(expr = parse(text = ret)))
}
> strTimeConverter('*-5mo+3d+4h+2m')
[1] "Sys.time()-*lubridate::months(1)+*lubridate::days(1)+*lubridate::hours(1)+*lubridate::minutes(1)"
> strTimeConverter('t-5mo+3d+4h+2m')
[1] "Sys.Date()-*lubridate::months(1)+*lubridate::days(1)+*lubridate::hours(1)+*lubridate::minutes(1)"
Expected output:
# *-5mo+3d+4h+2m
"Sys.time()-5*lubridate::months(1)+3*lubridate::days(1)+4*lubridate::hours(1)+2*lubridate::minutes(1)"
# t-5mo+3d+4h+2m
"Sys.Date()-5*lubridate::months(1)+3*lubridate::days(1)+4*lubridate::hours(1)+2*lubridate::minutes(1)"
I assumed that wrapping [[:digit:]]{1,} in parentheses () would preserve the digits, but clearly that's not working. I tied each unit pattern to a preceding number because otherwise the code replaces characters introduced by earlier replacements: e.g. * gets converted to Sys.time(), but then the m in Sys.time() gets replaced with *lubridate::minutes(1).
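For reference, the parentheses do capture the digits; they disappear because none of the replacement strings refers back to the captured group. In stringi's replacement syntax a captured group is referenced as `$1`. Below is a minimal sketch of the same approach with only that change. The name `strTimeToExpr` is made up for illustration, and the sketch assumes the stringi package is installed:

```r
# Hypothetical helper: same patterns as in the question, but each
# replacement re-inserts the captured digits via the $1 backreference.
strTimeToExpr <- function(z) {
  stringi::stri_replace_all_regex(
    str = z,
    pattern = c('^\\*', '^t',
                '([[:digit:]]+)mo',
                '([[:digit:]]+)d',
                '([[:digit:]]+)h',
                '([[:digit:]]+)m'),
    replacement = c('Sys.time()', 'Sys.Date()',
                    '$1*lubridate::months(1)',
                    '$1*lubridate::days(1)',
                    '$1*lubridate::hours(1)',
                    '$1*lubridate::minutes(1)'),
    vectorize_all = FALSE  # apply the patterns one after another
  )
}

strTimeToExpr('*-5mo+3d+4h+2m')
# [1] "Sys.time()-5*lubridate::months(1)+3*lubridate::days(1)+4*lubridate::hours(1)+2*lubridate::minutes(1)"
```

Because every unit pattern still requires a digit immediately before the unit letter, the letters inside `Sys.time()` or `lubridate::months` are not touched by the later replacements.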
I plan on converting the (expected) output to an R date-time using eval(parse(text = ...)) (currently commented out in the function).
I'm open to using other packages or approaches.
Update
After tinkering around for a bit, I found that the version below works: I replace the strings in an order such that newly inserted characters are never replaced again.
strTimeConverter <- function(z){
  ret <- stringi::stri_replace_all_regex(
    str = z,
    pattern = c('y', 'd', 'h', 'mo', 'm', '^t', '^\\*'),
    replacement = c('*years(1)',
                    '*days(1)',
                    '*hours(1)',
                    '*days(30)',
                    '*minutes(1)',
                    'Sys.Date()',
                    'Sys.time()'),
    vectorize_all = FALSE
  )
  ret <- gsub(pattern = '\\*', replacement = '*lubridate::', x = ret)
  rdate <- (eval(expr = parse(text = ret)))
  attr(rdate, 'tzone') <- 'UTC'
  return(rdate)
}

sample_string <- '*-5mo+3d+4h+2m'
strTimeConverter(sample_string)
This works, but it is not very elegant and will likely fail once I have to incorporate other expressions (e.g. yd for day of the year, such as 124).
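One possible direction that avoids both the replacement-order problem and eval(parse()) is to split the string into an anchor and a list of signed offsets, then apply each offset directly. The sketch below uses base R only. The function name `parseRelTime` and its `now` and `tz` parameters are assumptions introduced here for illustration, and month arithmetic goes through `seq()`, which rolls over at month ends rather than behaving like lubridate:

```r
# Hypothetical parser: resolves strings such as "*-5mo+3d+4h+2m" without eval().
# `now` and `tz` are parameters added for this sketch so results are reproducible.
parseRelTime <- function(z, now = Sys.time(), tz = "UTC") {
  attr(now, "tzone") <- tz
  # Anchor: "*" is the current time, "t" is today at 00:00.
  x <- switch(substr(z, 1, 1),
              "*" = now,
              "t" = as.POSIXct(format(now, "%Y-%m-%d"), tz = tz),
              stop("expression must start with '*' or 't': ", z))
  # Offsets: a sign, an integer, and a unit ("mo" is listed before "m").
  rest <- substring(z, 2)
  terms <- regmatches(rest, gregexpr("[+-][0-9]+(mo|d|h|m)", rest))[[1]]
  # Reject anything the pattern did not fully account for.
  if (paste(terms, collapse = "") != rest) stop("cannot parse: ", z)
  for (term in terms) {
    n    <- as.integer(sub("(mo|d|h|m)$", "", term))  # "-5mo" -> -5
    unit <- sub("^[+-][0-9]+", "", term)              # "-5mo" -> "mo"
    x <- switch(unit,
                mo = seq(x, by = paste(n, "months"), length.out = 2)[2],
                d  = x + n * 86400,  # fixed 24 h days; exact in UTC (no DST)
                h  = x + n * 3600,
                m  = x + n * 60)
  }
  x
}

now <- as.POSIXct("2024-03-15 10:30:00", tz = "UTC")
parseRelTime("*-3d", now)            # 2024-03-12 10:30:00 UTC
parseRelTime("t-3h", now)            # 2024-03-14 21:00:00 UTC
parseRelTime("*-5mo+3d+4h+2m", now)  # 2023-10-18 14:32:00 UTC
```

With this layout, supporting a further unit would mean adding one alternative to the regex and one branch to the inner `switch`, without having to worry about the order of replacements.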