I have scraped news data that includes relative timestamps in this form:

time <- c("11 hours ago", "2 days ago", "3 days ago")

How can I convert this into a standard date-time format? Also, since the news comes from around the globe: I assume that for intra-day differences (e.g. "11 hours ago") the browser uses my system time?

Thank you

Marco

3 Answers

If hours and days are the only units of time, then:

n <- as.numeric(gsub('\\D+', '', time))  # numeric part of each string
# Match 'hour' (not 'hours') so that "1 hour ago" is not treated as days
Sys.time() - ifelse(grepl('hour', time), n * 3600, n * 24 * 3600)

#[1] "2021-08-30 00:31:32 +03" "2021-08-28 11:31:32 +03" "2021-08-27 11:31:32 +03"
Sotos
  • Thank you, that's the easiest way, since I can store the result in a column of my data frame instead of handling a list in between. – Marco Aug 30 '21 at 09:05
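
If further units appear in the scraped data (minutes, weeks, ...), one possible generalisation of the approach above is a lookup table of seconds per unit. This is only a minimal sketch, assuming every string has the shape "<number> <unit> ago"; the names `secs_per_unit` and `ago_to_time` are hypothetical, and months and years are left out on purpose because they have no fixed length in seconds.

```r
# Seconds per unit, keyed by the singular unit word (assumed input vocabulary)
secs_per_unit <- c(second = 1, minute = 60, hour = 3600, day = 86400, week = 604800)

# Hypothetical helper: convert "<n> <unit> ago" strings to POSIXct
ago_to_time <- function(x, now = Sys.time()) {
  n    <- as.numeric(sub("^\\s*(\\d+).*$", "\\1", x))              # leading count
  unit <- sub("^\\s*\\d+\\s+([A-Za-z]+)\\s+ago\\s*$", "\\1", x)    # unit word
  unit <- sub("s$", "", unit)                                      # drop plural "s"
  now - n * unname(secs_per_unit[unit])                            # NA for unknown units
}

time <- c("11 hours ago", "2 days ago", "3 days ago")
ago_to_time(time)
```

Because the result is a plain POSIXct vector, it can be assigned directly to a data frame column. Passing a fixed `now` (for example the time of the scrape) keeps all rows relative to the same reference instant.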

You can use seq after removing the " ago" and prepending a "-". This works for times given as sec, min, hour, day, DSTday, week, month, quarter or year.

lapply(sub(" ago", "", time), function(x) seq(Sys.time(), by=paste0("-", x),
 length.out = 2)[2])
#[[1]]
#[1] "2021-08-29 23:41:26 CEST"
#
#[[2]]
#[1] "2021-08-28 10:41:26 CEST"
#
#[[3]]
#[1] "2021-08-27 10:41:26 CEST"

To get a vector, use c with do.call:

do.call(c, lapply(sub(" ago", "", time), function(x) seq(Sys.time(),
  by=paste0("-",x), length.out = 2)[2]))
#[1] "2021-08-30 00:11:15 CEST" "2021-08-28 11:11:15 CEST"
#[3] "2021-08-27 11:11:15 CEST"
GKi
  • Great option for multiple time inputs and flexible on either list or vector output. I had to transform another time format "minutes" into "min" or "mins" first. – Marco Aug 30 '21 at 11:13
  • You can try `time <- sub("minutes", "min", time)` to convert `minutes` into `min`. – GKi Aug 30 '21 at 11:22
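
The renaming step from these comments can be folded into a small helper around the seq approach. This is a sketch under assumptions: inputs look like "5 minutes ago" or "30 seconds ago", `ago_to_posix` is a hypothetical name, and only the two long unit names that seq does not accept are mapped to their short forms.

```r
# Hypothetical helper: normalise unit names, then step back from `now` with seq()
ago_to_posix <- function(x, now = Sys.time()) {
  by <- sub("\\s+ago\\s*$", "", x)   # "5 minutes ago" -> "5 minutes"
  by <- sub("minute", "min", by)     # seq() expects "min" / "mins"
  by <- sub("second", "sec", by)     # ... and "sec" / "secs"
  out <- lapply(by, function(b) seq(now, by = paste0("-", b), length.out = 2)[2])
  do.call(c, out)                    # combine the list while keeping POSIXct
}

ago_to_posix(c("5 minutes ago", "11 hours ago", "2 days ago"))
```

The `now` argument defaults to `Sys.time()`, so all elements are computed against one reference instant rather than a fresh clock reading per element.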

A lubridate solution:

library(lubridate)
    
time <- c("11 hours ago", "2 days ago", "3 days ago")

time_numeric <- as.numeric(gsub("([0-9]+).*$", "\\1", time))
time_logical <- ifelse(grepl('hour', time), 'hour', ifelse(grepl('day', time), 'day', 'unknown'))

time_tidy <- data.frame(diff=time_numeric,type=time_logical)

current_time <- Sys.time()

time_tidy$new <- as_datetime(ifelse(
  time_tidy$type == 'hour', current_time - hours(time_tidy$diff),
  ifelse(time_tidy$type == 'day', current_time - days(time_tidy$diff), current_time)
))

time_tidy$new

Output:

[1] "2021-08-30 00:29:21 +03"
[1] "2021-08-28 11:29:21 +03"
[1] "2021-08-27 11:29:21 +03"
Samet Sökel
  • This is not scalable. Imagine a vector with 1000 elements and having to specify `days` or `hours` manually. – Sotos Aug 30 '21 at 08:36