
I have a vector of times in which the date is noted only once per day, on the first entry of that day. I need to convert the vector into a usable date-time format such as POSIXlt. The entries are in chronological order, and each date-less time (%H:%M) belongs to the most recent date noted before it.

t <- structure(c(6L, 1L, 2L, 3L, 4L, 5L, 10L, 7L, 8L, 9L), 
    .Label = c("00:15", "00:25", "00:35", "00:45", "02:05", "20.01.2013; 0:05", 
    "20:48", "20:58", "21:08", "25.01.2013; 20:38"), class = "factor")

From several previous answers to questions about factor-to-date conversion (e.g. here), I know how to convert the entries that carry a date, t[c(1, 7)].

t1 <- strptime(as.character(t[c(1, 7)]), format = "%d.%m.%Y; %H:%M")
# t1
# [1] "2013-01-20 00:05:00 CET" "2013-01-25 20:38:00 CET"

However, how can I propagate the date to the remaining, date-less values so that they convert correctly as well?

nya

2 Answers

library(zoo)  # For the na.locf function

df = data.frame(date=t)

# Put date and time in separate columns
df$time = gsub(".*; (.*)","\\1", df$date)
df$date = as.Date(df$date, format="%d.%m.%Y")

# Fill missing df$date values
df$date = na.locf(df$date)

# Convert to POSIXct
df$date = as.POSIXct(paste(df$date, df$time))
# Keep only the combined date-time column
df = df[,1, drop=FALSE]

df

                  date
1  2013-01-20 00:05:00
2  2013-01-20 00:15:00
3  2013-01-20 00:25:00
4  2013-01-20 00:35:00
5  2013-01-20 00:45:00
6  2013-01-20 02:05:00
7  2013-01-25 20:38:00
8  2013-01-25 20:48:00
9  2013-01-25 20:58:00
10 2013-01-25 21:08:00
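
To make the fill step easier to follow, here is a minimal base R sketch of the same last-observation-carried-forward idea without zoo. The names `x`, `d`, `idx` and `filled` are made up for illustration, and the sketch assumes that the first entry always carries a date.

```r
# Hypothetical illustration: carry the last noted date forward in base R
x <- c("20.01.2013; 0:05", "00:15", "00:25", "25.01.2013; 20:38", "20:48")

d   <- as.Date(x, format = "%d.%m.%Y")  # NA wherever no date is present
idx <- cumsum(!is.na(d))                # position of the most recent dated entry
filled <- d[!is.na(d)][idx]             # assumes the first entry carries a date

filled
```

Each date-less entry picks up the date of the closest dated entry above it, which is what na.locf does to the NA values in df$date.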
eipi10

We can use dplyr

library(dplyr)
data.frame(t) %>%
     mutate(Date = as.Date(t, "%d.%m.%Y")) %>% 
     group_by(grp = cumsum(!is.na(Date))) %>%
     mutate(Date = Date[1L],
            DateTime = as.POSIXct(paste(Date, sub(".*;", "", t)))) %>% 
     ungroup() %>%
     select(DateTime)  
#           DateTime
#                <dttm>
#1  2013-01-20 00:05:00
#2  2013-01-20 00:15:00
#3  2013-01-20 00:25:00
#4  2013-01-20 00:35:00
#5  2013-01-20 00:45:00
#6  2013-01-20 02:05:00
#7  2013-01-25 20:38:00
#8  2013-01-25 20:48:00
#9  2013-01-25 20:58:00
#10 2013-01-25 21:08:00

Or using base R

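x  <- as.character(t)
i1 <- !grepl(";", x, fixed = TRUE)           # TRUE for time-only entries
d1 <- sub(";.*", "", x[!i1])[cumsum(!i1)]    # carry the last noted date forward
v1 <- paste(d1, trimws(sub(".*;", "", x)))   # "dd.mm.yyyy HH:MM" for every entry

strptime(v1, "%d.%m.%Y %H:%M")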
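
For reuse, the base R idea can be wrapped in a small function with an explicit time zone, so the result does not depend on the local setting of the machine. This is only a sketch: `fill_dates()` is a hypothetical helper, not part of any package, and it assumes that dated entries contain a ";" and that the first entry carries a date.

```r
# fill_dates() is a hypothetical helper written for this example
fill_dates <- function(x, tz = "UTC") {
  x <- as.character(x)                        # also accepts a factor
  has_date <- grepl(";", x, fixed = TRUE)     # TRUE where a date is noted
  stopifnot(has_date[1])                      # assumption: first entry has a date
  dates <- sub(";.*", "", x[has_date])[cumsum(has_date)]  # carry dates forward
  times <- trimws(sub(".*;", "", x))          # strip the date part, if any
  as.POSIXct(paste(dates, times), format = "%d.%m.%Y %H:%M", tz = tz)
}

res <- fill_dates(c("20.01.2013; 0:05", "00:15", "25.01.2013; 20:38", "20:48"))
format(res, "%Y-%m-%d %H:%M")
```

Testing for the ";" separator instead of counting characters keeps the check valid for times written without a leading zero, such as "0:05".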
akrun
  • Unfortunately, your code added a date to each row. I have a variable number of rows belonging to a single date. – nya Dec 02 '16 at 16:53
  • @nya Can you post a better example that reflects the problem? – akrun Dec 02 '16 at 16:54
  • Both answers work for me, but I'm accepting eipi10's, because it is easier for me to understand. Thank you for your alternative. – nya Dec 02 '16 at 17:08