
I have time-series data in 1-minute increments. I have written code for this, but with the amount of data involved (over 1M rows), looping through each line takes far too long. The data looks something like this:

t0 = as.POSIXlt("2018-12-23 00:01:00")
t0 = t0+seq(60,60*10,60)
p1 = seq(5,5*10,5)
p2 = seq(7,7*10,7)
m0 = cbind(p1,p2)
rownames(m0) = as.character(t0)

Where it looks something like:

> head(m0)
                    p1 p2
2018-12-23 00:02:00  5  7
2018-12-23 00:03:00 10 14
2018-12-23 00:04:00 15 21
2018-12-23 00:05:00 20 28
2018-12-23 00:06:00 25 35
2018-12-23 00:07:00 30 42

I want to turn this data into 5-second increments by adding 11 rows (covering the preceding 55 seconds) before each minute, with each new row carrying the value of the minute it precedes. So it would look something like:

> new0
                    p1 p2
2018-12-23 00:01:05  5  7
2018-12-23 00:01:10  5  7
2018-12-23 00:01:15  5  7
2018-12-23 00:01:20  5  7
2018-12-23 00:01:25  5  7
2018-12-23 00:01:30  5  7
2018-12-23 00:01:35  5  7
2018-12-23 00:01:40  5  7
2018-12-23 00:01:45  5  7
2018-12-23 00:01:50  5  7
2018-12-23 00:01:55  5  7
2018-12-23 00:02:00  5  7
2018-12-23 00:02:05 10 14
2018-12-23 00:02:10 10 14
2018-12-23 00:02:15 10 14
2018-12-23 00:02:20 10 14
2018-12-23 00:02:25 10 14
2018-12-23 00:02:30 10 14
2018-12-23 00:02:35 10 14
2018-12-23 00:02:40 10 14
2018-12-23 00:02:45 10 14
2018-12-23 00:02:50 10 14
2018-12-23 00:02:55 10 14
2018-12-23 00:03:00 10 14

I am hoping to find some way to do it without using a loop, utilizing the efficient code in packages such as data.table, which I am not too familiar with.

I tried using the ave function from base R, but it is not fast enough.
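For context, a row-by-row loop along the lines the question describes might look like the sketch below (a hypothetical reconstruction, not the asker's actual code). Growing the result with `rbind` inside the loop reallocates it on every iteration, which is what makes this approach scale so badly at 1M rows:

```r
# setup from the question
t0 <- as.POSIXlt("2018-12-23 00:01:00") + seq(60, 600, 60)
m0 <- cbind(p1 = seq(5, 50, 5), p2 = seq(7, 70, 7))
rownames(m0) <- as.character(t0)

# hypothetical sketch of the slow looping approach
new0 <- NULL
for (i in seq_len(nrow(m0))) {
  tm <- as.POSIXct(rownames(m0)[i])
  block <- m0[rep(i, 12), , drop = FALSE]               # 12 copies of row i
  rownames(block) <- as.character(tm - seq(55, 0, -5))  # 55s before, up to the minute
  new0 <- rbind(new0, block)                            # reallocates on every pass
}
```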

Jaap
jay2020
  • Apart from the question of how to achieve this: how about efficiency? You say looping through each line takes too long, as you already have more than 1M rows. Adding 11 rows per row results in some 12M rows. Wouldn't it be wiser to have the code "using" this data "fake" the additional 11 rows when actually using it, or to alter that code altogether? I mean, by adding the 11 lines you don't *create* any data. You just bloat it. – Xenonite Jan 22 '19 at 10:47
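Xenonite's suggestion can be sketched as a lookup that resolves arbitrary 5-second timestamps against the original 1-minute rows on demand, so the expanded 12M rows never have to exist (the helper name `lookup5s` is made up for illustration):

```r
# setup from the question
t0 <- as.POSIXlt("2018-12-23 00:01:00") + seq(60, 600, 60)
m0 <- cbind(p1 = seq(5, 50, 5), p2 = seq(7, 70, 7))
rownames(m0) <- as.character(t0)

# hypothetical helper: for each query time, return the row of the
# next-or-equal minute (next observation carried backward), without
# ever materializing the 5-second grid
lookup5s <- function(query, m0) {
  key <- as.POSIXct(rownames(m0))
  i <- findInterval(query, key)            # index of last key <= query
  i <- ifelse(query %in% key, i, i + 1L)   # otherwise roll to the next key
  m0[pmin(i, nrow(m0)), , drop = FALSE]    # queries past the last key reuse it
}
```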

5 Answers


Since you tagged this with data.table:

library(data.table)
dt = as.data.table(m0, keep.rownames = TRUE)[, rn := as.POSIXct(rn)]

# rolling join: each 5-second target time rolls backward to the next
# observed minute (roll = -Inf), then sort by time
dt[.(rep(rn, each = 12) - seq(0, 55, 5)), on = 'rn', roll = -Inf][order(rn)]
#                      rn p1 p2
#  1: 2018-12-23 00:01:05  5  7
#  2: 2018-12-23 00:01:10  5  7
#  3: 2018-12-23 00:01:15  5  7
#  4: 2018-12-23 00:01:20  5  7
#  5: 2018-12-23 00:01:25  5  7
# ---                          
#116: 2018-12-23 00:10:40 50 70
#117: 2018-12-23 00:10:45 50 70
#118: 2018-12-23 00:10:50 50 70
#119: 2018-12-23 00:10:55 50 70
#120: 2018-12-23 00:11:00 50 70
eddi

Here's one way to do it in base R. First, convert your data to a data frame with an explicit column for the time stamps:

m0 <- as.data.frame(m0)
m0$t <- t0

   p1 p2                   t
1   5  7 2018-12-23 00:02:00
2  10 14 2018-12-23 00:03:00
3  15 21 2018-12-23 00:04:00
4  20 28 2018-12-23 00:05:00
5  25 35 2018-12-23 00:06:00
6  30 42 2018-12-23 00:07:00
7  35 49 2018-12-23 00:08:00
8  40 56 2018-12-23 00:09:00
9  45 63 2018-12-23 00:10:00
10 50 70 2018-12-23 00:11:00

Then merge this data frame with a one-column data frame of time offsets (0 to 55 seconds); since the two frames share no columns, `merge` produces their Cartesian product, i.e. 12 rows per original row:

m1 <- merge(m0, data.frame(diff = seq(0, 55, 5)))

And finally, subtract the offset column `diff` from the timestamp column to create the new 5-second timestamps:

m1$t2 <- with(m1, t - diff)

> m1[c(1, 20, 40), ]

   p1 p2                   t diff                  t2
1   5  7 2018-12-23 00:02:00    0 2018-12-23 00:02:00
20 50 70 2018-12-23 00:11:00    5 2018-12-23 00:10:55
40 50 70 2018-12-23 00:11:00   15 2018-12-23 00:10:45
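As IceCreamToucan notes in the comments, this result differs from the desired output only in ordering and column layout; a final sort on the new timestamps (a small addition, not part of the original answer) reproduces it exactly:

```r
# answer code from above
t0 <- as.POSIXlt("2018-12-23 00:01:00") + seq(60, 600, 60)
m0 <- data.frame(p1 = seq(5, 50, 5), p2 = seq(7, 70, 7), t = t0)
m1 <- merge(m0, data.frame(diff = seq(0, 55, 5)))
m1$t2 <- with(m1, t - diff)

# order by the new 5-second timestamps, keep only the value columns
ord  <- order(m1$t2)
new0 <- m1[ord, c("p1", "p2")]
rownames(new0) <- as.character(m1$t2[ord])
```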
jdobres
  • Sorry, but it does not produce the desired results. Thank you for your input, though. – jay2020 Dec 24 '18 at 17:44
  • @jay2020 This is the same result as the other one except for column names, ordering, and assigning a local time zone. It also fixes the issue you mention in your comments on the other one. – IceCreamToucan Dec 24 '18 at 18:03

A combination of lubridate, padr and tidyr will get you there. I use lubridate to format the dates so they play nicely with padr. padr adds the missing date-time values to the data frame, and tidyr's fill function then fills the empty values. Note that by default padr stops above 1 million rows for memory protection, but you can raise this limit with pad's break_above argument.

library(lubridate)
library(padr)
library(tidyr)

df1 <- data.frame(t0 = ymd_hms(t0), p1, p2)  # name the time column explicitly
df1 <- pad(df1, interval = "5 secs", start_val = lubridate::ymd_hms("2018-12-23 00:01:05"))
df1 <- fill(df1, p1, p2, .direction = "up")

head(df1, 15)
                    t0 p1 p2
1  2018-12-23 00:01:05  5  7
2  2018-12-23 00:01:10  5  7
3  2018-12-23 00:01:15  5  7
4  2018-12-23 00:01:20  5  7
5  2018-12-23 00:01:25  5  7
6  2018-12-23 00:01:30  5  7
7  2018-12-23 00:01:35  5  7
8  2018-12-23 00:01:40  5  7
9  2018-12-23 00:01:45  5  7
10 2018-12-23 00:01:50  5  7
11 2018-12-23 00:01:55  5  7
12 2018-12-23 00:02:00  5  7
13 2018-12-23 00:02:05 10 14
14 2018-12-23 00:02:10 10 14
15 2018-12-23 00:02:15 10 14
tjebo
phiver
  • Thank you so much. This is what I was looking for. However, there are missing time stamps in the original data that I don't wish to fill in. For example, think of it as 2018-12-23 00:02:00, 2018-12-23 00:03:00, 2018-12-23 00:05:00, where the 00:04 minute is missing. It is missing for a reason. Is there a way around this problem? – jay2020 Dec 24 '18 at 17:51
  • So instead of generating all 5-second intervals from the first time point with the pad function, make eleven 5-second intervals above each data point I have? – jay2020 Dec 24 '18 at 17:52
  • @jay2020, in that case the merges from jdobres or the solution from eddi should work, as they don't join on data that is not there. padr fills in all missing data. You could delete the rows that should stay missing afterwards, but that might defeat the purpose of trying to speed things up. Considering speed, I would say eddi's answer is the best. – phiver Dec 24 '18 at 18:31

A base R way:

m0 <- as.data.frame(m0)
# for each original minute, build 12 times: the minute itself plus the
# eleven 5-second steps before it (descending)
time <- lapply(as.POSIXct(rownames(m0)), seq, by = "-5 sec", length.out = 12)
m1 <- cbind(TIME = Reduce(c, time), m0[rep(seq_len(nrow(m0)), each = 12), ])
row.names(m1) <- NULL
head(m1)

#                  TIME p1 p2
# 1 2018-12-23 00:02:00  5  7
# 2 2018-12-23 00:01:55  5  7
# 3 2018-12-23 00:01:50  5  7
# 4 2018-12-23 00:01:45  5  7
# 5 2018-12-23 00:01:40  5  7
# 6 2018-12-23 00:01:35  5  7

Note: TIME in this output runs in descending order within each 12-row block.
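If ascending order is needed, one extra sort on TIME (not part of the original answer) restores it:

```r
# setup and answer code from above
t0 <- as.POSIXlt("2018-12-23 00:01:00") + seq(60, 600, 60)
m0 <- data.frame(p1 = seq(5, 50, 5), p2 = seq(7, 70, 7),
                 row.names = as.character(t0))
time <- lapply(as.POSIXct(rownames(m0)), seq, by = "-5 sec", length.out = 12)
m1 <- cbind(TIME = Reduce(c, time), m0[rep(seq_len(nrow(m0)), each = 12), ])

# sort by TIME to get ascending 5-second increments
m1 <- m1[order(m1$TIME), ]
row.names(m1) <- NULL
```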

Darren Tsai

Here's a general xts solution that should work for parameters other than the ones specified in the question.

library(xts)

# convert m0 to xts
x0 <- as.xts(m0)

# create empty xts object with observations at all time points you want
nobs <- 11
nsec <- 5
y0 <- xts(, index(x0) - rep(seq_len(nobs) * nsec, each = nrow(x0)))

# merge data with desired index observations
new0 <- merge(x0, y0)
# carry the current value backward
new0 <- na.locf(new0, fromLast = TRUE)

head(new0, 20)
#                     p1 p2
# 2018-12-23 00:01:05  5  7
# 2018-12-23 00:01:10  5  7
# 2018-12-23 00:01:15  5  7
# 2018-12-23 00:01:20  5  7
# 2018-12-23 00:01:25  5  7
# 2018-12-23 00:01:30  5  7
# 2018-12-23 00:01:35  5  7
# 2018-12-23 00:01:40  5  7
# 2018-12-23 00:01:45  5  7
# 2018-12-23 00:01:50  5  7
# 2018-12-23 00:01:55  5  7
# 2018-12-23 00:02:00  5  7
# 2018-12-23 00:02:05 10 14
# 2018-12-23 00:02:10 10 14
# 2018-12-23 00:02:15 10 14
# 2018-12-23 00:02:20 10 14
# 2018-12-23 00:02:25 10 14
# 2018-12-23 00:02:30 10 14
# 2018-12-23 00:02:35 10 14
# 2018-12-23 00:02:40 10 14
Joshua Ulrich