Converting an index of numbers to time in R in large data set

Question

I have a data table with about 200 million rows. There are four columns:

ID of individual
date (as character)
half hour (as an index of 1:48)
data

I am trying to create a single column that combines time and date.

So far I haven't been able to convert the half hour index into a time variable. I have run the following code:

thirtymins <- hms("00:30"00") #from lubridate
dataset$halfhour <- dataset$halfhour * thirtymins

When I run this code it crashes the computer.

When I only run for the first million rows (0.5%) it takes 1.75 minutes, so it would take about 6 hours to run for the whole data set.

Is there a better way?

EDIT: (sorry wasn't sure how to do it in a comment)

Output from dput(head(dataset)).

structure(list(ID = c("1", "1", "1", 
"1", "1", "1"), date = c("2016-07-01", 
"2016-07-01", "2016-07-01", "2016-07-01", "2016-07-01", "2016-07-01"
), halfhour = 1:6, data = c(6.8, 4.82, 4.4, 4.36, 
4.4, 4.02)), class = c("data.table", "data.frame"), row.names = c(NA, 
-6L), .internal.selfref = <pointer: 0x00000000025a1ef0>)

If you could provide a sample of your data, it would go a long way to helping us "know" a little more confidently what your data looks like. Can you please post the output from `dput(head(x))` into your question? (Not a comment, please!) — r2evans, Apr 02 '19 at 03:20
That seems *very* slow. I can process 1 million rows in 0.5 seconds using the above code. The issue might have more to do with having a 200M row dataset lying around in memory. — thelatemail, Apr 02 '19 at 03:34
Try `as.Date(df$date) + lubridate::minutes(30) * df$halfhour`, this works but not sure if this is any faster. — Ronak Shah, Apr 02 '19 at 03:40
That is it! Sorry - total rookie error there. When I remove everything but the half hour column it completes very quickly. — Novo88, Apr 02 '19 at 03:42
This question has some performance tips as well: https://stackoverflow.com/questions/46078151/efficiently-convert-a-date-column-in-data-table — Chase, Apr 02 '19 at 03:45

Converting an index of numbers to time in R in large data set

0 Answers0