0

I have a data table with about 200 million rows. There are four columns:

  • ID of individual
  • date (as character)
  • half hour (as an index of 1:48)
  • data

I am trying to create a single column that combines time and date.

So far I haven't been able to convert the half hour index into a time variable. I have run the following code:

thirtymins <- hms("00:30"00") #from lubridate
dataset$halfhour <- dataset$halfhour * thirtymins

When I run this code it crashes the computer.

When I only run for the first million rows (0.5%) it takes 1.75 minutes, so it would take about 6 hours to run for the whole data set.

Is there a better way?

EDIT: (sorry wasn't sure how to do it in a comment)

Output from dput(head(dataset)).

structure(list(ID = c("1", "1", "1", 
"1", "1", "1"), date = c("2016-07-01", 
"2016-07-01", "2016-07-01", "2016-07-01", "2016-07-01", "2016-07-01"
), halfhour = 1:6, data = c(6.8, 4.82, 4.4, 4.36, 
4.4, 4.02)), class = c("data.table", "data.frame"), row.names = c(NA, 
-6L), .internal.selfref = <pointer: 0x00000000025a1ef0>)
Novo88
  • 109
  • 1
  • 7
  • 5
    If you could provide a sample of your data, it would go a long way to helping us "know" a little more confidently what your data looks like. Can you please post the output from `dput(head(x))` into your question? (Not a comment, please!) – r2evans Apr 02 '19 at 03:20
  • 2
    That seems *very* slow. I can process 1 million rows in 0.5 seconds using the above code. The issue might have more to do with having a 200M row dataset lying around in memory. – thelatemail Apr 02 '19 at 03:34
  • 1
    Try `as.Date(df$date) + lubridate::minutes(30) * df$halfhour`, this works but not sure if this is any faster. – Ronak Shah Apr 02 '19 at 03:40
  • That is it! Sorry - total rookie error there. When I remove everything but the half hour column it completes very quickly. – Novo88 Apr 02 '19 at 03:42
  • This question has some performance tips as well: https://stackoverflow.com/questions/46078151/efficiently-convert-a-date-column-in-data-table – Chase Apr 02 '19 at 03:45

0 Answers0