I have a data table with about 200 million rows. There are four columns:
- ID of individual
- date (as character)
- half hour (as an index of 1:48)
- data
I am trying to create a single column that combines time and date.
So far I haven't been able to convert the half hour index into a time variable. I have run the following code:
thirtymins <- hms("00:30:00") # from lubridate
dataset$halfhour <- dataset$halfhour * thirtymins
When I run this code it crashes the computer.
When I run it on only the first million rows (0.5%), it takes 1.75 minutes, so the whole data set would take about 6 hours.
Is there a better way?
EDIT: (sorry wasn't sure how to do it in a comment)
Output from dput(head(dataset))
structure(list(ID = c("1", "1", "1",
"1", "1", "1"), date = c("2016-07-01",
"2016-07-01", "2016-07-01", "2016-07-01", "2016-07-01", "2016-07-01"
), halfhour = 1:6, data = c(6.8, 4.82, 4.4, 4.36,
4.4, 4.02)), class = c("data.table", "data.frame"), row.names = c(NA,
-6L), .internal.selfref = <pointer: 0x00000000025a1ef0>)
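For reference, here is a minimal sketch of the single date-time column I am after, built with plain seconds arithmetic rather than lubridate periods. It rests on assumptions that are not confirmed above: that half-hour index 1 means the slot starting at 00:00, and that the timestamps can be treated as UTC (which avoids daylight-saving gaps). The names `u`, `day_start` and `datetime` are made up for illustration.

```r
library(data.table)

# Toy data mirroring the dput() sample above
dataset <- data.table(
  ID       = rep("1", 6),
  date     = rep("2016-07-01", 6),
  halfhour = 1:6,
  data     = c(6.8, 4.82, 4.4, 4.36, 4.4, 4.02)
)

# Parse each distinct date string only once (there are far fewer
# distinct dates than rows), then look the result up per row.
u         <- unique(dataset$date)
day_start <- as.POSIXct(u, format = "%Y-%m-%d", tz = "UTC")

# Assumption: index 1 is the slot starting at 00:00, so the offset
# from midnight is (halfhour - 1) * 1800 seconds.
dataset[, datetime := day_start[match(date, u)] + (halfhour - 1L) * 1800L]
```

Adding a number of seconds to a POSIXct value is ordinary vectorised arithmetic, and `:=` adds the column by reference without copying the table, so this should scale much better than multiplying a lubridate period row by row. I have not timed it on the full 200 million rows.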