I am trying to extract a list of POSIXct login times from a large .csv (~11m rows), then use the cut function to tabulate the number of logins per 15-minute block. Given the size of the dataset, I am using the data.table package. I have managed to achieve my objective, but I have run into some problems, described below:
#selective fread
dt <- fread("foo.csv", colClasses=list(NULL=c(1:5,8:14), "POSIXct"=c(5,6)) )
Issue: I tried to store the 2 relevant columns as POSIXct, but they appear to be stored as character instead:
> class(dt$login_datetime)
[1] "character"
I managed to run the rest of my code by converting the column with as.POSIXct, as shown below:
timeLog <- dt[,1, with=FALSE]
timeLog<- timeLog[,login_datetime:=as.POSIXct(login_datetime)]
tabulate <- data.frame(table(cut(timeLog$login_datetime, breaks="15 mins")))
However, the second line takes about 12 minutes to run on my machine. I need to process more datasets in a similar fashion, and while 12 minutes is not devastatingly slow, I am curious whether I can speed this up (short of hardware upgrades).
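For reference, here is a minimal sketch of the tabulation step on a handful of made-up timestamps, using base R only. The values and the UTC time zone are hypothetical and are not taken from my data. It only illustrates that cut needs the POSIXct vector itself rather than the whole table.

```r
# Hypothetical sample values standing in for the login_datetime column
login_datetime <- as.POSIXct(
  c("2015-01-01 00:01:00", "2015-01-01 00:07:30",
    "2015-01-01 00:16:00", "2015-01-01 00:44:59"),
  tz = "UTC"  # assumed time zone, for reproducibility only
)

# cut() is applied to the POSIXct vector, not to a data.table/data.frame
blocks <- cut(login_datetime, breaks = "15 mins")

# One row per 15-minute block, with the number of logins in each
counts <- data.frame(table(blocks))
print(counts)
```

On the real data the same two calls would be applied to the converted login_datetime column.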
Specifically, I tried to get fread to store the relevant columns as POSIXct directly and was unable to. I could not find anything regarding POSIXct in the data.table vignette.
Would anyone be able to tell me 1) whether I am doing something wrong regarding fread and colClasses="POSIXct", or 2) whether there are other ways/packages to speed up conversion of a data.table column to POSIXct?
Thanks.