0

I am encountering an error in R when trying to loop over time. Here is a subset of my dataframe (containing 120000 rows).

                 time value      mean group
1 2017-01-01 12:00:00 0.507 0.5106533    NA
2 2017-01-01 12:05:00 0.526 0.5106533    NA
3 2017-01-01 12:10:00 0.489 0.5106533    NA
4 2017-01-01 12:15:00 0.598 0.5106533    NA
5 2017-01-01 12:20:00 0.564 0.5106533    NA
6 2017-01-01 12:25:00 0.536 0.5106533    NA

Lets say I want to create groups based on time period, with an expected result like this one :

                 time value      mean group
1 2017-01-01 12:00:00 0.507 0.5106533    A
2 2017-01-01 12:05:00 0.526 0.5106533    A
3 2017-01-01 12:10:00 0.489 0.5106533    B
4 2017-01-01 12:15:00 0.598 0.5106533    B
5 2017-01-01 12:20:00 0.564 0.5106533    C
6 2017-01-01 12:25:00 0.536 0.5106533    C

I tried the following code :

for (i in 1:length(merged.data$group)){
  if (merged.data[as.POSIXlt(i)$time >= "2017-05-15 12:00:00 GMT" & 
as.POSIXlt(i)$time <= "2017-05-29 12:00:00 GMT",]){
   merged.data$group == "A"} 
  else if (merged.data[as.POSIXlt(i)$time >= "2017-08-11 12:00:00" & 
as.POSIXlt(i)$time <= "2017-11-29 16:00:00",]){
    merged.data$group == "B"}
  else if (merged.data[as.POSIXlt(i)$time >= "2018-01-05 12:00:00" & 
as.POSIXlt(i)$time <= "2018-02-16 16:00:00",]){
    merged.data$group == "C"}
}

I get the following error :

Error in as.POSIXlt.numeric(i) : 'origin' must be supplied

I don't get it, I thought that POSIXlt was getting rid of origin problems ? Although, I admit that my understanding of time problems in R is a bit confuse and I have some hard time coding each times I need to deal with time/dates...

So I hope someone can help me, don't hesitate to tell me if I'm unclear or if more/better information is needed to answer my question.

Thank you by advance stackoverflowers !

Doc Martin's
  • 341
  • 1
  • 3
  • 11
  • 1
    what are the conditions for your groups? Also this `as.POSIXlt(i)$time` is not valid R code. In addition, `for i in 1:length(..)` so your `i` is integer and you are trying to convert it to time...You probably mean `as.POSIXlt$time[i]` but `as.POSIXct()` is vectorized. Plus even If you convert it to time, you are then comparing it to a character (`"2017-05-15 12:00:00 GMT"`)... – Sotos Jan 08 '19 at 14:51
  • `as.POSIXlt(i)` fails since `i` is just an integer, since you're looping from `1:length(merged.data$group)`. If you're going to call `as.POSIXlt(i)`, then you need to make sure `i` is a date. – shwan Jan 08 '19 at 14:55
  • The groups I want to create are included between dates. Thanks for your answer Sotos, you made it clear that I'm confused with those dates/time problem ! I'm gonna try to change my code and put i as a date, as suggested by shwan (thx btw). I'll correct it in my question, but still interested in suggestions. – Doc Martin's Jan 08 '19 at 15:18

2 Answers2

0

data.table approach...

sample data

library( data.table )

dt <- fread("time value mean 
2017-01-01T12:00:00 0.507 0.5106533    
2017-01-01T12:05:00 0.526 0.5106533    
2017-01-01T12:10:00 0.489 0.5106533   
2017-01-01T12:15:00 0.598 0.5106533    
2017-01-01T12:20:00 0.564 0.5106533    
2017-01-01T12:25:00 0.536 0.5106533    ", header = TRUE)

dt[, time := as.POSIXct( time, format = "%Y-%m-%dT%H:%M:%S" )]

code

library( data.table )
library( lubridate )

dt[, group := LETTERS[.GRP], by = lubridate::floor_date( time, "10 mins" ) ]

#             time value      mean group
# 1: 2017-01-01 12:00:00 0.507 0.5106533     A
# 2: 2017-01-01 12:05:00 0.526 0.5106533     A
# 3: 2017-01-01 12:10:00 0.489 0.5106533     B
# 4: 2017-01-01 12:15:00 0.598 0.5106533     B
# 5: 2017-01-01 12:20:00 0.564 0.5106533     C
# 6: 2017-01-01 12:25:00 0.536 0.5106533     C

update

approach using foverlaps, based on the provided sample data and code

library( data.table )

#create lookup-table with periods and group-names  
periods.dt <- data.table( 
  start = as.POSIXct( c( "2017-05-15 12:00:00", "2017-08-11 12:00:00", "2018-01-05 12:00:00" ), tz = "GMT" ),
  stop = as.POSIXct( c( "2017-08-11 12:00:00", "2018-01-05 12:00:00", "2018-02-16 16:00:00"), tz = "GMT" ),
  group = LETTERS[1:3] )
#set keys
setkey( periods.dt, start, stop ) 

#create sample data
dt <- fread("time value mean 
            2017-01-01T12:00:00 0.507 0.5106533    
            2017-01-01T12:05:00 0.526 0.5106533    
            2017-01-01T12:10:00 0.489 0.5106533   
            2017-01-01T12:15:00 0.598 0.5106533    
            2017-01-01T12:20:00 0.564 0.5106533    
            2017-01-01T12:25:00 0.536 0.5106533    ", header = TRUE)

dt[, time := as.POSIXct( time, format = "%Y-%m-%dT%H:%M:%S", tz = "GMT" )]

#create dummies to join on
dt[, `:=`( start = time, stop = time )]

#perform overlap join, no match --> NA
foverlaps( dt, periods.dt, type = "within", nomatch = NA)[, c("time", "value","mean","group"), with = FALSE]
#                   time value      mean group
# 1: 2017-01-01 12:00:00 0.507 0.5106533  <NA>
# 2: 2017-01-01 12:05:00 0.526 0.5106533  <NA>
# 3: 2017-01-01 12:10:00 0.489 0.5106533  <NA>
# 4: 2017-01-01 12:15:00 0.598 0.5106533  <NA>
# 5: 2017-01-01 12:20:00 0.564 0.5106533  <NA>
# 6: 2017-01-01 12:25:00 0.536 0.5106533  <NA>
Wimpel
  • 26,031
  • 1
  • 20
  • 37
  • Thank you very much, this is really nice but doesn't correspond exactly to what i want, but perhaps i wasn't clear enough with my example. My dataframe is relatively huge (12000 rows), and I need my groups to be determined with dates, not a time period, cause it corresponds to specific start/stop measurements of probes. Am I clearer ? (sorry always difficult to simply explain something strange you're doing^^). – Doc Martin's Jan 08 '19 at 15:21
  • @DocMartin's No, it is not clear to me how you define your groups... With groups in a separate table, `data.table::foverlaps()` is probably the way to go. De dates in the for-loop of your questions are all waaaay outside your sample data. – Wimpel Jan 08 '19 at 15:28
  • Thank you for your quick answer and your coding. It took me a some time but I understand the code, and it works like a charm with your sample data. I do have a problem with my own data, getting the following message when I try to transform my df to a data.table : Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent I have to learn more about data.table cause I've seen a lot of topics using it, i was only used to dataframes. Thanks again for your time ! – Doc Martin's Jan 10 '19 at 10:37
  • @DocMartin's hard to say without sample data of your production data... Perhaps it is batter to ask a new question on SO, regarding your issue.. make sure to include sample data using `dput()`, so we know what you are working with.. – Wimpel Jan 10 '19 at 11:36
  • I come back to thank you again Wimpel, now its ok I understood what was wrong and could use your solution which works perfectly ! Kind regards, – Doc Martin's Feb 21 '19 at 14:47
0

Thanks for the answer, i found out that only the date was usefull for me since I had huge gaps in my dataset. With a simple ifelse, I found out something working :

merged.data$group<-ifelse(merged.data$date >= "2017-05-15" & merged.data$date <= "2017-05-29",1, ifelse(merged.data$date >= "2017-08-11" & merged.data$date <= "2017-11-29",2, ifelse(merged.data$date >= "2018-01-05" & merged.data$date <= "2018-02-16",3, NA )))

This doesn't work with the POSIXlt objects I had, but the solution provided by Wimpel seems to work (I have a problem using data.table, but thats another story !)

Thanks again, this forum is really a huge help !

Doc Martin's
  • 341
  • 1
  • 3
  • 11