-1

I have the following dataframe (ts1):

                D1 Diff
1 20/11/2014 16:00 0.00
2 20/11/2014 17:00 0.01
3 20/11/2014 19:00 0.03

I would like to add a new column to ts1 that will be the difference in hours between successive rows D1 (dates) in hours.

The new ts1 should be:

                D1 Diff N
1 20/11/2014 16:00 0.00 
2 20/11/2014 17:00 0.01 1
3 20/11/2014 19:00 0.03 2

For calculating the difference in hours independently I use:

library(lubridate)
difftime(dmy_hm("29/12/2014 11:00"), dmy_hm("29/12/2014 9:00"), units="hours") 

I know that for calculating the difference between each row I need to transform the ts1 into matrix.

I use the following command:

> ts1$N<-difftime(dmy_hm(as.matrix(ts1$D1)), units="hours")

And I get:

Error in as.POSIXct(time2) : argument "time2" is missing, with no default
Avi
  • 2,247
  • 4
  • 30
  • 52
  • 1
    Use `diff`: `c(NA,diff(dmy_hm(df$D1)))`. – nicola Jan 10 '16 at 13:10
  • Thank @nicola, When I use the command I get: > diff: c(NA,diff(dmy_hm(ts1$D1))) Error in diff:c(NA, diff(dmy_hm(ts1$D1))) : NA/NaN argument In addition: Warning message: In diff:c(NA, diff(dmy_hm(ts1$D1))) : numerical expression has 2287 elements: only the first used – Avi Jan 10 '16 at 13:16
  • @nicola it has its perks as `diff` will set its own units default. I would do something like `res <- diff(as.POSIXct(df$D1, format = "%d/%m/%Y %H:%M")) ; units(res) <- "hours"` – David Arenburg Jan 10 '16 at 13:16
  • Thanks David, The calculation looks good for res. And if I use @nicola inner part - diff(dmy_hm(df$D1)) I get same results as well. How can I add them to my source dataframe as an additional column? – Avi Jan 10 '16 at 13:20
  • 2
    I guess something like `df$N <- c(NA_real_, as.numeric(res))` should do, no? – David Arenburg Jan 10 '16 at 13:23
  • Thanks David, works great! – Avi Jan 10 '16 at 13:26

2 Answers2

2

Suppose ts1 is as shown in Note 2 at the end. Then create a POSIXct variable tt from D1, convert tt to numeric giving the number of seconds since the Epoch, divide that by 3600 to get the number of hours since the Epoch and take differences. No packages are used.

tt <- as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M")
m <- transform(ts1, N = c(NA, diff(as.numeric(tt) / 3600)))

giving:

> m

                D1 Diff  N
1 20/11/2014 16:00 0.00 NA
2 20/11/2014 17:00 0.01  1
3 20/11/2014 19:00 0.03  2

Note 1: I assume you are looking for N so that you can fill in the empty hours. In that case you don't really need N. Also, it would be easier to deal with time series if you use a time series representation. First we convert ts1 to a zoo object, then we create a zero width zoo object with the datetimes that we need and finally we merge them:

library(zoo)
z <- read.zoo(ts1, tz = "", format = "%d/%m/%Y %H:%M")

z0 <- zoo(, seq(start(z), end(z), "hours"))
zz <- merge(z, z0)

giving:

> zz
2014-11-20 16:00:00 2014-11-20 17:00:00 2014-11-20 18:00:00 2014-11-20 19:00:00 
               0.00                0.01                  NA                0.03 

If you really did need a data frame back then:

DF <- fortify.zoo(zz)

Note 2: Input used in reproducible form is:

Lines <- "D1,Diff
1,20/11/2014 16:00,0.00
2,20/11/2014 17:00,0.01
3,20/11/2014 19:00,0.03"

ts1 <- read.csv(text = Lines, as.is = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

Thanks to @David Arenburg and @nicola: Can use either:

res <- diff(as.POSIXct(ts1$D1, format = "%d/%m/%Y %H:%M")) ; units(res) <- "hours" 

Or:

res <- diff(dmy_hm(ts1$D1))

and afterwards:

ts1$N <- c(NA_real_, as.numeric(res))
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Avi
  • 2,247
  • 4
  • 30
  • 52
  • 1
    In both options, if you want to make sure you'll get hours differences, you will have to make sure `units` are correct, as `diff` will set the `units` to the *the largest possible (excluding "weeks") in which all the absolute differences are greater than one*. – David Arenburg Jan 10 '16 at 13:35