2

I have a data frame of two columns "start" and "end" in HH:MM:SS format.

I wanted to calculate duration between start and end using difftime function

It always gives back this error: Error in as.POSIXct.numeric(time1) : 'origin' must be supplied

I read many posts but none seemed to work for me.

Loading packages

library(dplyr)
library(tidyverse)
library(lubridate)

I removed hours to deal with minutes and seconds only

get_time <- function(x){str_sub(x, start = -5) %>%  ms()} 
df <- df %>% mutate(start = get_time(start)) %>%
  mutate(end = get_time(end))

Class of objects

class(df$start)
gives: 
[1] "Period"
attr(,"package")
[1] "lubridate"
start                 end       

26M 22S               26M 23S        
26M 25S               26M 37S      
29M 47S               30M 13S

I calculated duration using difftime function

df$duration <- with(df, difftime(end, start, units="secs"))
gives error:
Error in as.POSIXct.numeric(time1) : 'origin' must be supplied

I used subtraction operator, it worked fine except for 3rd row when minutes are different, it gave wrong answer.

start                 end            duration

26M 22S               26M 23S        1S
26M 25S               26M 37S        12S
29M 47S               30M 13S        1M -34S

Amendment

The accepted response works perfectly fine, except that it returns an error: Error in mtx1[3, ] : incorrect number of dimensions whenever applied to the second two columns "start2" and "end2" that I have in the same data frame.

sample from my df

df <- structure(list(item = c("manatee", "manatee", "pile", "pile"), prestart = new("Period", .Data = c(22, 
25, 41, 49), year = c(0, 0, 0, 0), month = c(0, 
0, 0, 0), day = c(0, 0, 0, 0), hour = c(0, 0, 0, 
0), minute = c(26, 26, 26, 26)), preend = new("Period", 
    .Data = c(23, 37, 48, 50), year = c(0, 0, 0, 0), month = c(0, 0, 0, 0), day = c(0, 0, 0, 0
    ), hour = c(0, 0, 0, 0), minute = c(26, 26, 26, 26)), poststart = new("Period", .Data = c(23, 41, 50, 
54), year = c(0, 0, 0, 0), month = c(0, 0, 0, 0), day = c(0, 0, 0, 0), hour = c(0, 0, 0, 0), 
    minute = c(26, 26, 26, 26)), postend = new("Period", 
    .Data = c(37, 48, 52, 22), year = c(0, 0, 0, 0), month = c(0, 0, 0, 0), day = c(0, 0, 0, 0
    ), hour = c(0, 0, 0, 0), minute = c(26, 26, 26, 27))), row.names = c(NA, -6L), class = c("tbl_df", "tbl", 
"data.frame"))

Organising data in minutes and seconds only (remove hours)


get_time <- function(x){str_sub(x, start = -5) %>%  ms()} 
df <- df %>% mutate(prestart = get_time(prestart)) %>%
  mutate(preend = get_time(preend)) %>% 
  mutate(poststart = get_time(poststart)) %>% 
  mutate(postend = get_time(postend))


Acer acer
  • 23
  • 4

1 Answers1

0

Update 2: I'm keeping both previous answers for the record (in case somebody really does have data that contains strings like this). However, the data is actually derived from lubridate, so "26M 22S" is merely a representation of a numeric object.

Ultimately, it is as direct as:

lubridate::as.difftime(df$preend - df$prestart, units="secs")
# Time differences in secs
# [1]  1 12  7  1

Update: your data format is nothing like I originally inferred. I'll keep the original answer below, but given this data structure it is not much help.

You can always try to do "modulus subtraction", but I think the best way to go is to convert to decimal and back. First, I'll provide data in two ways that make it incredibly easier for users to know exactly what your data looks like. (Having this up-front would have precluded me providing the original less-helpful answer.) Please use something like this in the future, it means a lot!

x <- data.frame(
  start = c("26M 22S", "26M 25S", "29M 47S"),
  end = c("26M 23S", "26M 37S", "30M 13S"),
  stringsAsFactors = FALSE
)

# if you don't want to generate a frame like that, then you can
# provide the output from dput(head(x))
structure(list(start = c("26M 22S", "26M 25S", "29M 47S"), end = c("26M 23S", 
"26M 37S", "30M 13S")), class = "data.frame", row.names = c(NA, 
-3L))

From here, two helper functions to convert to/from decimal minutes. These both make the assumption that you only ever deal with minutes/seconds, never more. Similarly, conversion back to character assumes that you are always using integral seconds, which is perhaps hasty. If this is not the case, you can remove the round and accept fractional components, perhaps using sprintf("%dM %02.3f", ...) instead, controlling the decimal component.

decimal_minutes <- function(s) {
  nums <- strsplit(gsub("[^0-9 ]", "", s), "\\s+")
  mtx <- sapply(nums, as.integer)
  mtx[1,] + mtx[2,] / 60
}
minutes_seconds <- function(num, keep0 = TRUE) {
  out <- sprintf("%dM %02dS", as.integer(num), as.integer(round(60 * (num %% 1), 0)))
  if (!keep0) out <- gsub("^0M ", "", out)
  out
}

From here, you can always preserve the numeric version if you want to use them elsewhere:

x[,c("startnum", "endnum")] <- lapply(x[,c("start", "end")], decimal_minutes)
x
#     start     end startnum   endnum
# 1 26M 22S 26M 23S 26.36667 26.38333
# 2 26M 25S 26M 37S 26.41667 26.61667
# 3 29M 47S 30M 13S 29.78333 30.21667
x$endnum - x$startnum
# [1] 0.01666667 0.20000000 0.43333333
minutes_seconds(x$endnum - x$startnum)
# [1] "0M 01S" "0M 12S" "0M 26S"
minutes_seconds(x$endnum - x$startnum, keep0 = FALSE)
# [1] "01S" "12S" "26S"

But if all you want is the one-time subtraction, you can wrap it up in one call:

x$duration <- minutes_seconds(
  decimal_minutes(x$end) - decimal_minutes(x$start),
  keep0 = TRUE
)
x
#     start     end duration
# 1 26M 22S 26M 23S   0M 01S
# 2 26M 25S 26M 37S   0M 12S
# 3 29M 47S 30M 13S   0M 26S
x$duration <- minutes_seconds(
  decimal_minutes(x$end) - decimal_minutes(x$start),
  keep0 = FALSE
)
x
#     start     end duration
# 1 26M 22S 26M 23S      01S
# 2 26M 25S 26M 37S      12S
# 3 29M 47S 30M 13S      26S

Ideally, this can and should be generalized to accept more (such as hours, as in "1H 23M 11S"). An easy step would be to update decimal_minutes to look for and deal with longer formats. I wonder if lubridate would be a good fit for you, though I doubt it'll take "26M 22S" as a native format, so you'll still need to do some data mangling to start using it.


Discussions about origin= with regards to R's POSIXt means that it is likely converting from a number to a time/date. A common reason to do this is when using epoch seconds (common in unix-y stuff) as a numeric depiction of a timestamp. It's common to assume that the "0 seconds" of this format (1970-01-01 00:00:00) is absolute, but it is not universal (excel is different), and it is feasible and even realistic/desirable to have a different "0" time. So it assumes nothing, forcing you to be explicit.

as.POSIXct(100, origin="1970-01-01 00:00:00")
# [1] "1969-12-31 16:01:40 PST"
as.POSIXct(100, origin="1970-01-01 00:00:00", tz="UTC")
# [1] "1970-01-01 00:01:40 UTC"
### or even just 
as.POSIXct(100, origin="1970-01-01")

So to use difftime on numbers, you first need to convert those numbers with something like as.POSIXct(..., origin="1970-01-01") before doing difftime.

However, since you want seconds, and numeric epoch is already in seconds, you could just do

end - start

If you really need it to be tagged as "seconds", then do

`units<-`(end - start, "secs")
### such as
`units<-`(100-90, "secs")
# [1] 10
# attr(,"units")
# [1] "secs"
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Thank you, unfortunately as.POSIXct(..., origin="1970-01-01") did not work. However, (end- start) works , except for instances where the minute is not the same for the two times , (30M 13S) - (29M 47S) gave (1M -34S) which is wrong it should be= 26S – Acer acer Feb 25 '19 at 11:23
  • Yeah, from your edit I see a LOT of things are not as I initially interpreted them. – r2evans Feb 25 '19 at 16:51
  • Thank you very much. Your answer perfectly worked for me :) – Acer acer Feb 26 '19 at 13:48
  • However, I have another two columns next to these 2, to which I should do the same calculation, whenever I do it to the next 2 columns, this error occurs : Error in mtx[1, ] : incorrect number of dimensions ??? – Acer acer Feb 26 '19 at 14:09
  • Ok, I just amended the question with a sample of my data, please have a look. In actual data, "start" and "end" have 26 rows, while "start2" and "end2" have 17 rows. – Acer acer Feb 26 '19 at 15:35
  • The data you posted is incorrect, it's missing a couple commas. However, when I fix that, it works fine on my console with the provided functions. Perhaps the data is not exactly as you appended to your question? – r2evans Feb 26 '19 at 17:59
  • my actual data is imported from an excel file, please find above a sample generated using dput(head()). – Acer acer Feb 27 '19 at 14:43
  • I am still getting the error of incorrect number of dimensions :/ ? – Acer acer Feb 27 '19 at 15:14
  • If you look at `str(df)`, you should see right away that the use of `lubridate::Period` is producing numbers, and `"26M 22S"` is just a representation of it. Given this, if you just do `str(df$preend - df$prestart)`, you'll see that there are actual numbers in there. I think you need to understand that there is a distinct difference between what an object ***is*** and how it is ***displayed*** on the screen. In this case, all of the `"M"` and `"S"` are artifacts of printing them in a way that `lubridate` thinks are more human readable. `as.numeric(df$preend - df$prestart)` gives you numbers. – r2evans Feb 27 '19 at 16:51
  • It appears that your questions are due to a misunderstanding of "is-vs-display", and how `lubridate` works. I'm glad we finally got to the unambiguous format of the data (using `dput`, please understand its value!), that would have answered your question much sooner and more accurately. – r2evans Feb 27 '19 at 16:53
  • 1
    It worked now, yes I realised how (dput) is important. thanks a lot ! – Acer acer Feb 28 '19 at 11:24