While answering the question "Temperature curve in R", I came across some odd behavior when combining dplyr::filter with lubridate::minute.
See the test data dta below. dta$Time is converted to a lubridate Period (hours and minutes) with hm().
library(lubridate)
library(dplyr)
dta$Time <- hm(dta$Time)
To get only the rows with full hours (i.e. 0 minutes), one can subset using lubridate::minute like this:
dta[minute(dta$Time) == 0,]
# Time Temp1 Temp2
# 1 0S 18.62800 18.54458
# 7 1H 0M 0S 18.45733 18.22625
# 13 2H 0M 0S 18.33258 18.04142
However, when using dplyr's filter, like this
dta %>% filter(minute(Time) == 0)
# Time Temp1 Temp2
# 1 0S 18.62800 18.54458
# 2 10M 0S 18.45733 18.22625
# 3 20M 0S 18.33258 18.04142
the result does not match the expectation. (UPDATE: The values of Temp1 and Temp2 are correct; only Time is corrupted. Thanks to @Brian for the hint.)
Additionally this warning is returned:
Warning message: In format.data.frame(x, digits = digits, na.encode = FALSE) : corrupt data frame: columns will be truncated or padded with NAs
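To check whether a given dplyr/lubridate installation shows this problem, the two results can be compared directly. The following is only a sketch: chk is a hypothetical three-row stand-in for dta, and the slot access (@minute, @hour) assumes that Time is a lubridate Period whose components are stored in S4 slots.

```r
library(lubridate)
library(dplyr)

# Hypothetical stand-in for dta: same layout, fewer rows
chk <- data.frame(Time  = c("00:00", "00:10", "01:00"),
                  Temp1 = c(18.62800, 18.60025, 18.45733),
                  stringsAsFactors = FALSE)
chk$Time <- hm(chk$Time)

base_res  <- chk[minute(chk$Time) == 0, ]        # base subsetting
dplyr_res <- chk %>% filter(minute(Time) == 0)   # dplyr subsetting

# The numeric column should agree in both results
all.equal(base_res$Temp1, dplyr_res$Temp1)

# In an intact Period column every slot has one entry per row;
# FALSE here would indicate the corrupted Time column described above
length(dplyr_res$Time@minute) == nrow(dplyr_res)
length(dplyr_res$Time@hour)   == nrow(dplyr_res)
```

If the last two checks return FALSE, the rows were selected correctly but the Period slots were not subset along with them, which would match the truncation/padding warning.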
This was also reported and worked around here, but only by coercion, which seems to remove the fun (and very readable) part of lubridate.
Question: Is there any way (to date) to use dplyr::filter on lubridate hm()/hms() Period columns without coercing them to character etc.?
Update:
It seems that the vector created by
minute(dta$Time)
# [1] 0 10 20 30 40 50 0 10 20 30 40 50 0
looks like a numeric vector, yet seems to have some mysterious characteristics.
Furthermore, as @Lyngbakr pointed out, even the result of the comparison with == does not seem to behave like a "normal" logical vector:
tst <- minute(dta$Time) == 0
dta %>% filter(tst)
will result in the same strange Time column.
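One way to narrow this down is to inspect the objects involved. The sketch below uses a hypothetical short vector p instead of dta$Time and assumes only that hm() returns a lubridate Period; it looks at the class of the Period itself and at what minute() and the == comparison return.

```r
library(lubridate)

# Hypothetical short example instead of dta$Time
p <- hm(c("00:00", "00:10", "01:00"))

isS4(p)        # is the Period an S4 object?
slotNames(p)   # components stored alongside the underlying data

m <- minute(p)
class(m)       # class of the extracted minutes
attributes(m)  # any extra attributes attached to them?

tst <- m == 0
class(tst)     # class of the comparison result
attributes(tst)
```

If m and tst turn out to be plain numeric and logical vectors, that would suggest the oddity lies in how the Time column itself is subset rather than in the index vector passed to filter.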
Sample data:
dta <- read.table(text = " Time Temp1 Temp2
1 00:00 18.62800 18.54458
2 00:10 18.60025 18.48283
3 00:20 18.57250 18.36767
4 00:30 18.54667 18.36950
5 00:40 18.51483 18.36550
6 00:50 18.48325 18.34783
7 01:00 18.45733 18.22625
8 01:10 18.43767 18.19067
9 01:20 18.41583 18.22042
10 01:30 18.39608 18.21225
11 01:40 18.37625 18.18658
12 01:50 18.35633 18.05942
13 02:00 18.33258 18.04142", header = TRUE, stringsAsFactors = FALSE)