drop_na() does not work on POSIXlt objects

Question

As the title says, I made a simple example to test drop_na() from tidyr:

library(tidyr)
library(dplyr)

# (1.) create a dataset with two POSIX columns, "ct" and "lt"

data <- data.frame(n = 1:5)
data$ct <- as.POSIXct(Sys.time() + rnorm(5) * 1000)
data$lt <- as.POSIXlt(Sys.time() + rnorm(5) * 1000)
str(data)

# $ n : int  1 2 3 4 5
# $ ct: POSIXct, format: "2018-10-07 03:02:28" ...
# $ lt: POSIXlt, format: "2018-10-07 02:37:26" ...


# (2.) set the third value of "ct" and "lt" to NA

data[3, c("ct", "lt")] <- NA


# (3.) use different functions to remove rows with NA

data %>% is.na()               # identifies NAs in both "ct" and "lt"
data %>% drop_na('ct')         # drops the NA row for "ct"
data %>% drop_na('lt')         # does NOT drop the NA row for "lt"
data[c(1, 2)] %>% na.omit()    # drops the NA row for "ct"
data[c(1, 3)] %>% na.omit()    # does NOT drop the NA row for "lt"

From the results above, when a POSIXlt column contains NAs, only is.na() identifies them; drop_na() and na.omit() leave those rows in place.

I roughly know the difference between POSIXct and POSIXlt:

POSIXct represents the number of seconds since the beginning of 1970 as a numeric vector.
POSIXlt is a named list of vectors representing the date-time components (seconds, minutes, hours, and so on).

Can someone explain why missing values in a POSIXlt column cannot be identified by drop_na() and na.omit()?

See `?na.omit`: "At present [`na.omit`] will handle [...] data frames comprising vectors and matrices (only)." Thus, list columns, such as a `POSIXlt` column, will not be handled. — Henrik, Oct 07 '18 at 09:31

lebatsnok · Accepted Answer · 2018-10-06T22:09:37.533

Short answer: use POSIXct unless you really need POSIXlt

Longer answer:

POSIXlt is a difficult and capricious data structure. See:

> str(c(as.POSIXlt(Sys.time()), NA))
 POSIXlt[1:2], format: "2018-10-07 00:43:06" NA
> unclass(c(as.POSIXlt(Sys.time()), NA))
$sec
[1] 15.78872       NA

$min
[1] 43 NA

$hour
[1]  0 NA
# skipped a few rows

$isdst
[1]  1 -1

$zone
[1] "EEST" ""   
# skipped a few rows

In short, POSIXlt is a list of vectors, each representing one of the date/time units (seconds, minutes, hours, days, and so on), plus extras such as the time zone. There is no na.omit method for POSIXlt, so na.omit.default is used, which knows nothing about the POSIXlt class and treats it as an ordinary list.

> na.omit(list(NA,NA,NA))
[[1]]
[1] NA

[[2]]
[1] NA

[[3]]
[1] NA
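
The behaviour described above can be checked directly on a bare vector. This is only a minimal sketch: the fixed timestamps and the `tz = "UTC"` setting are assumptions chosen to make it reproducible, not part of the original example.

```r
# A POSIXlt vector with one missing value (timestamps are illustrative)
x_lt <- as.POSIXlt(c("2018-10-07 01:00:00", NA), tz = "UTC")
x_ct <- as.POSIXct(x_lt)

is.atomic(x_ct)         # POSIXct is a numeric vector underneath
is.list(unclass(x_lt))  # POSIXlt is a list underneath

is.na(x_lt)             # is.na() has a POSIXlt method and finds the NA
length(na.omit(x_lt))   # the NA is still there, so the length is unchanged
length(na.omit(x_ct))   # the NA has been dropped
```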

If you need a na.omit method for POSIXlt, you can write one. Otherwise, it is easier to use POSIXct.
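
For a data frame like the one in the question, two workarounds are possible without writing a new method. The sketch below assumes a small hypothetical data frame `df` with fixed timestamps in UTC; the POSIXlt column is assigned with `$<-`, as in the question.

```r
# Build a data frame with a POSIXlt column (timestamps are illustrative)
df <- data.frame(n = 1:3)
df$lt <- as.POSIXlt(c("2018-10-07 01:00:00", NA, "2018-10-07 03:00:00"),
                    tz = "UTC")

# Option 1: filter rows with is.na(), which does understand POSIXlt
kept <- df[!is.na(df$lt), ]

# Option 2: convert the column to POSIXct, then na.omit() works as usual
df_ct <- df
df_ct$lt <- as.POSIXct(df_ct$lt)
cleaned <- na.omit(df_ct)

nrow(kept)     # rows left after filtering with is.na()
nrow(cleaned)  # rows left after converting and calling na.omit()
```

Option 2 is usually the simpler choice if nothing downstream depends on the POSIXlt components.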

A corollary: na.omit doesn't really work with lists either (it can be called, but it does nothing). You can sapply or lapply na.omit over a list, but that produces strange results as well (NA components are replaced by logical(0)). It looks like na.omit is meant for atomic vectors, factors, and data frames. (The help page says it is mostly useful with data frames.) This means na.omit is not intended to be useful with lists, including POSIXlt.

Finally, why would one use POSIXlt at all? The idea (as far as I understand it) is that you can easily manipulate the date's components, but even that can produce unexpected results:

> foo <- as.POSIXlt(Sys.time())
> foo
[1] "2018-10-07 01:06:22 EEST"
> foo$year
[1] 118
> foo$mon
[1] 9
> foo$mon <- 10
> foo
[1] "2018-11-07 01:06:22 EEST"
> foo$year <- 2018
> foo
[1] "3918-11-07 01:06:22 EEST"

So if you need to manipulate a date's components separately, you will have fewer surprises with lubridate.

> library(lubridate)
> year(foo)
[1] 3918
> year(foo) <- 2018
> foo
[1] "2018-11-07 01:06:22 EET"
> month(foo)
[1] 11
> month(foo)<-10
> foo
[1] "2018-10-07 01:06:22 EEST"

Regarding the part about manipulating the date's components, I don't think the output is unexpected. `foo$year` stores the number of years since 1900, and `foo$mon` stores the months 1 to 12 as 0 to 11. So assigning 2018 to `foo$year` actually means the year 1900 + 2018 = 3918. — Darren Tsai, Oct 07 '18 at 03:47
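
The offsets described in this comment can be verified with a short sketch. The fixed timestamp and `tz = "UTC"` are assumptions for reproducibility, and `cal_year` and `cal_month` are illustrative names.

```r
foo <- as.POSIXlt("2018-10-07 01:06:22", tz = "UTC")

# The components are stored with offsets
cal_year  <- foo$year + 1900   # years are counted from 1900
cal_month <- foo$mon + 1       # months are stored as 0 to 11

# To set the year to 2019, subtract the 1900 offset first
foo$year <- 2019 - 1900
format(foo, "%Y-%m")
```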
