I am trying to convert a factor column in R to a date-time. When I use lubridate, the values are converted to POSIXct, but the times are dropped. Is there a solution that I am not seeing?

Import Data:

transaction_march_raw <- read.csv(file = "myfile.csv")

transaction_march <- data.frame(transaction_march_raw, stringsAsFactors =  FALSE)

Clean transactions:

transaction_march <- transaction_march_raw %>% 
    select(ACT_TRANS_DATE) %>%
    clean_names()
 
str(transaction_march)

'data.frame':   373143 obs. of  1 variable:
 $ act_trans_date: Factor w/ 38543 levels "2/1/20 0:00",..: 1 1 1 1 1 1 1 1 1 1 ...

 
head(transaction_march)
  act_trans_date
1    2/1/20 0:00
2    2/1/20 0:00
3    2/1/20 0:00
4    2/1/20 0:00
5    2/1/20 0:00
6    2/1/20 0:00



transaction_march$act_trans_date <- mdy_hm(transaction_march$act_trans_date)



str(transaction_march)
'data.frame':   373143 obs. of  1 variable:
 $ act_trans_date: POSIXct, format: "2020-02-01" "2020-02-01" "2020-02-01" "2020-02-01" ...

head(transaction_march)
  act_trans_date
1     2020-02-01
2     2020-02-01
3     2020-02-01
4     2020-02-01
5     2020-02-01
6     2020-02-01
Phil

1 Answer

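Don't worry, your times are not dropped. The strings are converted to POSIXct with their times intact; only the printed output is shortened. When every value is at midnight, the time is omitted, and str() and the data frame print method do not show the UTC time zone either. See the documentation: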

The default for the format methods is "%Y-%m-%d %H:%M:%S" if any element has a time component which is not midnight, and "%Y-%m-%d" otherwise.
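A minimal sketch of that rule, using base R only. The two vectors below are made-up illustration values (not the asker's data), and UTC is assumed as the time zone:

```r
# Hypothetical example vectors; UTC is an assumption for illustration.
all_midnight <- as.POSIXct(c("2020-02-01 00:00", "2020-02-02 00:00"),
                           format = "%Y-%m-%d %H:%M", tz = "UTC")
mixed <- as.POSIXct(c("2020-02-01 00:00", "2020-02-01 01:00"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")

# Default formatting: every element is at midnight, so only dates are shown.
format(all_midnight)
#> [1] "2020-02-01" "2020-02-02"

# One non-midnight element switches the whole vector to the long format.
format(mixed)
#> [1] "2020-02-01 00:00:00" "2020-02-01 01:00:00"

# An explicit format string always shows the time, midnight or not.
full <- format(all_midnight, "%Y-%m-%d %H:%M:%S")
full
#> [1] "2020-02-01 00:00:00" "2020-02-02 00:00:00"
```

The default format is chosen once for the whole vector, which is why a column whose first rows are all midnight looks like it holds plain dates.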

So for midnight times you see only a date, but the values still hold the complete date and time for calculations.

If you printed more rows, including some with non-midnight hours, you would see that your data is complete.

See this code:

library(lubridate)

# your data
data <- structure(
  list(act_trans_date = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
  .Label = "2/1/20 0:00", class = "factor")),
  row.names = c("1", "2", "3", "4", "5", "6"),
  class = "data.frame")

# check that data frame
str(data)
#> 'data.frame':    6 obs. of  1 variable:
#>  $ act_trans_date: Factor w/ 1 level "2/1/20 0:00": 1 1 1 1 1 1

# convert to date
data$act_trans_date <- mdy_hm(data$act_trans_date)

# print again
str(data)
#> 'data.frame':    6 obs. of  1 variable:
#>  $ act_trans_date: POSIXct, format: "2020-02-01" "2020-02-01" ...

# check some real value
data[1, 1]
#> [1] "2020-02-01 UTC"

# and see this
mdy_hm("2/1/20 0:00")
#> [1] "2020-02-01 UTC"
mdy_hm("2/1/20 1:00")
#> [1] "2020-02-01 01:00:00 UTC"

Created on 2020-07-01 by the reprex package (v0.3.0)
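To check a whole column rather than a single value, one option is to count how many entries are not at midnight. This is only a sketch: the data frame df below is a hypothetical three-row stand-in, and it assumes the values are in UTC (as mdy_hm produces by default), so that seconds since the epoch modulo 86400 gives seconds since midnight.

```r
# Hypothetical stand-in for the real data; parsed with base R, UTC assumed.
df <- data.frame(
  act_trans_date = as.POSIXct(
    c("2/1/20 0:00", "2/1/20 0:00", "2/1/20 13:45"),
    format = "%m/%d/%y %H:%M", tz = "UTC"
  )
)

# Seconds elapsed since midnight UTC for each value.
secs_since_midnight <- as.numeric(df$act_trans_date) %% 86400

# Number of values that carry a non-midnight time.
n_non_midnight <- sum(secs_since_midnight != 0)
n_non_midnight
#> [1] 1

# Forcing a full format shows the time even for the midnight rows.
full <- format(df$act_trans_date, "%Y-%m-%d %H:%M:%S")
full
#> [1] "2020-02-01 00:00:00" "2020-02-01 00:00:00" "2020-02-01 13:45:00"
```

Looking at tail() as well as head() is a quicker spot check when the file is sorted by time.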

  • Wow. You are correct. My first 12,000 values are at midnight. I should have printed out the tail. I really appreciate the help. I spent way too long trying to figure this out. – Ean Johnson Jul 02 '20 at 12:57
  • @EanJohnson Great, thanks for your feedback, I'm glad it really works for you! Sometimes what is printed out differs from what is really under the hood, so it can be quite tricky to figure it out. :-) –  Jul 02 '20 at 17:13