0

I am working on a data frame that has a column that uses the date and time format as "mm/dd/yy HH:MM" for some observations and "yyyy/mm/dd HH:MM:SS" format for other observations, of course, this inconsistency results in errors or NA returns in my code, how can I unify the whole column so my calculations are not interrupted by this inconsistency?

enter image description here

Quinten
  • 35,235
  • 5
  • 20
  • 53
admo7000
  • 13
  • 3
  • Convert them into `date` format with `as.Date()` – Chamkrai Jul 09 '22 at 18:40
  • @TomHoel i am using this line but the results is a column with all NA values mutate(start_date_corrected=as.Date(started_at, format="%m/%d/%Y %H:%M:%S")) – admo7000 Jul 09 '22 at 18:46
  • Can you share a sample of your data by using `dput(your_data)` ? – shafee Jul 09 '22 at 18:47
  • @shafee it is available in the embedded pic click on (enter image description here) – admo7000 Jul 09 '22 at 18:57
  • Its always a good idea to provide a sample of the data, so that we can help you. Providing a screenshot of the data is not helpful. If you are not being helpful, its a bit difficult for us to help you. : ) – shafee Jul 09 '22 at 19:10

1 Answers1

2

Update step by step:

  1. We have a data frame with for columns each of them have character type columns (you can check this with str(df)

  2. In order to change the format from character to datetime in all four columns we use mutate(across(1:4, ...

  3. What we want is that in each column 1:4 the character type is changed to datetime

  4. this can be done with the function parse_date_time from lubridate package

  5. Here we use ~ to indicate an anonymous function

  6. the . indicates column 1-4.

  7. and most important the argument c("ymd_HMS", "mdy_HM") which gives the order of the different formats of the date columns!

We could use parse_date_time() from lubridate package. Important is the argument c("ymd_HMS", "mdy_HM"). Here you define the occurence of the different formats:

and note to use HMS , because:

hms, hm and ms usage is defunct, please use HMS, HM or MS instead. Deprecated in version '1.5.6'.
library(dplyr)
library(lubridate)

df %>% 
  mutate(across(1:4, ~parse_date_time(., c("ymd_HMS", "mdy_HM"))))
  started_at          ended_at            started_at_1        ended_at_1         
  <dttm>              <dttm>              <dttm>              <dttm>             
1 2021-10-29 17:42:36 2021-10-29 18:00:23 2021-06-13 11:40:00 2021-06-13 12:02:00
2 2021-10-01 15:06:10 2021-10-01 15:09:23 2021-06-27 16:26:00 2021-06-27 16:39:00
3 2021-10-28 23:02:53 2021-10-28 23:07:11 2021-06-10 20:06:00 2021-06-10 20:28:00
4 2021-10-17 00:58:17 2021-10-17 01:02:08 2021-06-11 15:54:00 2021-06-11 16:11:00
5 2021-10-27 18:29:34 2021-10-27 18:34:48 2021-06-05 14:09:00 2021-06-05 14:42:00
6 2021-10-17 13:30:21 2021-10-17 13:35:26 2021-06-05 14:14:00 2021-06-05 14:37:00
7 2021-10-04 19:59:28 2021-10-04 21:06:24 2021-06-16 19:05:00 2021-06-16 19:16:00
8 2021-10-10 00:27:09 2021-10-10 00:39:58 2021-06-23 20:29:00 2021-06-23 20:43:00

data:

structure(list(started_at = c("2021-10-29 17:42:36", "2021-10-01 15:06:10", 
"2021-10-28 23:02:53", "2021-10-17 00:58:17", "2021-10-27 18:29:34", 
"2021-10-17 13:30:21", "2021-10-04 19:59:28", "2021-10-10 00:27:09"
), ended_at = c("2021-10-29 18:00:23", "2021-10-01 15:09:23", 
"2021-10-28 23:07:11", "2021-10-17 01:02:08", "2021-10-27 18:34:48", 
"2021-10-17 13:35:26", "2021-10-04 21:06:24", "2021-10-10 00:39:58"
), started_at_1 = c("6/13/21 11:40", "6/27/21 16:26", "6/10/21 20:06", 
"6/11/21 15:54", "6/5/21  14:09", "6/5/21  14:14", "6/16/21 19:05", 
"6/23/21 20:29"), ended_at_1 = c("6/13/21 12:02", "6/27/21 16:39", 
"6/10/21 20:28", "6/11/21 16:11", "6/5/21  14:42", "6/5/21  14:37", 
"6/16/21 19:16", "6/23/21 20:43")), class = "data.frame", row.names = c(NA, 
-8L))
TarJae
  • 72,363
  • 6
  • 19
  • 66