I am working on a data frame that has a column that uses the date and time format as "mm/dd/yy HH:MM" for some observations and "yyyy/mm/dd HH:MM:SS" format for other observations, of course, this inconsistency results in errors or NA returns in my code, how can I unify the whole column so my calculations are not interrupted by this inconsistency?
-
Convert them into `date` format with `as.Date()` – Chamkrai Jul 09 '22 at 18:40
-
@TomHoel i am using this line but the results is a column with all NA values mutate(start_date_corrected=as.Date(started_at, format="%m/%d/%Y %H:%M:%S")) – admo7000 Jul 09 '22 at 18:46
-
Can you share a sample of your data by using `dput(your_data)` ? – shafee Jul 09 '22 at 18:47
-
@shafee it is available in the embedded pic click on (enter image description here) – admo7000 Jul 09 '22 at 18:57
-
Its always a good idea to provide a sample of the data, so that we can help you. Providing a screenshot of the data is not helpful. If you are not being helpful, its a bit difficult for us to help you. : ) – shafee Jul 09 '22 at 19:10
1 Answers
Update step by step:
We have a data frame with for columns each of them have character type columns (you can check this with
str(df)
In order to change the format from character to datetime in all four columns we use
mutate(across(1:4, ...
What we want is that in each column 1:4 the character type is changed to datetime
this can be done with the function
parse_date_time
from lubridate packageHere we use
~
to indicate an anonymous functionthe
.
indicates column 1-4.and most important the argument
c("ymd_HMS", "mdy_HM")
which gives the order of the different formats of the date columns!
We could use parse_date_time()
from lubridate
package. Important is the argument c("ymd_HMS", "mdy_HM")
. Here you define the occurence of the different formats:
and note to use HMS
, because:
hms, hm and ms usage is defunct, please use HMS, HM or MS instead. Deprecated in version '1.5.6'.
library(dplyr)
library(lubridate)
df %>%
mutate(across(1:4, ~parse_date_time(., c("ymd_HMS", "mdy_HM"))))
started_at ended_at started_at_1 ended_at_1
<dttm> <dttm> <dttm> <dttm>
1 2021-10-29 17:42:36 2021-10-29 18:00:23 2021-06-13 11:40:00 2021-06-13 12:02:00
2 2021-10-01 15:06:10 2021-10-01 15:09:23 2021-06-27 16:26:00 2021-06-27 16:39:00
3 2021-10-28 23:02:53 2021-10-28 23:07:11 2021-06-10 20:06:00 2021-06-10 20:28:00
4 2021-10-17 00:58:17 2021-10-17 01:02:08 2021-06-11 15:54:00 2021-06-11 16:11:00
5 2021-10-27 18:29:34 2021-10-27 18:34:48 2021-06-05 14:09:00 2021-06-05 14:42:00
6 2021-10-17 13:30:21 2021-10-17 13:35:26 2021-06-05 14:14:00 2021-06-05 14:37:00
7 2021-10-04 19:59:28 2021-10-04 21:06:24 2021-06-16 19:05:00 2021-06-16 19:16:00
8 2021-10-10 00:27:09 2021-10-10 00:39:58 2021-06-23 20:29:00 2021-06-23 20:43:00
data:
structure(list(started_at = c("2021-10-29 17:42:36", "2021-10-01 15:06:10",
"2021-10-28 23:02:53", "2021-10-17 00:58:17", "2021-10-27 18:29:34",
"2021-10-17 13:30:21", "2021-10-04 19:59:28", "2021-10-10 00:27:09"
), ended_at = c("2021-10-29 18:00:23", "2021-10-01 15:09:23",
"2021-10-28 23:07:11", "2021-10-17 01:02:08", "2021-10-27 18:34:48",
"2021-10-17 13:35:26", "2021-10-04 21:06:24", "2021-10-10 00:39:58"
), started_at_1 = c("6/13/21 11:40", "6/27/21 16:26", "6/10/21 20:06",
"6/11/21 15:54", "6/5/21 14:09", "6/5/21 14:14", "6/16/21 19:05",
"6/23/21 20:29"), ended_at_1 = c("6/13/21 12:02", "6/27/21 16:39",
"6/10/21 20:28", "6/11/21 16:11", "6/5/21 14:42", "6/5/21 14:37",
"6/16/21 19:16", "6/23/21 20:43")), class = "data.frame", row.names = c(NA,
-8L))

- 72,363
- 6
- 19
- 66
-
-
A bit irrelevant to the question, Can you guide how did you manage to read that data from that image? @TarJae – shafee Jul 09 '22 at 19:01
-
-
-
@TarJae This works now and i have successfully unified my format across the column – admo7000 Jul 09 '22 at 22:18