
Some of my dates are in MM/DD/YY HH:MM format and others are in plain MM/DD/YY format. I want to parse all of them into a single format like "2010-12-01 12:12 EST". How should I go about doing that? I tried the following ifelse statement, but it gave me a bunch of long integers and warned that a large number of my data points failed to parse:

df_prime$date <- ifelse(!is.na(mdy_hm(df$date)), mdy_hm(df$date), mdy(df$date))

df_prime is a copy of the data frame df that I initially loaded.

  IEN          date admission_number KEY_PTF_45       admission_from                        discharge_to
1  12  3/3/07 18:05                1     252186         OTHER DIRECT                                
2  12  3/9/07 12:10                1     252186                      RETURN TO COMMUNITY-    INDEPENDENT
3  12 3/10/07 15:08                2     252382 OUTPATIENT TREATMENT                                
4  12 3/14/07 10:26                2     252382                      RETURN TO COMMUNITY-INDEPENDENT
5  12 4/24/07 19:45                3     254343         OTHER DIRECT                                
6  12 4/28/07 11:45                3     254343                      RETURN TO COMMUNITY-INDEPENDENT
...
1046334 23613488506       2/25/14               NA         NA                            
1046335 23613488506 2/25/14 11:27               NA         NA                            
1046336 23613488506       2/28/14               NA         NA                            
1046337 23613488506        3/4/14               NA         NA                            
1046338 23613488506 3/10/14 11:30               NA         NA                            
1046339 23613488506 3/10/14 12:32               NA         NA        

Sorry if some of the formatting isn't right, but the date column is the most important one.

EDIT: Below is the dput output for a portion of my data frame:

structure(list(IEN = c(23613488506, 23613488506, 23613488506, 23613488506, 23613488506, 23613488506), date = c("2/25/14", "2/25/14 11:27", "2/28/14", "3/4/14", "3/10/14 11:30", "3/10/14 12:32")), .Names = c("IEN", "date"), row.names = 1046334:1046339, class = "data.frame") 
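The "long integers" from the ifelse() attempt above are most likely POSIXct values that lost their class: ifelse() builds its result from the test vector, so the attributes of the yes/no values (including the POSIXct class) are dropped and only the underlying seconds since 1970-01-01 remain. A minimal base R sketch that reproduces this, using as.POSIXct() with explicit formats in place of lubridate so it runs without extra packages (x is a hypothetical two-element sample modeled on the dput data above):

```r
# Hypothetical sample mixing the two formats from the question
x <- c("2/25/14", "2/25/14 11:27")

# Strict "date + time" parse: the date-only string becomes NA
with_time <- as.POSIXct(x, format = "%m/%d/%y %H:%M", tz = "UTC")
# Date-only parse: strptime ignores trailing characters, so both strings parse
date_only <- as.POSIXct(x, format = "%m/%d/%y", tz = "UTC")

res <- ifelse(!is.na(with_time), with_time, date_only)
print(class(res))  # "numeric": the POSIXct class has been stripped
print(res)         # raw seconds since 1970-01-01 UTC
```

The values themselves are right; only the class is gone, which is why filling the NAs by indexing (rather than with ifelse()) keeps proper date-times.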
Brandon Sherman
  • So your date column is *character* and contains other formats than those you've shown in your extract? Any chance of you creating something with examples of all the formats and in a `dput` format? – Spacedman Aug 12 '14 at 15:55
  • What is a `dput` format? And I'll update my original post with an example of the second format. Sorry about that! – Brandon Sherman Aug 12 '14 at 15:56
  • Please make your question reproducible. Two vectors with all possible styles will do, preferably in a form that can be copied straight into R. – Roman Luštrik Aug 12 '14 at 15:58
  • Make a little data frame with maybe only 2 columns (we don't care about most of the other stuff) and then, where `d` is your data frame, do `dput(d)` - paste that and then we can just cut and paste it into our sessions to reconstruct exactly your data frame. – Spacedman Aug 12 '14 at 15:58
  • @Spacedman Please see my original post. Thanks again! – Brandon Sherman Aug 12 '14 at 16:04
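As the comments describe, dput() prints R code that recreates an object exactly, which is why it is the usual way to share data in a question. A small base R round-trip sketch (the two-row data frame d is made up for illustration):

```r
d <- data.frame(IEN = c(23613488506, 23613488506),
                date = c("2/25/14", "2/25/14 11:27"),
                stringsAsFactors = FALSE)

dput(d)  # prints a structure(...) call that can be pasted into a question

# Anyone can rebuild an identical object from that printed text
txt <- paste(capture.output(dput(d)), collapse = "\n")
d2 <- eval(parse(text = txt))
stopifnot(identical(d, d2))
```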

2 Answers


Have you tried the function guess_formats() in the lubridate package? A reproducible example that builds a data frame like yours would also help!
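guess_formats() reports which strptime-style formats match the input strings; those formats can then be tried in turn. The same "try each candidate format" idea, written as a plain base R sketch (parse_mixed() is a hypothetical helper, not part of lubridate), tries the most specific format first and fills any remaining NAs from the next one:

```r
# Hypothetical helper: try each format in order, keeping the first successful parse
parse_mixed <- function(x, formats, tz = "UTC") {
  out <- .POSIXct(rep(NA_real_, length(x)), tz = tz)  # all-NA POSIXct to fill in
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

x <- c("2/25/14", "2/25/14 11:27", "2/28/14", "3/4/14", "3/10/14 11:30", "3/10/14 12:32")
parsed <- parse_mixed(x, c("%m/%d/%y %H:%M", "%m/%d/%y"))
format(parsed, "%Y-%m-%d %H:%M")
```

The order of the formats matters: "%m/%d/%y" would also accept "2/25/14 11:27" and silently drop the time, because strptime ignores trailing characters, so the format with the time has to come first.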

Fabio

The lubridate package's mdy_hm() has a truncated argument that lets it parse dates missing some of their trailing components (here, the hours and minutes). For your example, with d being the data frame from your dput:

> mdy_hm(d$date,truncated=2)
[1] "2014-02-25 00:00:00 UTC" "2014-02-25 11:27:00 UTC"
[3] "2014-02-28 00:00:00 UTC" "2014-03-04 00:00:00 UTC"
[5] "2014-03-10 11:30:00 UTC" "2014-03-10 12:32:00 UTC"
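The output above is in UTC, lubridate's default zone. To produce strings laid out like the question's "2010-12-01 12:12 EST", and to keep track of which entries had no time of their own (those come back as 00:00:00), one option is the base R sketch below. It swaps in as.POSIXct() for mdy_hm() so it runs without lubridate and it stays in UTC; getting an actual "EST" label would mean parsing with a different tz, an assumption about the data that this sketch does not make:

```r
x <- c("2/25/14", "2/25/14 11:27", "3/10/14 12:32")

# Note which strings actually carried a time before parsing
has_time <- grepl("[0-9]{1,2}:[0-9]{2}", x)

# Parse with the time where there is one, date-only otherwise (UTC, as above)
parsed <- as.POSIXct(x, format = "%m/%d/%y %H:%M", tz = "UTC")
parsed[!has_time] <- as.POSIXct(x[!has_time], format = "%m/%d/%y", tz = "UTC")

# Same layout as "2010-12-01 12:12 EST"; %Z prints the zone used when parsing
out <- format(parsed, "%Y-%m-%d %H:%M %Z")
# Optional: do not display the 00:00 that was filled in for date-only entries
out[!has_time] <- format(parsed[!has_time], "%Y-%m-%d")
out
```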
Spacedman
  • I think that worked. Thanks! I had tried that before without success, and I suspect it's because I wrote `truncate` as the argument name instead of `truncated`. Is there any way to keep the times for those that have them, though? – Brandon Sherman Aug 12 '14 at 16:20
  • It **is** keeping the times for those that have it... It just sets the time for those that don't to 00:00:00. – Spacedman Aug 12 '14 at 16:26
  • And yes, you need to spell out `truncated` because the function has a `...` argument first which catches anything that doesn't match the exact argument. When a function doesn't have dots (or dots are last) you can abbreviate arguments. Such is R. – Spacedman Aug 12 '14 at 16:44
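The matching rule described in that last comment can be checked with toy functions shaped like mdy_hm(), which takes its inputs through `...` and declares `truncated` after it. f() and g() are hypothetical stand-ins, not lubridate's code: an argument declared after `...` is only matched by its full name, while one declared before `...` can be abbreviated.

```r
# Hypothetical stand-in: `truncated` declared after `...`, as in mdy_hm()
f <- function(..., truncated = 0) truncated
# For comparison: `truncated` declared before `...`
g <- function(truncated = 0, ...) truncated

f(truncated = 2)  # 2: exact name matches
f(truncate = 2)   # 0: `truncate` is swallowed by `...`, so the default is used
g(truncate = 2)   # 2: partial matching works for arguments before `...`
```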