0

I have a date column that contains two dates (e.g Start Date 1/1/13 to End Date 12/31/13")and some rows contain different format like (e.g.1/1/13 to 12/31/13), (October to May)

I want to unify the format to MM/DD/YYYY and separate the start and end date into two columns.

How can I get rid of the characters and separate the two dates and place them into two separate columns like the picture attached?

Can this be achieved in R?

Mr.Rlover
  • 2,523
  • 3
  • 14
  • 32

2 Answers2

1

You could use regular expressions with stringr and lubridate :

df <- data.frame(range = c("1/1/13 to 12/31/13",
                           "5/5/15 to 10/27/15"))

df$from <- lubridate::mdy(stringr::str_extract(df$range,"^.*?(?=to)"))  
df$to <- lubridate::mdy(stringr::str_extract(df$range,"(?=to).*?$")) 
df
#>                range       from         to
#> 1 1/1/13 to 12/31/13 2013-01-01 2013-12-31
#> 2 5/5/15 to 10/27/15 2015-05-05 2015-10-27

Created on 2020-09-20 by the reprex package (v0.3.0)

Or without converting to date :

library(dplyr)

df <- data.frame(range = c("1/1/13 to 12/31/13",
                           "5/5/15 to 10/27/15",
                           "October to November"))

df %>% mutate(from = stringr::str_extract(range,"^.*?(?= to)"),
              to = stringr::str_extract(range,"(?<=to ).*?$"))
#>                 range    from       to
#> 1  1/1/13 to 12/31/13  1/1/13 12/31/13
#> 2  5/5/15 to 10/27/15  5/5/15 10/27/15
#> 3 October to November October November

Created on 2020-09-20 by the reprex package (v0.3.0)

Waldi
  • 39,242
  • 6
  • 30
  • 78
  • What if the column contains a character Date like this. "November to May", "May to September"? The above code works perfectly with the other Date format except for the one I mentioned above. – Abdulaziz Alshamsi Sep 19 '20 at 22:36
  • 1
    Difficult to create a date if you don't specify the year. – Waldi Sep 19 '20 at 22:39
  • Ok, how can we remove any characters in the date column and split the two dates into two separate columns with keeping the current format? (or example, my dataset contain the following: df <- data.frame(Date = c("1/1/13 to 12/31/13", "01/01/2015 to 12/31/2015", "Start Date 1/1/13 to End Date 12/31/13", "to 10/07/2017", "01/01/2016 to", "November to March")) how can I keep separate the dates into two columns? – Abdulaziz Alshamsi Sep 19 '20 at 23:27
  • Another possibility is not to transform as date and just to return a string, see my edit – Waldi Sep 20 '20 at 02:39
  • The problem with this approach is the word "to" didn't go away. You can clearly see it in the "to" column. – Abdulaziz Alshamsi Sep 20 '20 at 02:52
  • Thank you so much! Is there any way to use Pipe Operator instead? – Abdulaziz Alshamsi Sep 20 '20 at 04:14
  • Great!! Thank you again!! – Abdulaziz Alshamsi Sep 20 '20 at 22:07
1

You should be able to do it with just regular expressions and back referencing Backreference in R.

dat<-data.frame(datestring=c("November to March","11/1/2001 to 12/8/2001"))

dat$from <- gsub("(.*) to (.*)","\\1",dat$datestring) # creates a new column 'from' that takes the first thing (before the 'to') 
dat$to <- gsub("(.*) to (.*)","\\2",dat$datestring) # creates a new column 'to' that takes the second thing (after the 'to')

dat

              datestring      from        to
1      November to March  November     March
2 11/1/2001 to 12/8/2001 11/1/2001 12/8/2001
Dylan_Gomes
  • 2,066
  • 14
  • 29