
I have a character vector of strings my_strings where some elements include a single date in YYYYMMDD format. I'd like to replace the YYYYMMDD dates with MMM YYYY dates. For example,

my_strings <- c('apple','2000 20150101 bar', '20160228')

would become c('apple', '2000 Jan 2015 bar', 'Feb 2016'). What's the best way to do this in R, especially with stringr?

I thought the following would work:

library(stringr)
pattern <- '([0-9]{4})([0-9]{2})[0-9]{2}'
str_replace(my_strings, pattern, str_c(month.abb[as.integer("\\2")], " \\1"))

But I guess I can't do anything with the captured items? I did find that this works:

library(stringr)
library(dplyr)
library(lubridate)
pattern <- '[0-9]{8}'
my_strings %>%
  str_match(pattern) %>%
  ymd() %>% 
  format('%b %Y') %>% 
  str_replace_na() ->
  replacement_vals
str_replace(my_strings, pattern, replacement_vals)

But this seems clunky. There has to be a simpler approach here, right? Something like my first attempt?

lowndrul

2 Answers


We can do this with gsubfn, which accepts a function (here written as a formula) as the replacement:

library(gsubfn)
gsubfn("([0-9]{8})", ~format(as.Date(x, "%Y%m%d"), "%b %Y"), my_strings)
#[1] "apple"             "2000 Jan 2015 bar" "Feb 2016" 
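
For anyone who wants to stay within stringr: the first attempt in the question fails because as.integer("\\2") is evaluated by R before str_replace ever sees a match, so it is applied to the literal two-character string \2 (which gives NA) rather than to the captured month. Recent versions of stringr accept a function as the replacement, which gives much the same shape as the gsubfn call. A minimal sketch, assuming an installed stringr that supports function replacements; the helper name to_month_year is made up for illustration, and %b follows the current locale's month names:

```r
library(stringr)

my_strings <- c('apple', '2000 20150101 bar', '20160228')

# Hypothetical helper (not part of stringr): turns matched YYYYMMDD
# strings into "Mon YYYY". "%b" uses the current locale's month names.
to_month_year <- function(d) format(as.Date(d, "%Y%m%d"), "%b %Y")

# The function is applied to the matched text, and its return value
# replaces each match.
result <- str_replace_all(my_strings, "[0-9]{8}", to_month_year)
result
```

Strings without an eight-digit run are returned unchanged.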
akrun

base R solution:

my_strings <- c('apple','2000 20150101 bar', '20160228')

unlist(lapply(strsplit(my_strings, ' '), function(x) {
  # Tokens that are not valid YYYYMMDD dates come back as NA
  b1 <- format(as.Date(x, "%Y%m%d"), "%b %Y")
  # Overwrite only the tokens that parsed as dates
  x[!is.na(b1)] <- b1[!is.na(b1)]
  paste(x, collapse = ' ')
}))

# [1] "apple"             "2000 Jan 2015 bar" "Feb 2016"
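
A variation on the base R idea that matches the dates with a regex instead of splitting on spaces, so the original spacing is left untouched. This is only a sketch: the helper name to_month_year is hypothetical, and it builds the month from the built-in month.abb constant (as in the question's first attempt) rather than from as.Date, so it does not depend on the locale but also does not validate the date (a month of 13 would produce "NA").

```r
my_strings <- c('apple', '2000 20150101 bar', '20160228')

# Locate every run of eight digits in each string.
m <- gregexpr("[0-9]{8}", my_strings)

# Hypothetical helper: build "Mon YYYY" from YYYYMMDD using month.abb,
# which always holds the English abbreviations.
to_month_year <- function(d) {
  paste(month.abb[as.integer(substr(d, 5, 6))], substr(d, 1, 4))
}

# Replace each match in place; strings with no match are unchanged.
result <- my_strings
regmatches(result, m) <- lapply(regmatches(result, m), to_month_year)
result
# [1] "apple"             "2000 Jan 2015 bar" "Feb 2016"
```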
Sathish