I have a dataframe column containing strings made up of several "chunks" divided by separators, as in: XXX-XXX-XXX-XXX-XXX-XXX. I want to make a new column that contains the first N chunks, i.e. XXX-XXX-XXX-XXX for N = 4.
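To pin down the target transformation, here is a minimal base-R sketch that assumes "-" is the only separator. It skips splitting entirely and uses a regular expression to keep the first N chunks. `first_chunks_re` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper (assumes "-" is the separator): keep the first n chunks.
first_chunks_re <- function(x, n = 4) {
  # For n = 4 this builds ^((?:[^-]*-){3}[^-]*).*$ :
  # capture n - 1 "chunk-" pairs plus one more chunk, then drop the rest.
  pattern <- sprintf("^((?:[^-]*-){%d}[^-]*).*$", as.integer(n - 1))
  # sub() is vectorised over x, so it can be used directly on a column.
  sub(pattern, "\\1", x, perl = TRUE)
}

first_chunks_re(c("XXX-XXX-XXX-XXX-XXX-XXX", "a-b-c-d-e-f", "a-b"), 4)
# expected values: "XXX-XXX-XXX-XXX", "a-b-c-d", "a-b"
```

Strings with fewer than N chunks do not match the pattern, so `sub()` returns them unchanged.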
I can do this using tidyr::separate():
# extra = 'drop' discards chunks 5 and beyond without a warning
df %>% separate(col1, into = c('tmp1', 'tmp2', 'tmp3', 'tmp4'), sep = '-', remove = FALSE, extra = 'drop') %>%
  mutate(col2 = paste(tmp1, tmp2, tmp3, tmp4, sep = '-'))
But I'd like a more direct way that avoids creating the temporary columns. I have a partial solution using stringr::str_split():
df %>% mutate(col2 = paste(str_split(col1, '-')[[1]][1:4], collapse = '-'))
This works for a single string, but it doesn't vectorise properly: when applied to a data frame, every row gets the same col2. The culprit is the [[1]], because if I change it to [[2]], every row of col2 corresponds to row 2 of col1. I tried using [[row_number()]], but that gave the error "recursive indexing failed at level 2". Does anyone know how to vectorise this across a data frame, ideally with mutate() rather than apply()?
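A minimal base-R sketch of why [[1]] pins every row to the first string: str_split() (like base strsplit()) returns a list with one element per input string, so the fix is to map over that whole list instead of indexing a single element. `first_chunks` is a hypothetical helper name, and the sketch assumes a fixed (non-regex) separator.

```r
# Hypothetical helper: keep the first n chunks of each string in a vector.
first_chunks <- function(x, n = 4, sep = "-") {
  # strsplit() returns a list with one character vector per input string,
  # so [[1]] would only ever look at the first row.
  pieces <- strsplit(x, sep, fixed = TRUE)
  # vapply() walks every list element, giving one result per row;
  # head() copes with strings that have fewer than n chunks.
  vapply(pieces, function(p) paste(head(p, n), collapse = sep), character(1))
}

df <- data.frame(col1 = c("A1-B1-C1-D1-E1-F1", "A2-B2-C2-D2-E2-F2"),
                 stringsAsFactors = FALSE)
df$col2 <- first_chunks(df$col1, 4)
# expected values: "A1-B1-C1-D1", "A2-B2-C2-D2"
```

Because mutate() passes the whole column to the function, a vectorised helper like this should also work as `df %>% mutate(col2 = first_chunks(col1, 4))`. stringr::word(col1, 1, 4, sep = "-") is another vectorised option worth trying.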