Extract the first n chunks of a string within a data frame column

Question

I have a dataframe column containing strings made up of several "chunks" divided by separators, as in: XXX-XXX-XXX-XXX-XXX-XXX. I want to make a new column that contains the first N chunks, i.e. XXX-XXX-XXX-XXX for N = 4.

I can do this using tidyr::separate():

df %>% separate(col1, into = c('tmp1', 'tmp2', 'tmp3', 'tmp4'), sep = '-', remove = FALSE) %>%
       mutate(col2 = paste(tmp1, tmp2, tmp3, tmp4, sep = '-'))

But I'd like a more direct way that avoids making the temp columns. I have a partial solution using stringr::str_split:

df %>% mutate(col2 = paste(str_split(col1, '-')[[1]][1:4], collapse = '-'))

This works for a single string, but it doesn't vectorise properly—when applied to a dataframe, every line has the same col2. This comes from the [[1]], because if I change it to [[2]] then every line of col2 corresponds to row #2 of col1. I tried using [[row_number()]], but that gave me the error "recursive indexing failed at level 2". Does anyone know how to vectorise this along a dataframe, ideally with mutate() rather than apply()?

score 0 · Answer 1 · answered Mar 08 '23 at 17:30


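Typing this all out helped me figure out the answer, so in case it's helpful to anyone: just add rowwise() before the mutate() and it works. With rowwise(), mutate() evaluates the expression one row at a time, so str_split(col1, '-')[[1]] refers to the current row's string rather than to the first row of the whole column.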

df %>%
  rowwise() %>%
  mutate(col2 = paste(str_split(col1, '-')[[1]][1:4], collapse = '-'))
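If rowwise() turns out to be slow on a large data frame, a vectorised helper may be worth trying instead. The original attempt gave every row the same value because the split returns a list with one element per input string, and [[1]] always picks the first of them. Below is a minimal sketch in base R that loops over that list instead. Note that first_n_chunks is a hypothetical helper name I made up for illustration, not a function from stringr or dplyr, and NA inputs are not handled specially.

```r
# Hypothetical helper (not part of any package): keep the first n chunks
# of each string in x, where chunks are divided by a literal separator.
first_n_chunks <- function(x, n, sep = "-") {
  # strsplit() returns a list with one character vector per input string
  parts <- strsplit(x, sep, fixed = TRUE)
  # head() keeps at most n chunks, so shorter strings are returned whole
  vapply(parts, function(p) paste(head(p, n), collapse = sep), character(1))
}

# Example data, made up for illustration
df <- data.frame(col1 = c("a-b-c-d-e-f", "1-2-3-4-5-6", "x-y"),
                 stringsAsFactors = FALSE)
df$col2 <- first_n_chunks(df$col1, 4)
print(df$col2)
# [1] "a-b-c-d" "1-2-3-4" "x-y"
```

Because the helper takes a whole vector and returns a vector of the same length, it should also work inside a pipeline as mutate(col2 = first_n_chunks(col1, 4)), with no rowwise() needed.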


Elle

