How to replace a string in every row of a specific column using dplyr and stringr

Question

I have the following tibble:

library(tidyverse)

df <- tibble::tribble(
  ~sample, ~colB, ~colC,
  "foo",   1,  2,
  "bar_x",   2,  3,
  "qux.6hr.ID",   3,  4,
  "dog",   1,  1
)


df
#> # A tibble: 4 x 3
#>       sample  colB  colC
#>        <chr> <dbl> <dbl>
#> 1        foo     1     2
#> 2      bar_x     2     3
#> 3 qux.6hr.ID     3     4
#> 4        dog     1     1

Update: the sample column is actually a factor:

df$sample <- factor(df$sample, levels = c("bar_x", "foo", "qux.6hr.ID", "dog"))

df$sample
#> [1] foo        bar_x      qux.6hr.ID dog       
#> Levels: bar_x foo qux.6hr.ID dog

What I want to do is remove the substrings _x and .6hr, where present, from every row of the sample column. The final table should look like this:

     sample  colB  colC
        foo     1     2
        bar     2     3
     qux.ID     3     4
        dog     1     1

How can I achieve that?

`df %>% mutate(sample = gsub('_x|\\.6hr', '', sample))` or equivalently with stringr, `df %>% mutate(sample = str_replace_all(sample, '_x|\\.6hr', ''))` — alistaire, Jun 03 '17 at 05:08
@alistaire Actually my df contain factor. See my update. Sorry. How can I modify your code? — pdubois, Jun 03 '17 at 05:17
`gsub` still works, though it coerces to character. You could make a call to `levels<-`, but it's a little awkward in dplyr syntax. The forcats package supplies an alternative: `df %>% mutate(sample = factor(sample), sample = forcats::fct_relabel(sample, function(x){str_replace_all(x, '_x|\\.6hr', '')}))` though you have to structure the second parameter as a function à la `lapply`. — alistaire, Jun 03 '17 at 05:33

akrun · Accepted Answer · 2017-06-03T05:22:40.467

10

We can use

df %>% 
     mutate(sample = gsub("_x|\\.\\d+[A-Za-z]+", "", sample))
# A tibble: 4 x 3 
#   sample  colB  colC
#    <chr> <dbl> <dbl>
#1    foo     1     2
#2    bar     2     3
#3 qux.ID     3     4
#4    dog     1     1

If the 'sample' column is of class factor, we can either wrap the output of gsub in factor() or apply the replacement to the levels of sample:

levels(df$sample) <- gsub("_x|\\.\\d+[A-Za-z]+", "", levels(df$sample))
df$sample
#[1] foo    bar    qux.ID dog   
#Levels: bar foo qux.ID dog
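
The two factor options above differ in how they treat level order. Here is a minimal base R sketch of that difference, assuming the factor from the question's update; the names `f`, `pat`, `via_levels` and `via_factor` are illustrative and not part of the original answer.

```r
# Factor with the custom level order from the question's update
f <- factor(c("foo", "bar_x", "qux.6hr.ID", "dog"),
            levels = c("bar_x", "foo", "qux.6hr.ID", "dog"))
pat <- "_x|\\.\\d+[A-Za-z]+"

# Option 1: rewrite the levels in place; the custom level order is kept
via_levels <- f
levels(via_levels) <- gsub(pat, "", levels(via_levels))

# Option 2: gsub() returns a character vector, and factor() rebuilds the
# levels in sorted order unless `levels` is supplied explicitly
via_factor <- factor(gsub(pat, "", f))

levels(via_levels)  # "bar" "foo" "qux.ID" "dog"
levels(via_factor)  # "bar" "dog" "foo" "qux.ID"
```

Both versions hold the same values; only the level order differs, which matters if the order is later used for plotting or sorting.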

edited Jun 03 '17 at 05:22

answered Jun 03 '17 at 05:12

akrun


Actually my df contains factor. See my update. Sorry. How can I modify your code? – pdubois Jun 03 '17 at 05:18

@pdubois `gsub` will take `factor` as well. If you retain as `factor`, then wrap the output with `factor` i.e. `mutate(sample = factor(gsub(..` – akrun Jun 03 '17 at 05:20

score 2 · Answer 2 · answered Sep 09 '21 at 21:14

And here's a solution using the purrr::map_chr() function, which has the added benefit of returning the same result whether "sample" is a character vector or a factor.

df %>%
   mutate(sample = map_chr(sample, ~str_replace(.x, 
                                         pattern = "_x|\\.\\d+[A-Za-z]+", 
                                         replacement = "")))
# A tibble: 4 x 3
#  sample  colB  colC
#  <chr>  <dbl> <dbl>
#1 foo        1     2
#2 bar        2     3
#3 qux.ID     3     4
#4 dog        1     1
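
One caveat with the code above: `str_replace()` removes only the first match in each string, whereas `str_replace_all()` removes every match. The data in the question never has both substrings in one value, so the difference does not show up there. The sketch below assumes stringr is installed and uses a hypothetical extra value, `"bar_x.6hr"`, to illustrate it; `x`, `pat`, `first_only` and `all_matches` are illustrative names.

```r
library(stringr)

# Hypothetical input: the last value contains both substrings
x <- c("foo", "bar_x", "qux.6hr.ID", "bar_x.6hr")
pat <- "_x|\\.\\d+[A-Za-z]+"

# Both functions are vectorised over `x`, so no explicit loop is needed here
first_only  <- str_replace(x, pat, "")      # removes the first match only
all_matches <- str_replace_all(x, pat, "")  # removes every match

first_only   # "foo" "bar" "qux.ID" "bar.6hr"
all_matches  # "foo" "bar" "qux.ID" "bar"
```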
