
I have a dataframe:

ID       value
1      he following object is masked from ‘package:purrr’
2      Attaching package: ‘magrittr’
3      package ‘ggplot2’ was built under R version 3.6.2
4      Warning messages:

Here is the code I use to transform the value column:

df <- df %>% 
  mutate(value = stringr::str_replace(value, '(^he following object)', '\\1'),
         value = stringr::str_replace(value, '(^Attaching package:)', '\\1'),
         value = stringr::str_replace(value, '(^package ‘ggplot2’)', '\\1')) %>%
  group_by(ID, value) 

The output is:

ID       value
1      he following object
2      Attaching package: 
3      package ‘ggplot2’
4      Warning messages:

As you can see, I call stringr::str_replace several times on one column. My actual data is much larger (millions of rows); this is just a small example. How can I combine these three calls into one? I want to keep using the same functions and libraries (no radical changes).

I tried this, but it doesn't work either:

df <- df %>% 
  mutate(value = str_replace_all(value, '(^he following object).*|(^Attaching package:).*|(^package ‘ggplot2’).*', '\\1')) %>%   
  group_by(ID, value)

It gave me this:

ID       value
1      he following object’
2      
3     
4      Warning messages:

2 Answers


Is this what you are looking for?

df %>% 
 mutate(value = stringr::str_replace_all(value, 
                                         c('(^he following object).*' = '\\1',
                                           '(^Attaching package:).*'= '\\1',
                                           '(^package ‘ggplot2’).*'= '\\1')
                                         ))
#>   ID               value
#> 1  1 he following object
#> 2  2  Attaching package:
#> 3  3   package ‘ggplot2’
#> 4  4   Warning messages:

Note that I had to add .* because your code was not working for me: it was not replacing the whole sentence.
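
As a side note, the single-pattern attempt in the question returns empty strings for rows 2 and 3 because `\\1` refers only to the first capture group, which is empty whenever the second or third alternative is the one that matches. A minimal sketch that puts every alternative inside one group, so one `str_replace` call is enough (the data frame is rebuilt from the question so the snippet runs on its own; `pattern` and `res` are names chosen here for illustration):

```r
library(dplyr)
library(stringr)

# Sample data rebuilt from the question
df <- data.frame(
  ID = 1:4,
  value = c("he following object is masked from ‘package:purrr’",
            "Attaching package: ‘magrittr’",
            "package ‘ggplot2’ was built under R version 3.6.2",
            "Warning messages:"),
  stringsAsFactors = FALSE
)

# A single capture group holds all alternatives, so '\\1' is always
# the prefix that matched; '.*' consumes the rest of the string.
pattern <- "^(he following object|Attaching package:|package ‘ggplot2’).*"

res <- df %>%
  mutate(value = str_replace(value, pattern, "\\1"))
```

Rows that match none of the prefixes, such as row 4, are left unchanged because `str_replace` only touches matching strings.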

Edo
  • I think it's better to use str_extract_all for cases with very long strings. What should that look like? –  Nov 13 '20 at 08:45
  • why don't you use @Ronak Shah's solution? – Edo Nov 13 '20 at 08:53
  • Because it doesn't give the desired result: the 4th row in column value is empty. Your code works, but my data has very long strings in column value, so with .* the processing takes a very long time. Could I do the same thing with str_starts, or without .* ? –  Nov 13 '20 at 08:56
  • have you tried my suggestion with coalesce? `coalesce(str_extract(value, '^(he following object|Attaching package:|package ‘ggplot2)'), value)` ? – Edo Nov 13 '20 at 09:06
  • I see that Ronak updated his answer with this. That's probably what you're looking for. – Edo Nov 13 '20 at 09:08
  • And isn't this the same as `coalesce(str_extract(value, '^(he following object|Attaching package:|package ‘ggplot2)'), value)`? –  Nov 13 '20 at 09:23

Instead of using str_replace and capturing the string with a back-reference, you can use str_extract and then coalesce with the existing value.

library(dplyr)
library(stringr)

df %>%
  mutate(value1 = str_extract(value, 
                '^(he following object|Attaching package:|package ‘ggplot2)'), 
         value = coalesce(value1, value)) %>%
  select(-value1)

#  ID               value
#1  1 he following object
#2  2  Attaching package:
#3  3    package ‘ggplot2
#4  4   Warning messages:
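
The same idea can be sketched without the temporary column, building the pattern from a vector of prefixes. This is only an illustration under a few assumptions: `prefixes`, `pattern`, and `res` are hypothetical names, the prefixes contain no regex metacharacters, and the closing ’ is included after ggplot2 so that row 3 matches the output requested in the question. Because the pattern has no trailing `.*`, it should not need to consume the rest of each long string.

```r
library(dplyr)
library(stringr)

# Sample data rebuilt from the question
df <- data.frame(
  ID = 1:4,
  value = c("he following object is masked from ‘package:purrr’",
            "Attaching package: ‘magrittr’",
            "package ‘ggplot2’ was built under R version 3.6.2",
            "Warning messages:"),
  stringsAsFactors = FALSE
)

# Hypothetical vector of prefixes to keep (assumed free of regex metacharacters)
prefixes <- c("he following object", "Attaching package:", "package ‘ggplot2’")

# Anchored alternation: "^(prefix1|prefix2|prefix3)"
pattern <- paste0("^(", paste(prefixes, collapse = "|"), ")")

# str_extract returns NA where nothing matches; coalesce falls back
# to the original value for those rows.
res <- df %>%
  mutate(value = coalesce(str_extract(value, pattern), value))
```
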
Ronak Shah
  • because your expected output is `he following object` and not `he following object is masked from ‘package:purrr’`. You only want to extract the part of string that matches the pattern. – Ronak Shah Nov 12 '20 at 11:08
  • or he can `coalesce` with the existing `value` – Edo Nov 12 '20 at 11:31
  • The desired result is different: the 4th row in column value is empty. Could I do that with str_starts? –  Nov 13 '20 at 08:58
  • As @Edo mentioned you can `coalesce` the extracted value with the existing value. See updated answer. – Ronak Shah Nov 13 '20 at 09:07