
I have data like this:

 clas=c("CD_1","X.2_2","K$2_3","12k3_4",".A_5","xy_6")
 df <- data.frame(clas)
> df
    clas
1   CD_1
2  X.2_2
3  K$2_3
4 12k3_4
5   .A_5
6   xy_6

I would like to change the rows that match this condition:

if the number after the _ is 4, 5 or 6, replace the character immediately before the _ with B. So the output should look like this:

    clas
1   CD_1
2  X.2_2
3  K$2_3
4 12kB_4
5   .B_5
6   xB_6

Thanks!

EDIT:

So if I have data like this:

    clas
1   CD_1
2  X.2_2
3  K$2_3
4 12k3_4
5   .A_5
6  xy_11

Then, applying your solution with the pattern changed to target 4, 5 and 11, I get:

df %>% mutate(clas = str_replace(clas, "(.)(_[4511])", "B\\2"))

    clas
1   CB_1
2  X.2_2
3  K$2_3
4 12kB_4
5   .B_5
6  xB_11

But I only want to match 11, not 1. How can I do that?

Alexander

1 Answer

library(dplyr)
library(stringr)

clas <- c("CD_1","X.2_2","K$2_3","12k3_4",".A_5","xy_6")
df <- data.frame(clas)

# Replace the character just before _4, _5 or _6 with B, keeping the suffix
df %>% mutate(clas = str_replace(clas, "(.)(_[456])", "B\\2"))

Here the pattern (.)(_[456]) has two capture groups: (.) captures the single character just before the underscore, and (_[456]) captures the underscore plus the digit. Counting the whole match ._[456] as group 0, (.) is group 1 and (_[456]) is group 2.

\\2 is a back-reference to group 2, so the whole match ._[456] is replaced by B followed by whatever matched _[456]. [456] is a character class: it matches exactly one character, which can be any of the options inside the brackets.
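To see the groups described above, stringr's str_match() can print them: it returns a matrix whose first column is the whole match and whose remaining columns are the capture groups. A minimal sketch on the question's data, applied to the plain character vector rather than inside mutate() (it behaves the same there); the names m and res are only illustrative:

```r
library(stringr)

clas <- c("CD_1", "X.2_2", "K$2_3", "12k3_4", ".A_5", "xy_6")

# Column 1 = whole match (group 0), column 2 = (.), column 3 = (_[456]);
# strings with no match get a row of NA
m <- str_match(clas, "(.)(_[456])")
print(m)
# row 4 ("12k3_4") holds "3_4", "3", "_4"

# "B\\2" replaces the whole match but puts group 2 back, so only the
# character before the underscore changes
res <- str_replace(clas, "(.)(_[456])", "B\\2")
print(res)
# "CD_1" "X.2_2" "K$2_3" "12kB_4" ".B_5" "xB_6"
```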

EDIT:

Each character inside [] is treated individually, so [1111] is no different from [1]: that class matches a single character that is either 1 or 1 or 1 or 1. Likewise [4511] is just [451], which is why the _1 in CD_1 matched. Instead you need to use | for alternation, so you have (.)(_[45]|_11). This matches _4, _5 or _11 in the second group. Also, if you want to match a single digit 1-9 but not 11 or 15, anchor the pattern to the end of the string with $: (.)(_[1-9])$. Go through the regex cheatsheet linked in the comments and test these out on RegExr.
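The difference between the character class, the alternation and the anchored pattern can be checked on the data from the question's edit. This is a sketch using str_replace() as in the answer; the three result variable names are only illustrative:

```r
library(stringr)

clas <- c("CD_1", "X.2_2", "K$2_3", "12k3_4", ".A_5", "xy_11")

# [4511] is one character class (same as [451]), so "_1" in "CD_1" also matches
with_class <- str_replace(clas, "(.)(_[4511])", "B\\2")
print(with_class)
# "CB_1" "X.2_2" "K$2_3" "12kB_4" ".B_5" "xB_11"

# Alternation: "_" followed by 4 or 5, or the two-character suffix "_11"
with_alt <- str_replace(clas, "(.)(_[45]|_11)", "B\\2")
print(with_alt)
# "CD_1" "X.2_2" "K$2_3" "12kB_4" ".B_5" "xB_11"

# $ forces the suffix to end the string: _1 ... _9 match, _11 does not
anchored <- str_replace(clas, "(.)(_[1-9])$", "B\\2")
print(anchored)
# "CB_1" "X.B_2" "K$B_3" "12kB_4" ".B_5" "xy_11"
```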

shians
  • Thanks for the answer and explanation. Let's say we have more than three numbers, not just 4, 5 and 6; how would we implement that? For example, I assume seq(1,10) won't work inside of `"(.)(_[456])"`? – Alexander Aug 17 '17 at 01:20
  • 1
    Well all the options for matching are inside of `[]`, so you can put in `[123456789]` or simply `[1-9]`. Have a look at this [regular expression cheatsheet](https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf), the only tricky part here is the use of the `\\2` back-reference. – shians Aug 17 '17 at 01:27
  • Your solution is great, but I just realized there is one minor issue in the real data.frame. Could you check the EDIT part of my question? – Alexander Aug 17 '17 at 01:44
  • 1
    Editted my answer. – shians Aug 17 '17 at 01:54
  • Thanks for the guidance and help :) Really appreciated! – Alexander Aug 17 '17 at 02:00
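On the seq(1,10) question in the comments: a vector of numbers cannot be placed inside a regex string directly, but it can be pasted into an alternation. A minimal sketch, assuming the number is always the last thing in the string after the underscore; replace_before_suffix() is a hypothetical helper, not part of stringr:

```r
library(stringr)

# Hypothetical helper: replace the character just before "_<n>" with
# `replacement` when <n> is one of `targets` and sits at the end of the string
replace_before_suffix <- function(x, targets, replacement = "B") {
  # e.g. c(4, 5, 11) -> "4|5|11"
  alts <- paste(targets, collapse = "|")
  # (?:...) groups without capturing, so \\2 is still the whole "_<n>" suffix
  pattern <- paste0("(.)(_(?:", alts, "))$")
  str_replace(x, pattern, paste0(replacement, "\\2"))
}

clas <- c("CD_1", "X.2_2", "K$2_3", "12k3_4", ".A_5", "xy_11")

print(replace_before_suffix(clas, c(4, 5, 11)))
# "CD_1" "X.2_2" "K$2_3" "12kB_4" ".B_5" "xB_11"

print(replace_before_suffix(clas, 1:10))
# "CB_1" "X.B_2" "K$B_3" "12kB_4" ".B_5" "xy_11"
```

The $ anchor is what stops 1 from matching the start of 11. If the targets could contain regex metacharacters they would need escaping first; plain numbers do not.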