Suppose I want to extract every letter that sits between the letters a and c. So far I have been using the stringr package, which shows the full matches and the capture groups clearly. For example, it gives the following.

library(stringr)
str_match_all("abc", "a([a-z])c")
# [[1]]
#     [,1]  [,2]
# [1,] "abc" "b" 

Suppose I only want to replace the group and not the full match, in this case the letter b. The following, however, replaces the full match.

str_replace_all("abc", "a([a-z])c", "z")
# [1] "z"
# Desired result: "azc"

Is there a good way to replace only the capture group? Suppose I also wanted to handle multiple matches.

str_match_all("abcdef", "a([a-z])c|d([a-z])f")
# [[1]]
#      [,1]  [,2] [,3]
# [1,] "abc" "b"  NA
# [2,] "def" NA   "e"
str_replace_all("abcdef", "a([a-z])c|d([a-z])f", "z")
# [1] "zz"
# Desired result: "azcdzf"

Matching groups was easy enough, but I haven't found a solution when a replacement is desired.

Kim

2 Answers

That is not how regex was designed to work. Capturing is a mechanism for getting at the parts of a string you need, and in a replacement it is used to keep parts of the match, not to discard them.

Thus, a natural solution is to wrap what you need to keep with capturing groups.

In this case, use

str_replace_all("abc", "(a)[a-z](c)", "\\1z\\2")

Or use lookarounds (this works if the lookbehind is a fixed/known-width pattern):

str_replace_all("abc", "(?<=a)[a-z](?=c)", "z")
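
Both ideas also cover the multiple-match case from the question. The following is a minimal sketch that swaps in base R's `gsub` for `str_replace_all` so it runs without extra packages (an assumption on my part, not part of the original answer); `perl = TRUE` is needed for the lookarounds. The capture-group version runs one pass per delimiter pair, so each backreference always refers to a group that actually matched.

```r
# Base R sketch; the variable names are illustrative only.
x <- "abcdef"

# Lookarounds: the delimiters are asserted but not consumed,
# so only the middle letter is replaced.
lookaround <- gsub("(?<=a)[a-z](?=c)|(?<=d)[a-z](?=f)", "z", x, perl = TRUE)

# Capture groups: keep the delimiters via backreferences,
# one pass per delimiter pair.
grouped <- gsub("(a)[a-z](c)", "\\1z\\2", x)
grouped <- gsub("(d)[a-z](f)", "\\1z\\2", grouped)

print(lookaround)
print(grouped)
```

Both calls should give the "azcdzf" result the question asks for.
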
Wiktor Stribiżew

When I want to replace a certain pattern of characters in a text/string, I usually use the grep family of functions, which work with regular expressions.

You can use the sub function from the grep family to make replacements in strings.

Example:

sub("b","z","abc")
[1] "azc"

Replacement can get more challenging, and the grep family of functions offers a lot of functionality for that:

Replacing the first run of characters other than a and c with a replacement of your choice:

sub("[^ac]+","z","abBbbbc")
[1] "azc"

Replacing the first occurrence of two consecutive b characters (the earlier single b is left alone):

sub("b{2}","z","abBbbbc")
[1] "abBzbc"

Replacing everything from the pattern to the end of the string:

sub("b.*","z","abc")
[1] "az"

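The same as above, but requiring the match to end in a character other than c (nothing matches here, so the string is returned unchanged):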

sub("b.*[^c]","z","abc")
[1] "abc"

And so on.

You can search the internet for "regular expressions in R using grep" to find many more ways to work with regular expressions.

Arduin
  • Thank you for your answer. I am aware of `grep`, `sub`, `gsub`, and so on. They have not been suitable for my examples. It is not the specific letter `b` that I'm after (I'm using it simply as an illustration). – Kim Mar 30 '18 at 18:22