4

I have a long string with multiple instances of a pattern, and I want the n characters following each instance. Say that my string is "quick fox jumps over the lazy dog" and I want the two characters after every "u", i.e. I would want the vector c("ic", "mp") as my output. How can I do this?

Thanks!

ThomasIsCoding
Tordir

3 Answers

5

We can use str_extract_all from stringr. Create a function with arguments for the string, n (the number of characters to extract after the match), and chr (the character to match):

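library(stringr)
f1 <- function(string, n, chr)
{
  # lookbehind for chr, followed by exactly n arbitrary characters;
  # chr is inserted into the regex as-is (not escaped)
  pat <- sprintf("(?<=%s)%s", chr, strrep(".", n))
  str_extract_all(string, pat)[[1]]
}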

Testing:

> f1(str1, 2, "u")
[1] "ic" "mp"
> f1(str1, 3, "u")
[1] "ick" "mps"

Data:

str1 <- "quick fox jumps over the lazy dog"
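
Note that f1 pastes chr straight into the regular expression, so a value such as "." would be read as a regex metacharacter rather than a literal dot. Below is a minimal base R sketch that treats chr as plain text instead, assuming that is the behaviour wanted. The helper name after_literal is hypothetical and not part of the answer above, and unlike the regex version its extracted pieces may overlap a later occurrence of chr.

```r
# Hypothetical helper: take the n characters after each literal occurrence of chr.
after_literal <- function(string, n, chr) {
  # fixed = TRUE searches for chr as plain text, not as a regular expression
  starts <- as.integer(gregexpr(chr, string, fixed = TRUE)[[1]])
  if (starts[1] == -1L) return(character(0))  # chr does not occur
  from <- starts + nchar(chr)
  out <- substring(string, from, from + n - 1L)
  # drop pieces cut short by the end of the string
  out[nchar(out) == n]
}

str1 <- "quick fox jumps over the lazy dog"
after_literal(str1, 2, "u")
# [1] "ic" "mp"
after_literal("a.bc a.de", 2, ".")
# [1] "bc" "de"
```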
akrun
3

Similar, but using str_extract_all with paste0.

Key points: (?<=u) is a lookbehind. It requires the pattern "u" to occur immediately before the match but does not include it in the extracted string.

.{n} matches the next n characters after the pattern.

library(stringr)

string <- "quick fox jumps over the lazy dog"
n <- 2
str_extract_all(string, paste0("(?<=", "u", ").{", n, "}"))[[1]]

[1] "ic" "mp"
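
To make the two key points concrete, here is a small sketch that only builds the pattern and prints it, so the regex handed to str_extract_all can be inspected. The variable name pat is chosen here for illustration.

```r
n <- 2
# Same paste0 call as above, stored so the resulting regex can be inspected
pat <- paste0("(?<=", "u", ").{", n, "}")
pat
# [1] "(?<=u).{2}"
```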
TarJae
2

Here is a base R option using regmatches + gregexpr, along with the pattern "(?<=u)[a-zA-Z]{2}":

> s <- "quick fox jumps over the lazy dog"
> regmatches(s, gregexpr("(?<=u)[a-zA-Z]{2}", s, perl = TRUE))[[1]]
[1] "ic" "mp"
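
One difference from the stringr answers is worth noting: [a-zA-Z]{2} only accepts letters, whereas .{2} accepts any two characters, including spaces. A short sketch of the difference, using a made-up string s2 that is not from the question:

```r
s2 <- "you jump"  # made-up example: the first "u" is followed by a space

# Letters only: the space after the first "u" blocks a match there
regmatches(s2, gregexpr("(?<=u)[a-zA-Z]{2}", s2, perl = TRUE))[[1]]
# [1] "mp"

# Any two characters: the space is accepted
regmatches(s2, gregexpr("(?<=u).{2}", s2, perl = TRUE))[[1]]
# [1] " j" "mp"
```

Which pattern is appropriate depends on whether non-letter characters after the match should count.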
ThomasIsCoding