Given a set of regular expressions, is there a simple way to match multiple patterns, and replace the matched text according to the pattern that was matched?
For example, in the following data x, each element begins with either a number or a letter, and ends with either a number or a letter. Let's call these patterns num_num (begins with number, ends with number), num_let (begins with number, ends with letter), let_num, and let_let.
x <- c('123abc', '78fdsaq', 'aq12111', '1p33', '123', 'pzv')
type <- list(
num_let='^\\d.*[[:alpha:]]$',
num_num='^\\d(.*\\d)?$',
let_num='^[[:alpha:]].*\\d$',
let_let='^[[:alpha:]](.*[[:alpha:]])$'
)
To replace each string with the name of the pattern it follows, we could do:
m <- lapply(type, grep, x)
rep(names(type), sapply(m, length))[order(unlist(m))]
## [1] "num_let" "num_let" "let_num" "num_num" "num_num" "let_let"
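Note that the rep/order trick above only lines up if every element of x matches exactly one pattern. A minimal sketch that makes that assumption explicit, using only base R (the names hits and labels are introduced here for illustration):

```r
x <- c('123abc', '78fdsaq', 'aq12111', '1p33', '123', 'pzv')
type <- list(
  num_let='^\\d.*[[:alpha:]]$',
  num_num='^\\d(.*\\d)?$',
  let_num='^[[:alpha:]].*\\d$',
  let_let='^[[:alpha:]](.*[[:alpha:]])$'
)

# Logical matrix: one row per element of x, one column per pattern
hits <- sapply(type, grepl, x)

# Fail early if an element matches no pattern or more than one
stopifnot(all(rowSums(hits) == 1))

# For each row, take the name of the (single) pattern that matched
labels <- names(type)[apply(hits, 1, which.max)]
labels
## [1] "num_let" "num_let" "let_num" "num_num" "num_num" "let_let"
```

This still runs every pattern against every element, so it is probably no faster; it just does not depend on the order in which grep returns indices.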
Is there a more efficient approach?
gsubfn?
I know that with gsubfn we can simultaneously replace different matches, e.g.:
library(gsubfn)
gsubfn('.*', list('1p33'='foo', '123abc'='bar'), x)
## [1] "bar" "78fdsaq" "aq12111" "foo" "123" "pzv"
but I'm not sure whether the replacements can be made dependent on the pattern that was matched rather than on the match itself.
stringr?
str_replace_all doesn't play nicely with this example: the patterns are applied one after another, each to the result of the previous replacement, so everything ends up overwritten with let_let:
library(stringr)
str_replace_all(x, setNames(names(type), unlist(type)))
## [1] "let_let" "let_let" "let_let" "let_let" "let_let" "let_let"
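To illustrate what I think is happening, here is a rough base R sketch that mimics the behaviour with a loop of sub() calls. This is an assumption about how the replacements are sequenced, not something taken from stringr's source, and step is just a name chosen for illustration:

```r
x <- c('123abc', '78fdsaq', 'aq12111', '1p33', '123', 'pzv')
type <- list(
  num_let='^\\d.*[[:alpha:]]$',
  num_num='^\\d(.*\\d)?$',
  let_num='^[[:alpha:]].*\\d$',
  let_let='^[[:alpha:]](.*[[:alpha:]])$'
)

# Apply each pattern in turn to the output of the previous replacement
step <- x
for (nm in names(type)) {
  step <- sub(type[[nm]], nm, step)
  print(step)  # intermediate state after this pattern
}

step
## [1] "let_let" "let_let" "let_let" "let_let" "let_let" "let_let"
```

The first three passes insert labels such as num_let and let_num, which themselves begin and end with a letter, so the final let_let pass matches and overwrites all of them.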
Reordering type so the pattern corresponding to let_let appears first solves the problem, but needing to do this makes me nervous.
type2 <- rev(type)
str_replace_all(x, setNames(names(type2), unlist(type2)))
## [1] "num_let" "num_let" "let_num" "num_num" "num_num" "let_let"