How to replace characters in a string vector based on a position vector in R?

Question

For example:

set.seed(123)
library(stringi)
df <- data.frame(p = stri_rand_strings(11, 11, '[A-Z]'),
                 n = sample(1:10, 11, replace = TRUE),
                 s = stri_rand_strings(11, 1, '[A-Z]'))
df
             p  n s
1  GPCMCEHPTEW  3 X
2  STDJRNJGBGX  8 P
3  VTEDZLMEPHF  6 L
4  RHVCVLTRLQA  4 Y
5  FSFVIRYDDRL  7 S
6  VZBLSCZGBRU 10 K
7  JJHCJENNYIM  8 A
8  CWKTELUBVHJ  4 O
9  IANRXAZHYRL 10 M
10 VBTJVNHUCVH  9 W
11 TZCWUKIFOXN  6 V

What I want is to create a new column new_p in which the character of p at position n is replaced by s. For example, df$new_p[1] should be GPXMCEHPTEW.

akrun · Accepted Answer · 2019-12-13T15:21:38.823

One option is the replacement form of substring, applied row by row:

# Overwrite the character at position n with s, one row at a time
# (this modifies df$p in place)
for (i in seq_len(nrow(df))) substring(df$p[i], df$n[i], df$n[i]) <- df$s[i]


df
#             p  n s
#1  GPXMCEHPTEW  3 X
#2  STDJRNJPBGX  8 P
#3  VTEDZLMEPHF  6 L
#4  RHVYVLTRLQA  4 Y
#5  FSFVIRSDDRL  7 S
#6  VZBLSCZGBKU 10 K
#7  JJHCJENAYIM  8 A
#8  CWKOELUBVHJ  4 O
#9  IANRXAZHYML 10 M
#10 VBTJVNHUWVH  9 W
#11 TZCWUVIFOXN  6 V
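
If the original p should be kept, the same replacement can be written without a loop, because the replacement form of substr recycles the start, stop and value arguments across the string vector. A minimal sketch, assuming p and s are character columns (not factors); df2 is only an illustrative copy of the first three rows of the data, and new_p is the column name from the question:

```r
# Illustrative data: the first three rows of the question's data frame
df2 <- data.frame(p = c("GPCMCEHPTEW", "STDJRNJGBGX", "VTEDZLMEPHF"),
                  n = c(3L, 8L, 6L),
                  s = c("X", "P", "L"),
                  stringsAsFactors = FALSE)

# Copy p first so the original column is left untouched
df2$new_p <- df2$p

# One character per row is overwritten: row i uses n[i] and s[i]
substr(df2$new_p, df2$n, df2$n) <- df2$s

df2$new_p
# [1] "GPXMCEHPTEW" "STDJRNJPBGX" "VTEDZLMEPHF"
```

The third string is unchanged because its sixth character is already L.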

We could also use rawToChar/charToRaw, which operate on bytes and therefore assume single-byte (ASCII) characters:

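# Convert each string to raw bytes, replace the byte at position y, convert back
df$p <- mapply(function(x, y, z) rawToChar(replace(charToRaw(x), y, charToRaw(z))),
               df$p, df$n, df$s, USE.NAMES = FALSE)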
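
The same byte-level idea can also write to a separate vector or column instead of overwriting p. A sketch under the assumption that every string is plain ASCII, so that one character is exactly one byte; replace_byte_at is a hypothetical helper name, and the vectors are two illustrative rows from the data:

```r
# Hypothetical helper: swap the byte at position `pos` of `x` for `ch`.
# Only valid when every character is a single byte (e.g. A-Z).
replace_byte_at <- function(x, pos, ch) {
  bytes <- charToRaw(x)
  bytes[pos] <- charToRaw(ch)
  rawToChar(bytes)
}

p <- c("RHVCVLTRLQA", "FSFVIRYDDRL")
n <- c(4L, 7L)
s <- c("Y", "S")

# USE.NAMES = FALSE keeps the result an unnamed character vector
new_p <- mapply(replace_byte_at, p, n, s, USE.NAMES = FALSE)
new_p
# [1] "RHVYVLTRLQA" "FSFVIRSDDRL"
```

With multibyte text (for example UTF-8 accented letters) byte positions no longer match character positions, so a character-based function such as substr is the safer choice there.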

data

df <- structure(list(p = c("GPCMCEHPTEW", "STDJRNJGBGX", "VTEDZLMEPHF", 
"RHVCVLTRLQA", "FSFVIRYDDRL", "VZBLSCZGBRU", "JJHCJENNYIM", "CWKTELUBVHJ", 
"IANRXAZHYRL", "VBTJVNHUCVH", "TZCWUKIFOXN"), n = c(3L, 8L, 6L, 
4L, 7L, 10L, 8L, 4L, 10L, 9L, 6L), s = c("X", "P", "L", "Y", 
"S", "K", "A", "O", "M", "W", "V")), class = "data.frame",
row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))

Thanks @akrun +1, is there a direct way to create a new column (e.g. keep the original `p`)? — David Z, Dec 13 '19 at 15:19
@DavidZ I added another option as well. In the first case, you can also create a new column, i.e. `df$p1 <- df$p`, and then apply `substring` to that column — akrun, Dec 13 '19 at 15:24

score 3 · Answer 2 · answered Dec 13 '19 at 15:57

Another option with regex:

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(p_new = sub(paste0("^(.{", n - 1, "}).(.*)"),
                     paste0("\\1", s, "\\2"),
                     p))

#>    p               n s     p_new      
#>    <chr>       <int> <chr> <chr>      
#>  1 GPCMCEHPTEW     3 X     GPXMCEHPTEW
#>  2 STDJRNJGBGX     8 P     STDJRNJPBGX
#>  3 VTEDZLMEPHF     6 L     VTEDZLMEPHF
#>  4 RHVCVLTRLQA     4 Y     RHVYVLTRLQA
#>  5 FSFVIRYDDRL     7 S     FSFVIRSDDRL
#>  6 VZBLSCZGBRU    10 K     VZBLSCZGBKU
#>  7 JJHCJENNYIM     8 A     JJHCJENAYIM
#>  8 CWKTELUBVHJ     4 O     CWKOELUBVHJ
#>  9 IANRXAZHYRL    10 M     IANRXAZHYML
#> 10 VBTJVNHUCVH     9 W     VBTJVNHUWVH
#> 11 TZCWUKIFOXN     6 V     TZCWUVIFOXN
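
The pattern splits each string into the n - 1 characters before the target, the target character itself, and the remainder. The same split can be sketched with substr and paste0, which are vectorized and so would not need rowwise(). This assumes 1 <= n <= nchar(p); the vectors below are three illustrative rows from the question's data:

```r
p <- c("VZBLSCZGBRU", "JJHCJENNYIM", "CWKTELUBVHJ")
n <- c(10L, 8L, 4L)
s <- c("K", "A", "O")

# prefix (characters 1..n-1) + replacement + suffix (characters n+1..end)
p_new <- paste0(substr(p, 1, n - 1), s, substr(p, n + 1, nchar(p)))
p_new
# [1] "VZBLSCZGBKU" "JJHCJENAYIM" "CWKOELUBVHJ"
```

Unlike the regex version, this does not interpret s as a replacement pattern, so a backslash in s would be inserted literally.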
