I have a dataset with a column like this:

   string<-c('lib1_Rstudio_case1','lib2_Rstudio_case1and2','lib5_python_notthe correct_language','lib3_Jupyter_really_good','lib1_spyder_nice','lib1_R_the_core')
   replacement<-c('Rstudio','Jupyter','spyder','R')

I want to replace each string value with the entry of `replacement` that it matches. I am using the following code right now:

gsub(paste(replacement, collapse = "|"), replacement = replacement, x = string)

This is another piece of code, which I am using to find the matching cases:

string[grepl(paste(replacement, collapse='|'), string, ignore.case=TRUE)]
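As a quick check of what this filter selects, here is a minimal sketch using the data from the question. The names `pattern`, `hits_ci` and `hits_cs` are only illustrative. It suggests that with `ignore.case=TRUE` the `R` alternative also matches the lowercase "r" in "correct", so every element is selected.

```r
string <- c('lib1_Rstudio_case1', 'lib2_Rstudio_case1and2',
            'lib5_python_notthe correct_language', 'lib3_Jupyter_really_good',
            'lib1_spyder_nice', 'lib1_R_the_core')
replacement <- c('Rstudio', 'Jupyter', 'spyder', 'R')

# paste() collapses the vector into one alternation pattern
pattern <- paste(replacement, collapse = "|")
pattern
# [1] "Rstudio|Jupyter|spyder|R"

# Case-insensitive: "R" also matches the "r" in "correct"
hits_ci <- grepl(pattern, string, ignore.case = TRUE)
hits_ci
# [1] TRUE TRUE TRUE TRUE TRUE TRUE

# Case-sensitive: the python element is no longer selected
hits_cs <- grepl(pattern, string)
hits_cs
# [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
```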

I want to update the ones that I find. The output should look like this:

Rstudio,Rstudio,'',Jupyter,spyder,R

I don't want to hard-code it; I want to write code that scales.

Any help is really appreciated.

Thanks in advance.

nityansh seth

2 Answers

Isolate the id using the gsub function, then use it to index `replacement`; the is.na function flags every id that is larger than the length of `replacement`. Finally, replace the flagged elements with the empty character ''.

EDIT: Since you changed the string data in the question, I modified the gsub call. The pattern now captures the numeric value right after the `lib` text and discards the remaining part of each string element.

replacement<-c('Rstudio','Jupyter','spyder','R')

string<-c('lib1_Rstudio','lib2_Rstudio','lib5_python','lib3_Jupyter','lib1_spyder','lib1_R')
index <- is.na( replacement[ as.integer( gsub( "lib([[:digit:]])*[[:alnum:]_ ]*", "\\1", string)) ] )
a1 <- sapply( strsplit(string, "_"), function( x ) x[2] )
a1[ index ] <- ''
a1
# [1] "Rstudio" "Rstudio" ""        "Jupyter" "spyder"  "R"    

string <- c('lib1_Rstudio_case1','lib2_Rstudio_case1and2','lib5_python_notthe correct_language','lib3_Jupyter_really_good','lib1_spyder_nice','lib1_R_the_core')
index <- is.na( replacement[ as.integer( gsub( "lib([[:digit:]])*[[:alnum:]_ ]*", "\\1", string)) ] )
a1 <- sapply( strsplit(string, "_"), function( x ) x[2] )
a1[ index ] <- ''
a1
# [1] "Rstudio" "Rstudio" ""        "Jupyter" "spyder"  "R"
Sathish
  • I changed the order of the elements in string, like `string<-c('lib1_Rstudio','lib2_python','lib5_Rstudio','lib3_Jupyter','lib1_spyder','lib1_R')`, and it returned the wrong result `"Rstudio" "python" "" "Jupyter" "spyder" "R" `. Could you tell me why it is wrong? – Vida Wang Mar 09 '17 at 01:19
  • The id 5 is greater than the length of `replacement`, which is why the third element `lib5_Rstudio` turned into the empty character `''` – Sathish Mar 09 '17 at 01:29
  • The length of `replacement` is 4, because there are 4 elements in this character vector - `replacement` – Sathish Mar 09 '17 at 01:31
  • Thanks for your explanation. – Vida Wang Mar 09 '17 at 01:33
  • Thanks for the explanation – nityansh seth Mar 10 '17 at 17:45
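The behaviour discussed in these comments can be reproduced with a short sketch. It uses a simplified pattern (`"lib([[:digit:]]+).*"`, an assumption for illustration, not the exact pattern from the answer) and the illustrative name `id`. The point is that the result depends only on the number after `lib`, not on the library name.

```r
string <- c('lib1_Rstudio', 'lib2_python', 'lib5_Rstudio')
replacement <- c('Rstudio', 'Jupyter', 'spyder', 'R')

# Extract the number that follows "lib" (simplified pattern, see lead-in)
id <- as.integer(gsub("lib([[:digit:]]+).*", "\\1", string))
id
# [1] 1 2 5

# Indexing past the end of a vector yields NA
replacement[id]
# [1] "Rstudio" "Jupyter" NA

# So only the element with id 5 is flagged, whatever its name is
is.na(replacement[id])
# [1] FALSE FALSE  TRUE
```

This is why `lib2_python` is kept (id 2 is a valid index) while `lib5_Rstudio` is blanked (id 5 exceeds the 4 elements of `replacement`).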

This is another simple piece of code that I used. It doesn't need a complex regex. Thanks for the help.

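string <- c('lib1_Rstudio_case1','lib2_Rstudio_case1and2','lib5_python_notthe correct_language','lib3_Jupyter_really_good','lib1_spyder_nice','lib1_R_the_core')
replacement <- c('R','Jupyter','spyder','Rstudio')
# Start with '' for every element so that unmatched entries stay empty
replaced <- rep('', length(string))

# Later entries overwrite earlier ones, so 'Rstudio' (last) wins over 'R'
for (i in seq_along(replacement))
{
  # fixed = TRUE treats each name as a literal string, not a regex
  replaced[grepl(replacement[i], string, fixed = TRUE)] <- replacement[i]
}
replaced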
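The loop above depends on the order of `replacement` ('R' has to come before 'Rstudio'). A possible variation that avoids this is sketched below. It assumes that the library name is always the second `_`-separated field, which holds for the sample data but is not stated in the question; `field` and `result` are illustrative names.

```r
string <- c('lib1_Rstudio_case1', 'lib2_Rstudio_case1and2',
            'lib5_python_notthe correct_language', 'lib3_Jupyter_really_good',
            'lib1_spyder_nice', 'lib1_R_the_core')
replacement <- c('Rstudio', 'Jupyter', 'spyder', 'R')

# Assumption: the library name is the second "_"-separated field
field <- vapply(strsplit(string, "_", fixed = TRUE),
                function(x) x[2], character(1))

# Keep the field only when it is an exact member of `replacement`
result <- ifelse(field %in% replacement, field, "")
result
# [1] "Rstudio" "Rstudio" ""        "Jupyter" "spyder"  "R"
```

Because the comparison is an exact match on the whole field, 'R' cannot match inside 'Rstudio', and adding names to `replacement` needs no other change.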
nityansh seth