
I am rewriting my VB.NET code in R and have hit a roadblock. The VB.NET code counts the number of characters in a string that do not occur in a string of allowed characters:

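    Dim StringtoConvert As String = "ABC"
    Dim strAllowedChars As String = "AC"
    Dim disallowed As Integer = 0
    For i = 1 To Len(StringtoConvert)
        ' Mid() is 1-based, matching the loop counter
        If InStr(1, strAllowedChars, Mid(StringtoConvert, i, 1)) = 0 Then
            disallowed = disallowed + 1
        End If
    Next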

I can see how to do this in R with a loop that checks each character against the allowed characters, but is there a way in R to test against the whole set at once, using a string like strAllowedChars above?
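For reference, here is a minimal sketch of that loop-based approach, translating the VB.NET loop line by line. It assumes `substr()` is an adequate stand-in for `Mid()` and `grepl(..., fixed = TRUE)` for the `InStr(...) = 0` test when looking up a single character.

```r
StringtoConvert <- "ABC"
strAllowedChars <- "AC"

disallowed <- 0
for (i in seq_len(nchar(StringtoConvert))) {
  # take the i-th character, like Mid(StringtoConvert, i, 1)
  ch <- substr(StringtoConvert, i, i)
  # fixed = TRUE does a literal substring search, like InStr()
  if (!grepl(ch, strAllowedChars, fixed = TRUE)) {
    disallowed <- disallowed + 1
  }
}
disallowed
```

This works but checks one character per iteration, which is what the vectorised answers avoid.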

The str_count function from the stringr package is the closest thing I have found, but it looks for matches to the entire strAllowedChars string rather than to each character independently. How can I test StringtoConvert to make sure it contains only characters from strAllowedChars? In other words, in the example above, if a character in StringtoConvert does not match one of the characters in strAllowedChars, I need to either identify it and replace it with a separate call, or replace it directly.

The R code that I have tried is:

    library(stringr)
    testerstring<-"CYA"
    testpattern<-"CA"
    newtesterstring<-str_count(testerstring,testpattern)
    print(newtesterstring)
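The attempt above prints 0 because str_count() treats its pattern as a regular expression, so "CA" means the two-character sequence C followed by A. Wrapping the characters in square brackets turns them into a character class that matches any single one of them. A small illustration:

```r
library(stringr)

testerstring <- "CYA"

# "CA" is a regex for C immediately followed by A, which never occurs in "CYA"
whole_pattern <- str_count(testerstring, "CA")
whole_pattern
# [1] 0

# "[CA]" is a character class: it matches any single C or any single A
per_char <- str_count(testerstring, "[CA]")
per_char
# [1] 2
```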

The desired output is the number of characters in StringtoConvert that are disallowed, i.e. not in strAllowedChars. I will then use that in a loop to change any disallowed character to "G" with an if/then statement, so it would be even better if I could skip the counting step and replace any disallowed character with "G" directly.

Jamie
  • Can you please clarify what your expected output is? Or better yet, exactly what you are trying to do, rather than just us having to reproduce a specific visual basic function? – Ian Campbell Apr 12 '21 at 19:32
  • @IanCampbell sorry for any ambiguity. I added a paragraph to the bottom of the question clarifying what the desired output is. – Jamie Apr 12 '21 at 21:13

3 Answers


You could use strsplit to split strAllowedChars into individual characters and then subtract the number of allowed characters in StringtoConvert from the total number of characters in StringtoConvert.

That gives the number of disallowed characters in StringtoConvert, if that's what you are after.

    StringtoConvert <- "ABCrrrrr"
    strAllowedChars <- "ACT"
    # count each allowed character separately, then subtract the total from the length
    disallowed <- nchar(StringtoConvert) -
      sum(stringr::str_count(StringtoConvert, strsplit(strAllowedChars, "")[[1]]))

    disallowed
    # [1] 6
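One caveat with this approach: each allowed character is used as a regex, so a character such as "." would match everything, and a character listed twice in strAllowedChars would be counted twice. A minimal sketch guarding against both, using stringr's `fixed()` for literal matching; `count_disallowed` is a hypothetical helper name, not part of stringr:

```r
library(stringr)

# hypothetical helper: number of characters in x that are not in allowed
count_disallowed <- function(x, allowed) {
  # split the allowed set into single characters and drop duplicates,
  # so a repeated allowed character is not subtracted twice
  allowed_chars <- unique(strsplit(allowed, "")[[1]])
  # fixed() matches each character literally, so "." is not a wildcard
  nchar(x) - sum(str_count(x, fixed(allowed_chars)))
}

count_disallowed("ABCrrrrr", "ACT")
# [1] 6
count_disallowed("A.B.C", "A.")
# [1] 2
```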

To replace all but the allowed characters with 'G' you can try this.

    > StringtoConvert <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    > strAllowedChars <- "ACT"
    > stringr::str_replace_all(StringtoConvert, paste0("[^", strAllowedChars, "]"), "G")
    [1] "AGCGGGGGGGGGGGGGGGGTGGGGGG"
norie
  • It's interesting how two people can come up with the exact same approach at basically the same time. I guess there are only so many optimal approaches to a problem. – Ian Campbell Apr 12 '21 at 21:26
  • @IanCampbell I didn't see your answer - I was working on mine after the OP had clarified what they wanted. I'll remove mine to avoid confusion. – norie Apr 12 '21 at 21:29
  • Don't get me wrong, I totally believe you came up with the same idea. This happens all the time. No need to edit it out from my perspective. – Ian Campbell Apr 12 '21 at 21:30
  • Both worked well (obviously) but @iancampbell posted his first so I am going to show his as the correct answer. I will upvote yours too. Thank you both for your help. – Jamie Apr 12 '21 at 21:55

Here's an approach with str_replace_all. We can build a regular expression that matches any character not in a set. For example, [^AC] matches any single character that is not A or C:

    library(stringr)
    StringtoConvert <- "ABC"
    strAllowedChars <- "AC"
    # [^AC] matches every character that is not A or C
    str_replace_all(StringtoConvert, paste0("[^", strAllowedChars, "]"), "G")
    #[1] "AGC"

    set.seed(12345)
    # build a random 50-letter test string
    StringtoConvert2 <- paste(sample(LETTERS, 50, replace = TRUE), collapse = "")
    str_replace_all(StringtoConvert2, paste0("[^", strAllowedChars, "]"), "G")
    #[1] "GGGGGGGGGGGGGGGGGGAGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGG"
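The same negated character class also covers the question's other two needs: str_detect() tests whether a string contains any disallowed character, and str_count() gives the number of disallowed characters in one call. A sketch, assuming strAllowedChars contains no characters that are special inside square brackets (such as `]`, `^`, `-` or `\`):

```r
library(stringr)

strAllowedChars <- "AC"
disallowed_pattern <- paste0("[^", strAllowedChars, "]")
inputs <- c("ABC", "CAAC", "CYA")

# TRUE when at least one character falls outside the allowed set
has_disallowed <- str_detect(inputs, disallowed_pattern)
has_disallowed
# [1]  TRUE FALSE  TRUE

# number of disallowed characters in each string
n_disallowed <- str_count(inputs, disallowed_pattern)
n_disallowed
# [1] 1 0 1
```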
Ian Campbell
  • Works great. I added some other stuff but overall I went from 22 lines of code in VB.net to 8 in R. – Jamie Apr 12 '21 at 21:57

Using base R only:

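    StringtoConvert <- "ABC"
    strAllowedChars <- "AC"
    # count the characters found in the allowed set, then subtract from the total
    Res <- nchar(StringtoConvert) -
      sum(strsplit(StringtoConvert, "")[[1]] %in% strsplit(strAllowedChars, "")[[1]])
    Res
    # [1] 1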
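The same `%in%` test can also do the replacement directly in base R without building a regex, which sidesteps the bracket-escaping issue when the allowed set contains characters like `-`. A minimal sketch; `replace_disallowed` is a hypothetical helper name:

```r
# hypothetical helper: replace every character of x not in allowed
replace_disallowed <- function(x, allowed, replacement = "G") {
  allowed_chars <- strsplit(allowed, "")[[1]]
  # split each input string into characters and process them one string at a time
  vapply(strsplit(x, ""), function(chars) {
    chars[!chars %in% allowed_chars] <- replacement
    paste(chars, collapse = "")
  }, character(1))
}

replace_disallowed(c("ABC", "CYA"), "AC")
# [1] "AGC" "CGA"
replace_disallowed("A-B-C", "A-C")
# [1] "A-G-C"
```

Because `%in%` compares whole characters, "-" here is just another allowed character rather than a range operator.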