I have a dataframe in R with a column of strings, e.g. v1 <- c('JaStADmmnIsynDK', 'laUksnDTusainS')

My goal is to capitalize all letters in each string except 's', 't' and 'y'.

So the result should end up being: 'JAStADMMNIsyNDK' and 'LAUKsNDTUsAINS'.

Thus not changing any of the said letters: 's', 't' and 'y'.

Right now I do this by repeating a line like the following about 25 times, once per letter:

levels(df$strings) <- sub('n', 'N', levels(df$strings))

But that seems to be overkill! How can I do this easily in R?

smci
Thigers

3 Answers


Try

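# All lowercase letters except s, t and y: "abcdefghijklmnopqruvwxz"
v2 <- gsub("[sty]", "", paste(letters, collapse = ""))
# Translate each of those letters to its uppercase counterpart
chartr(v2, toupper(v2), v1)
#[1] "JAStADMMNIsyNDK" "LAUKsNDTUsAINS"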

data

v1 <- c("JaStADmmnIsynDK", "laUksnDTusainS")
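
The question's own snippet works on `levels(df$strings)`, which suggests a factor column. A minimal sketch of the same `chartr()` idea applied there might look like the following; the construction of `df` is an assumption made for illustration, since the question does not show the actual data frame.

```r
# Assumed setup: a data frame with a factor column named `strings`,
# as the question's use of levels(df$strings) suggests
v1 <- c("JaStADmmnIsynDK", "laUksnDTusainS")
df <- data.frame(strings = factor(v1))

# Lowercase letters to convert: everything except s, t and y
v2 <- gsub("[sty]", "", paste(letters, collapse = ""))

# Rewrite all factor levels in one call instead of one sub() per letter
levels(df$strings) <- chartr(v2, toupper(v2), levels(df$strings))

as.character(df$strings)
#[1] "JAStADMMNIsyNDK" "LAUKsNDTUsAINS"
```

If the column is a plain character vector rather than a factor, the same call can presumably be applied to `df$strings` directly.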
akrun
  • That is brilliant - Thank you. I'll remember to make a line with the data example next time. – Thigers May 04 '16 at 20:27
  • I have no idea why @Thigers unaccepted your solution and accepted mine. I was surprised to see that myself. Personally I think your solution is much better. – Rohit Das May 11 '16 at 20:41
  • @RohitDas It's okay. I guess he likes your solution better. – akrun May 12 '16 at 01:23
  • As a complete rookie at posting questions on StackOverflow, I tried to accept both answers in my eagerness at actually receiving feedback. Sorry for that, and thank you both for the solutions. Cheers. – Thigers May 14 '16 at 08:40
  • This is brilliant. We cannot do `gsub('([a-ru-xz])', toupper('\\1'), v1)` since `toupper()` will not work on the capture group '\1', only on a string literal. – smci Feb 03 '17 at 06:34
  • @smci I would use `gsub('([a-ru-xz])', '\\U\\1', v1, perl = TRUE)` – akrun Feb 03 '17 at 07:19
  • @Akrun ah thanks. '\U' operator. I'll post that as an alternative and acknowledge you. – smci Feb 03 '17 at 11:24
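
The point raised in the comments about `toupper('\\1')` can be checked with a small sketch. The variable names `repl` and `unchanged` are introduced here only for illustration. R evaluates `toupper()` before `gsub()` ever runs, so the replacement that `gsub()` receives is still the plain backreference.

```r
v1 <- c("JaStADmmnIsynDK", "laUksnDTusainS")

# toupper() runs first; "\\1" contains no letters, so it comes back as is
repl <- toupper("\\1")
identical(repl, "\\1")

# gsub() therefore replaces every match with itself, leaving v1 unchanged
unchanged <- gsub("([a-ru-xz])", repl, v1)
identical(unchanged, v1)
```

Both `identical()` calls should return `TRUE`, which is why the case conversion has to happen inside the replacement string (via `\\U` with `perl = TRUE`) rather than around it.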

The answer posted by @akrun is indeed brilliant. But here is my more direct brute force approach which I finished too late.

s <- "JaStADmmnIsynDK"

customUpperCase <- function(s,ignore = c("s","t","y")) {
  u <- sapply(unlist(strsplit(s,split = "")),
              function(x) if(!(x %in% ignore)) toupper(x) else x )
  paste(u,collapse = "")
}

customUpperCase(s)
#[1] "JAStADMMNIsyNDK"
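
Note that `customUpperCase()` as written handles a single string. A possible vectorized variant of the same per-character idea is sketched below; the name `customUpperCaseAll` is hypothetical and not part of the original answer.

```r
# Hypothetical vectorized variant: processes every element of a character vector
customUpperCaseAll <- function(x, ignore = c("s", "t", "y")) {
  vapply(strsplit(x, split = ""), function(chars) {
    # Uppercase only the characters that are not in the ignore set
    convert <- !(chars %in% ignore)
    chars[convert] <- toupper(chars[convert])
    paste(chars, collapse = "")
  }, character(1))
}

customUpperCaseAll(c("JaStADmmnIsynDK", "laUksnDTusainS"))
#[1] "JAStADMMNIsyNDK" "LAUKsNDTUsAINS"
```

Using `vapply()` with `character(1)` makes the return type explicit, which is a design preference here rather than a requirement.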
Rohit Das

We can directly gsub() an uppercase replacement onto each applicable lowercase letter, using the Perl '\U' operator on the '\1' capture group (which @Akrun reminded me of). Note that '\U' only works with perl = TRUE:

v1 <- c("JaStADmmnIsynDK", "laUksnDTusainS")
gsub('([a-ru-xz])', '\\U\\1', v1, perl = TRUE)
#[1] "JAStADMMNIsyNDK" "LAUKsNDTUsAINS"
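
If the set of excluded letters might change, the character class can be built instead of typed by hand. The following is only a sketch; the helper name `upper_except` and its `ignore` parameter are hypothetical, and it assumes `ignore` contains plain lowercase ASCII letters.

```r
# Hypothetical helper: uppercase every lowercase letter not listed in `ignore`
upper_except <- function(x, ignore = c("s", "t", "y")) {
  # Build a character class such as "([abc...z])" from the letters to convert
  targets <- paste(setdiff(letters, ignore), collapse = "")
  pattern <- sprintf("([%s])", targets)
  # \\U uppercases the captured letter; it requires perl = TRUE
  gsub(pattern, "\\U\\1", x, perl = TRUE)
}

v1 <- c("JaStADmmnIsynDK", "laUksnDTusainS")
upper_except(v1)
#[1] "JAStADMMNIsyNDK" "LAUKsNDTUsAINS"
```

Listing the letters explicitly rather than using ranges like `a-r` keeps the pattern valid for any choice of `ignore`.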
smci