
I want to replace non-ASCII characters (for now, only Spanish ones) with their ASCII equivalents. For example, "á" should become "a", and so on.

I built this function (works fine), but I don't want to use a loop (including internal loops like sapply).

latin2ascii <- function(x) {
  if (!is.character(x)) stop("input must be a character object")
  mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
  mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")
  # Replace each accented character in turn with its ASCII equivalent
  for (i in seq_along(mapL)) {
    x <- stringr::str_replace_all(x, mapL[i], mapA[i])
  }
  x
}

Is there an elegant way to solve it? Any help, suggestion, or modification is appreciated.

Álvaro

2 Answers


gsubfn() in the package of the same name is really nice for this sort of thing:

library(gsubfn)

# Create a named list, in which:
#   - the names are the strings to be looked up
#   - the values are the replacement strings
mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")

# ll <- setNames(as.list(mapA), mapL) # An alternative to the 2 lines below
ll <- as.list(mapA)
names(ll) <- mapL


# Try it out
string <- "ÍÓáÚ"
gsubfn("[áéíóúÁÉÍÓÚñÑüÜ]", ll, string)
# [1] "IOaU"

Edit:

G. Grothendieck points out that base R also has a function for this:

A <- paste(mapA, collapse="")
L <- paste(mapL, collapse="")
chartr(L, A, "ÍÓáÚ")
# [1] "IOaU"
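
As a minimal sketch, the `chartr()` call above could be wrapped in a function shaped like the one in the question. The wrapper and the sample input are illustrative assumptions, not part of the original answer. It relies on `chartr()` being vectorized over its third argument, so no explicit loop is written in R code.

```r
# Hypothetical wrapper around chartr(); base R only.
latin2ascii <- function(x) {
  if (!is.character(x)) stop("input must be a character object")
  # chartr() maps the i-th character of `old` to the i-th character of `new`
  chartr("áéíóúÁÉÍÓÚñÑüÜ", "aeiouAEIOUnNuU", x)
}

latin2ascii(c("Álvaro", "pingüino", "mañana"))
# [1] "Alvaro"   "pinguino" "manana"
```

Characters that are not in the lookup string pass through unchanged.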
Josh O'Brien
  • Thanks! Works perfectly. Just one question, out of curiosity: do you know whether the gsubfn function uses any kind of internal loop? Should it be faster than sapply? – Álvaro May 22 '12 at 16:00
  • @Álvaro -- I don't think `gsubfn()` is particularly fast -- 'just' convenient and elegant. – Josh O'Brien May 22 '12 at 16:28
  • Also see `chartr` in base R, which seems fine for the problem as stated. If the real problem has variations, such as replacing two-character sequences, `gsubfn` could still handle it but `chartr` could not. – G. Grothendieck May 22 '12 at 18:53
  • @G.Grothendieck -- Thanks for pointing that out. I've appended it to the answer. – Josh O'Brien May 22 '12 at 20:27
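
To illustrate the limitation raised in the comments: `chartr()` maps single characters to single characters, so it cannot turn one character into two. The sketch below is one possible base R way to handle that case with `gregexpr()` and `regmatches<-`. It is a swapped-in technique, not the `gsubfn` approach from the answer, and the lookup table and sample strings are hypothetical.

```r
# Hypothetical lookup table: names are the strings to find, values the replacements.
map <- c("ß" = "ss", "æ" = "ae", "ñ" = "n")
x <- c("straße", "mañana", "plain")

# Locate every occurrence of any key in each string
hits <- gregexpr(paste(names(map), collapse = "|"), x)

# Replace each matched piece with its mapped value; strings with no match are left as they are
regmatches(x, hits) <- lapply(regmatches(x, hits), function(h) unname(map[h]))
x
```

This assumes the keys contain no regular-expression metacharacters; keys that do would need escaping before being pasted into the pattern.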

I like the version by Josh, but I thought I might add another 'vectorized' solution. It returns a vector of unaccented strings. It also only relies on the base functions.

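x <- c('íÁuÚ', 'uíÚÁ')

mapL <- c("á","é","í","ó","ú","Á","É","Í","Ó","Ú","ñ","Ñ","ü","Ü")
mapA <- c("a","e","i","o","u","A","E","I","O","U","n","N","u","U")
# Split each string into single characters
chars <- strsplit(x, split = '')
# Position of each character in mapL (NA if it is not an accented character)
idx <- lapply(chars, match, mapL)
# Swap matched characters for their ASCII equivalents and reassemble each string
mapply(function(ch, i) paste(ifelse(is.na(i), ch, mapA[i]), collapse = ''), chars, idx)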
# "iAuU" "uiUA"
nograpes