
I am trying to use the iconv function in R to achieve the correct transliteration of German words (for example, Möbel → Moebel).

I have written the following code (tried with English/German locales):

iconv("Möbel", "latin1", "ASCII//TRANSLIT")
[1] "Mobel"

iconv("Möbel", "UTF-8", "ASCII//TRANSLIT")
[1] NA

iconv("Möbel", "UTF-8", "ASCII//TRANSLIT", sub ="")
[1] "Mbel"

iconv("Möbel", "Windows-1252", "ASCII//TRANSLIT")
[1] "Mobel"

However, none of these calls returns the expected "Moebel". Here is the output of some further tests:

# cat() + library(ds4psy)
iconv(cat("M", Umlaut["o"], "bel", sep = ""), "latin1", "ASCII//TRANSLIT")
Möbelcharacter(0)
# paste()/paste0() + library(ds4psy)
iconv(paste("M", Umlaut["o"], "bel", sep = ""), "latin1", "ASCII//TRANSLIT")
[1] "MA?bel"

For completeness, I also tried the function stri_trans_general from stringi:

stri_trans_general("Möbel", "latin-ascii")
[1] "Mobel"

but, as you can see, this didn't work, either.

What I don't understand is why the iconv function is not working properly in R when it clearly works correctly in PHP:

<?php
    //some German
    $utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';
    setlocale(LC_ALL, 'de_DE');
    
    $trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);
    
    //gives [Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz]
    echo $trans_sentence . PHP_EOL;
?>

Why am I seeing this difference in behavior with the iconv version in R vs. PHP? What am I doing wrong in my R code?

Cody Gray - on strike
manro
  • The one that returned `NA` was the closest to the working PHP one. If I were you, I would do a string replace (I have no idea how to do that in R) of `ä`→`ae` etc. – Walter Tross Nov 06 '21 at 12:31
  • @WalterTross Yes, we can do it with a regex, e.g. `str_replace("Möbel", "ö", "oe")` returns `[1] "Moebel"`. But how to fix `iconv` is the interesting part. – manro Nov 06 '21 at 12:36
  • @WalterTross You are welcome; the question was reopened. I really don't know where this function is broken. Or am I using it wrong? – manro Nov 07 '21 at 07:35

1 Answer


If you don't strictly need iconv, there is another way to achieve your goal.

You can define a set of German characters you want to transliterate and a set of their replacements and use these pairs as input for str_replace_all:

Data:

gg <- c("Göthe", "gerädert", "Hürde", "weiß")

First, define your sets:

set <- setNames(c("oe", "ae", "ue", "ss"),
                c("ö", "ä", "ü", "ß"))

Then replace:

library(stringr)
str_replace_all(gg, set)
[1] "Goethe"    "geraedert" "Huerde"    "weiss" 
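
The same lookup-table idea can be sketched in base R, without stringr. This is only an illustration, not part of the original answer: the helper name translit_de is hypothetical, the uppercase pairs (Ä, Ö, Ü) are an added assumption, and the special characters are written as Unicode escapes so the script does not depend on the encoding of the source file.

```r
# Hypothetical helper (an assumption, not from the answer above):
# transliterate German umlauts and sharp s using base R only.
translit_de <- function(x) {
  # Replacement table: names are the characters to find (as Unicode escapes),
  # values are their ASCII transliterations.
  map <- setNames(
    c("ae", "oe", "ue", "Ae", "Oe", "Ue", "ss"),
    c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
  )
  x <- enc2utf8(x)  # make sure the input is UTF-8 before matching
  for (ch in names(map)) {
    # fixed = TRUE treats the pattern as a literal string, not a regex
    x <- gsub(ch, map[[ch]], x, fixed = TRUE)
  }
  x
}

translit_de(c("G\u00f6the", "ger\u00e4dert", "H\u00fcrde", "wei\u00df", "\u00c4rger"))
# [1] "Goethe"    "geraedert" "Huerde"    "weiss"     "Aerger"
```

Looping over the pairs keeps each substitution a plain literal replacement, so the result does not depend on the locale or on the system's iconv implementation.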
Cody Gray - on strike
Chris Ruehlemann