38

How can I convert a string like Žvaigždės aukštybėj užges or äüöÖÜÄ to Zvaigzdes aukstybej uzges or auoOUA, respectively, using Bash?

Basically, I just want to convert every character that isn't a plain (unaccented) Latin letter to its closest ASCII equivalent.

Thanks

watain

5 Answers

65

Depending on your machine you can try piping your strings through

iconv -f utf-8 -t ascii//translit

(or whatever your encoding is, if it's not utf-8)
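For reuse in scripts, the pipeline above can be wrapped in a small function. This is only a sketch: `to_ascii` is a made-up name, and the UTF-8 default is an assumption. What `//TRANSLIT` produces depends on the iconv implementation and, with glibc, on the active locale, so the exact output may differ between machines.

```shell
#!/usr/bin/env bash
# to_ascii is a hypothetical helper name, not something iconv provides.
# It reads text on stdin and writes an ASCII approximation to stdout.
to_ascii() {
    # $1: encoding of the input; defaults to UTF-8 (an assumption).
    local from="${1:-UTF-8}"
    iconv -f "$from" -t ASCII//TRANSLIT
}
```

Usage would look like `echo "Žvaigždės aukštybėj užges" | to_ascii`, or `to_ascii ISO-8859-1 < file.txt` for input in another encoding.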

Michael Krelin - hacker
  • `echo "Aldo Vásquez" | iconv -f utf-8 -t ascii//translit` gives me `Aldo V'asquez`. The app I'm trying to match outputs "Aldo_Vasquez" for the same input. How would I get iconv to do that? – majorgear May 22 '22 at 21:04
  • 1
    @majorgear, honestly — no idea. This particular case can be handled by something like `echo "Aldo Vásquez" | tr 'ÁáÉé' 'AaEe'`, but it's hardly a solution to write home about… – Michael Krelin - hacker May 23 '22 at 06:03
  • Thanks. I was trying to replicate the output of an app I was using. It's open source, so I'm just going to have to dig into the code to find out how it does the conversion. – majorgear May 24 '22 at 13:31
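For the `V'asquez` case in the comments above, one possible workaround is to post-process the iconv output: delete the punctuation that some iconv builds emit in place of accents, then turn spaces into underscores. This is a rough sketch, not what the app in question does. `slugify_ascii` is a made-up name, and the list of characters to delete is an assumption. It also removes genuine apostrophes and quotes, which may or may not be acceptable.

```shell
#!/usr/bin/env bash
# slugify_ascii is a hypothetical helper name.
# Step 1: transliterate UTF-8 input to ASCII.
# Step 2: delete ' ` ^ ~ " (assumed accent placeholders; also hits real quotes).
# Step 3: replace spaces with underscores.
slugify_ascii() {
    iconv -f UTF-8 -t ASCII//TRANSLIT |
        tr -d "'\`^~\"" |
        tr ' ' '_'
}
```

Usage would look like `echo "Aldo Vásquez" | slugify_ascii`.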
18

You might be able to use iconv.

For example, the string:

Žvaigždės aukštybėj užges or äüöÖÜÄ

is in the file testutf8.txt, encoded as UTF-8.

Running the command:

iconv -f UTF8 -t US-ASCII//TRANSLIT testutf8.txt

results in:

Zvaigzdes aukstybej uzges or auoOUA

Steve De Caux
8
echo Hej på dig, du den dära | iconv -f utf-8 -t us-ascii//TRANSLIT

gives:

Hej pa dig, du den dara
Emil Vikström
3

You can also use the Python library unidecode, which installs a command-line tool of the same name, to do this:

$ echo "Žvaigždės aukštybėj užges äüöÖÜÄ" | unidecode

Output:

Zvaigzdes aukstybej uzges auoOUA

See this post for other approaches.
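If a script has to run on machines where unidecode may not be installed, one option is to fall back to iconv. The sketch below assumes UTF-8 input; `ascii_fold` is a made-up name, and the two tools will not always agree on the result for the same character.

```shell
#!/usr/bin/env bash
# ascii_fold is a hypothetical helper name.
# Prefer the unidecode command-line tool when it is on PATH,
# otherwise fall back to iconv's transliteration.
ascii_fold() {
    if command -v unidecode >/dev/null 2>&1; then
        # -e sets the input encoding explicitly (assumed here to be UTF-8).
        unidecode -e utf-8
    else
        iconv -f UTF-8 -t ASCII//TRANSLIT
    fi
}
```

Usage would look like `echo "Žvaigždės aukštybėj užges äüöÖÜÄ" | ascii_fold`.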

GLNB
-2
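import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class Translit {
    public static void main(String[] args) {
        String name = "Žvaigždės aukštybėj užges ";
        // NFKD splits each accented letter into a base letter plus combining marks.
        String s1 = Normalizer.normalize(name, Normalizer.Form.NFKD);
        // Match combining marks, modifier letters (Lm) and modifier symbols (Sk).
        String regex = "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+";

        // Anything still outside ASCII after stripping is encoded as '?'.
        String s2 = new String(s1.replaceAll(regex, "").getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
        System.out.println(s2);
    }
}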
Zombo
  • 8
    Easy to criticise this, but a newbie took the effort, got shouted down and has now left SO. [Slow clap..] And what would we do without iconv? – geotheory Dec 19 '15 at 09:35
  • 2
    @geotheory: ...and it's not like the other answers are pure bash, either. They all rely on an external executable. All this answer really needs is instructions to compile the java file and run it from bash. – pyrocrasty Mar 08 '16 at 10:43