How can I convert a string like Žvaigždės aukštybėj užges or äüöÖÜÄ to Zvaigzdes aukstybej uzges or auoOUA, respectively, using Bash?
Basically I just want to convert all characters that aren't in the basic Latin (ASCII) alphabet to their closest plain equivalents.
Thanks
Depending on your machine (the //translit suffix is an extension that not every iconv implementation supports), you can try piping your strings through
iconv -f utf-8 -t ascii//translit
(or whatever your input encoding is, if it's not UTF-8).
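As a minimal sketch of the piping approach above, the command can be wrapped in a small function. The name to_ascii is made up for illustration, and the sketch assumes the input is UTF-8 and that the installed iconv accepts //TRANSLIT. The exact replacements can vary with the iconv implementation and the current locale, so no expected output is shown.

```shell
#!/usr/bin/env bash

# to_ascii: hypothetical helper name, not part of iconv itself.
# Reads UTF-8 text on stdin and writes an ASCII approximation to stdout.
# Assumption: the installed iconv supports the //TRANSLIT suffix.
to_ascii() {
    iconv -f UTF-8 -t ASCII//TRANSLIT
}

# Example call (result depends on iconv implementation and locale):
#   printf '%s\n' 'äüöÖÜÄ' | to_ascii
```

Reading from stdin keeps the function usable both in pipelines and with redirected files, e.g. to_ascii < testutf8.txt.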
You might be able to use iconv.
For example, suppose the string:
Žvaigždės aukštybėj užges or äüöÖÜÄ
is stored in the file testutf8.txt, encoded as UTF-8.
Running the command:
iconv -f UTF-8 -t US-ASCII//TRANSLIT testutf8.txt
results in:
Zvaigzdes aukstybej uzges or auoOUA
echo 'Hej på dig, du den dära' | iconv -f utf-8 -t us-ascii//TRANSLIT
gives:
Hej pa dig, du den dara
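In Java, something similar can be done with java.text.Normalizer:
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

String name = "Žvaigždės aukštybėj užges ";
// NFKD splits each accented letter into a base letter plus combining marks.
String s1 = Normalizer.normalize(name, Normalizer.Form.NFKD);
// Remove combining marks, modifier letters (Lm) and modifier symbols (Sk).
String regex = "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+";
// Round-trip through ASCII; any character still outside ASCII becomes '?'.
// Using StandardCharsets avoids the checked UnsupportedEncodingException.
String s2 = new String(s1.replaceAll(regex, "").getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);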