
i have the following test code:

setlocale(LC_ALL, 'en_US.UTF8');
function t($text)
{
    echo "$text\n";
    echo "encoding: ", mb_detect_encoding($text), "\n";

    // transliterate
    $text = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);
    echo "iconv: ", $text, "\n";
}

// Latvian alphabet
t('AĀBCČDEĒFGĢHIĪJKĶLĻMNŅOPRSŠTUŪVZŽ aābcčdeēfgģhiījkķlļmnņoprsštuūvzž');
// Greek alphabet
t('ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΜμΝνΞξΟοΠπΡρΣσςΤτΥυΦφΧχΨψΩω');
// Cyrillic alphabet + some rarer versions
t('АБВГДЕЖЅЗИІКЛМНОПҀРСТѸФХѠЦЧШЩЪꙐЬѢꙖѤЮѦѪѨѬѮѰѲѴ абвгдеёжзийклмнопрстуфхцчшщъыьэюя');

and its output:

AĀBCČDEĒFGĢHIĪJKĶLĻMNŅOPRSŠTUŪVZŽ aābcčdeēfgģhiījkķlļmnņoprsštuūvzž
encoding: UTF-8
iconv: AABCCDEEFGGHIIJKKLLMNNOPRSSTUUVZZ aabccdeefgghiijkkllmnnoprsstuuvzz

ΑαΒβΓγΔδΕεΖζΗηΘθΙιΚκΜμΝνΞξΟοΠπΡρΣσςΤτΥυΦφΧχΨψΩω
encoding: UTF-8
iconv: 

АБВГДЕЖЅЗИІКЛМНОПҀРСТѸФХѠЦЧШЩЪꙐЬѢꙖѤЮѦѪѨѬѮѰѲѴ абвгдеёжзийклмнопрстуфхцчшщъыьэюя
encoding: UTF-8
iconv: 

it essentially IGNOREs all Greek and Cyrillic characters. why?

i have tested on two environments, where php -i | egrep "iconv (implementation|library)" outputs either:

iconv implementation => libiconv
iconv library version => 1.11

or:

iconv implementation => libiconv
iconv library version => 1.13

i have also tried setting the iconv internal encoding to UTF-8 and adding/removing the setlocale call, but all to no avail. iconv seems to recognise only Latin and Latin-derived characters.

UPDATE: It must be a problem with iconv itself, as the terminal command echo 'ΑαΒβΓγΔδ' | iconv -f utf-8 -t ASCII//TRANSLIT produces the error iconv: (stdin):1:0: cannot convert, while echo 'āēī' | iconv -f utf-8 -t ASCII//TRANSLIT works and outputs aei, as expected.

iconv --version outputs iconv (GNU libiconv 1.14) (besides the copyright information).

Owen Blacker
Ernests Karlsons

1 Answer


Use ASCII//IGNORE//TRANSLIT

iconv() stopped at the first illegal character and cut the string off right there, which is its default behaviour, so it did not respect the //IGNORE switch placed after //TRANSLIT.
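Whether the order of the two switches matters seems to depend on the iconv implementation and version, so it may be worth checking on your own system. Below is a minimal sketch for doing that. The helper name try_translit() and the short sample strings are made up for illustration and are not part of PHP or iconv. No expected output is shown because it will differ between setups.

```php
<?php
// Hypothetical helper (not from the thread): call iconv() with one target
// string and report what came back.
function try_translit(string $text, string $target): array
{
    // iconv() returns false on failure and may raise a notice, so the notice
    // is silenced here and the return value is inspected instead.
    $result = @iconv('UTF-8', $target, $text);

    return [
        'target' => $target,
        'ok'     => $result !== false,
        'output' => $result === false ? '' : $result,
    ];
}

// Short sample strings (assumed, shortened from the question's alphabets).
$samples = [
    'Latvian' => 'āčēģīķļņšūž',
    'Greek'   => 'αβγδ',
];

// Try both suffix orders discussed in this thread on every sample.
foreach ($samples as $label => $text) {
    foreach (['ASCII//TRANSLIT//IGNORE', 'ASCII//IGNORE//TRANSLIT'] as $target) {
        $r = try_translit($text, $target);
        printf(
            "%-8s %-24s ok=%s output=\"%s\"\n",
            $label,
            $target,
            $r['ok'] ? 'yes' : 'no',
            $r['output']
        );
    }
}
```

If both targets give an empty output for the Greek sample, the installed iconv most likely has no transliteration rules for that script, and swapping the switches will not help.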

Ernests Karlsons
Rifat
  • @yes123 I don't know how you tried it, but it works for the first case. Please read the PHP manual; then you'll understand for which cases it will work and for which it won't. – Rifat Dec 06 '11 at 12:08
  • yep, swapping `//IGNORE` with `//TRANSLIT` indeed doesn't change a thing. – Ernests Karlsons Dec 06 '11 at 16:52
  • @Rifat, the first sample also works without the swap. iconv correctly removes all diacritics from the letters ĀČĒĢĪĶĻŅŌŖŠŪŽ etc. please see the output log in my original question. – Ernests Karlsons Dec 06 '11 at 16:59
  • @ErnestsKarlsons OK, but `ASCII//TRANSLIT//IGNORE` is still not the correct way as per the documentation. – Rifat Dec 06 '11 at 17:14
  • @ErnestsKarlsons, sorry that I didn't read your output carefully. – Rifat Dec 06 '11 at 17:17
  • On Mac OS X 10.6.8 with iconv library v1.11, both `//TRANSLIT//IGNORE` and `//IGNORE//TRANSLIT` result in no output for the Greek and Cyrillic examples above, not even with `setlocale(LC_ALL, 'el_GR.UTF-8');` or `setlocale(LC_ALL, 'ru_RU.UTF-8');`. Both `el_GR.UTF-8` and `ru_RU.UTF-8` are verified to exist on the OS with `$ locale -a`. – Pro Backup Jan 06 '12 at 22:41
  • Error on `echo "ΒΓΔΕΖΗΘΙΚΛΜΝΞΠΡΣΤΥΦΧΨΩ" π | iconv -f utf-8 -t ASCII//IGNORE//TRANSLIT` ... can this be corrected? – Peter Krauss Aug 09 '19 at 10:02
  • This will just ignore the Cyrillic letters. – Adam Apr 22 '20 at 20:42