3

I have the following string as a filename:

$string = 'recyclage plétre francin.jpg';

and tried the following code:

echo preg_replace('/[^a-z0-9|^.]/i', '_', iconv("UTF-8","ISO-8859-1//TRANSLIT",$string));

Because the filename contains a special (non-ASCII) character, this produces a junk character when the file is uploaded with PHP.

What I want is to replace every Unicode (non-ASCII) character with a specific ASCII character. I want to keep all supported ASCII characters and get rid of the non-ASCII ones. I also want to keep the / and \ slashes, because they are directory separators in a filename that is given together with a root path.

Edit: (below is not solved)

I am having an issue with recyclage plƒtre francin.JPG (please note the ƒ character): the output is displayed as recyclage pl, and the .JPG has been truncated. The actual file name was recyclage plâtre francin, but while debugging it showed recyclage plƒtre francin.JPG, and the rest is written just after that. Any idea?

When I try to convert tri et recyclage du plâtre, it shows tri et recyclage du plâtre when it is read, and after conversion it shows tri et recyclage du pl^atre.

Any help will be appreciated.

Smile

4 Answers

6

The TRANSLIT modifier only replaces characters that cannot be represented in the target encoding. Since é can be represented in ISO-8859-1, it is not transliterated at all; it is simply encoded as the single byte 0xE9.

I guess you want something like this:

$string = 'recyclage plétre francin.jpg';
echo iconv("UTF-8","ASCII//TRANSLIT",$string);

The result of that iconv call is: recyclage pletre francin.jpg (the exact output of TRANSLIT depends on the iconv implementation and the locale).
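To illustrate the point about ISO-8859-1 above, here is a minimal sketch that dumps the bytes before and after the conversion from the question. The variable names are only for illustration, and é is written as its UTF-8 bytes so the result does not depend on how the script file itself is saved.

```php
<?php
// "\xC3\xA9" is the UTF-8 encoding of é (U+00E9).
$utf8 = "pl\xC3\xA9tre";

// é exists in ISO-8859-1, so it is converted directly, not transliterated.
$latin1 = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8);

echo bin2hex($utf8), "\n";   // 706cc3a9747265
echo bin2hex($latin1), "\n"; // 706ce9747265
```

The lone 0xE9 byte is not valid UTF-8 on its own, which is probably why it shows up as a junk character once the name is displayed or stored as UTF-8 again.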

vstm
  • I am having an issue with `recyclage plƒtre francin.JPG` (please note the `ƒ` character): the output is displayed as `recyclage pl`, and the `.JPG` has been truncated. The actual file name was `recyclage plâtre francin`, but while debugging it showed `recyclage plƒtre francin.JPG`, and the rest is written just after that. Any idea? – Smile Jul 16 '13 at 05:55
  • When I try to convert `tri et recyclage du plâtre`, it shows `tri et recyclage du plâtre` when it is read, and after conversion it shows `tri et recyclage du pl^atre`. – Smile Jul 16 '13 at 06:02
  • Hmm apparently ["not all characters are decomposable"](http://stackoverflow.com/a/11867974/855532). Which means that some characters are translated into non-ASCII-characters. That means you could either use a regex to filter or map any unwanted characters (of course the "mapping" is probably a bigger/complicated task). – vstm Jul 16 '13 at 06:35
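The "transliterate, then filter with a regex" idea from the last comment could be sketched roughly as follows. `sanitizePath` is a hypothetical helper name, and the whitelist (letters, digits, dot, underscore, hyphen and both slashes) is an assumption based on what the question asks to keep, so adjust it as needed.

```php
<?php
/**
 * Hypothetical helper: transliterate to ASCII, then replace whatever is
 * left that is not a letter, digit, dot, underscore, hyphen or slash.
 */
function sanitizePath(string $path): string
{
    // The output of //TRANSLIT depends on the iconv implementation and
    // locale: é may become "e", "'e" or "?", or the call may return false.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $path);
    if ($ascii === false) {
        $ascii = $path; // keep the raw bytes; the filter below still applies
    }

    // Byte-wise whitelist (no /u flag): each run of other bytes becomes "_".
    return preg_replace('~[^A-Za-z0-9._/\\\\-]+~', '_', $ascii);
}

echo sanitizePath("uploads/recyclage pl\xC3\xA9tre francin.jpg"), "\n";
```

Because the filter runs after the transliteration, stray characters such as the `^` in `pl^atre` are replaced as well, and the result is always plain ASCII whatever iconv produced.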
3

Here is a solution to my question. I can finally see the conversion working: some Unicode characters are replaced with ASCII characters, and everything is now working fine.

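function toASCII($str)
{
    // Accented characters and their ASCII replacements, matched by position.
    $accent   = 'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿŔŕƒ';
    $noaccent = 'SOZsozYYuaaaaaaaceeeeiiiidnoooooouuuuybsaaaaaaaceeeeiiiidnoooooouuuuybyRra';
    // Split the UTF-8 string into whole characters; a byte-wise strtr()
    // on the raw strings would corrupt multibyte characters.
    $from = preg_split('//u', $accent, -1, PREG_SPLIT_NO_EMPTY);
    return strtr($str, array_combine($from, str_split($noaccent)));
}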
Smile
1
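Try this code; it strips every byte outside the printable ASCII range:

<?php

$string = 'recyclage plétre francin.jpg';
// Remove every byte that is not printable ASCII (0x20 to 0x7E).
$str = preg_replace('/[^\x20-\x7E]/', '', $string);
echo $str; // recyclage pltre francin.jpg
?>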
gayan
0

You can use a simple one that removes all characters except the separator (-), a-z, 0-9, and whitespace.

// Remove all characters that are not the separator, a-z, 0-9, or whitespace
$string = preg_replace('![^'.preg_quote('-').'a-z0-9\s]+!', '', strtolower($string));
// Replace all separator characters and whitespace by a single separator
$string = preg_replace('!['.preg_quote('-').'\s]+!u', '-', $string);
Goran Jakovljevic