The function below sanitizes the names of uploaded files:

public static function slugify($string) {
    $string = transliterator_transliterate("Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();", $string);
    $string = preg_replace('/[-\s]+/', '-', $string);
    return trim($string, '-');
}

Here I use [:Punctuation:] to remove punctuation. The problem is that I want to keep the dot (.) in my file names: without it, slugify turns 1.zip into 1zip. Is there a way to keep the dot with this function?

Alireza

1 Answer

You will need to supply a list of accepted characters instead. This:

$trans = Transliterator::create( "Latin; NFKD; [^\u0041-\u007A\u0020\u0027\u002D\u002E] Remove; NFC" );

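will remove everything except the characters from A (\u0041) to z (\u007A), plus space, apostrophe, hyphen and dot. Note that the A to z range also covers [ \ ] ^ _ ` and does not include digits, so add \u0030-\u0039 if you want to keep numbers such as the 1 in 1.zip.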

You may need to tweak the set to your requirements. The codes you need are Unicode code points written in \uXXXX notation, e.g. \u002E is '.'.
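As a rough sketch of how this allowlist could be dropped into the slugify function from the question: the name slugify_keep_dot and the exact set of allowed characters (ASCII letters, digits, space, hyphen, dot) are my assumptions, not something the question specifies, and the intl extension must be loaded. The rule string is single-quoted so that PHP passes the \uXXXX escapes through to ICU unchanged.

```php
<?php
// Sketch only: requires the intl extension.
// slugify_keep_dot is an illustrative name, not part of any library.
function slugify_keep_dot(string $string): string
{
    // Same pipeline as in the question, but the [:Punctuation:] filter is
    // replaced by an allowlist: anything that is not an ASCII letter, digit,
    // space (\u0020), hyphen (\u002D) or dot (\u002E) is removed.
    $rules = 'Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; '
           . '[^a-zA-Z0-9\u0020\u002D\u002E] Remove; Lower();';

    $string = transliterator_transliterate($rules, $string);
    if ($string === false) {
        throw new RuntimeException('Transliteration failed');
    }

    // Collapse runs of whitespace and hyphens into a single hyphen.
    $string = preg_replace('/[-\s]+/', '-', $string);

    return trim($string, '-');
}

echo slugify_keep_dot('1.zip'), "\n";
```

Unlike the original [:Punctuation:] rule, this variant also keeps hyphens that are already in the name; drop \u002D from the set if that is not wanted.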

Kohjah Breese
  • So I should find the corresponding UTF-16 codes for the punctuation characters? Is that right? – Alireza Aug 24 '14 at 09:54
  • As in the example above, the square brackets take a list or a range (e.g. \u0041-\u007A) of characters in \uXXXX (UTF-16) notation. Anything that does not match is removed. Since I do not know exactly which characters you want to keep, I cannot offer a specific snippet. I use the character table in Ubuntu to look up the codes. \u002E is the full stop. – Kohjah Breese Aug 24 '14 at 19:56
  • There's an error in your example: change `\002E` to `\u002E` – didier2l Nov 13 '17 at 16:01