14

Using PHP 5.3 fgetcsv function, I am experiencing some problems due to encoding matters. Note that that file has spanish "special" latin characters like graphic accents á, é, í ï, etc...

I get the CSV file exporting some structured data I have in an MS 2008 for Mac Excel file.

If I open it with Mac OS X TextEdit application, everything seems to go perfect.

But when I get down to my PHP program and try to read the CSV using that fgetcsv PHP function, I am not getting it to read properly the charset.

/**
 * @Route("/cvsLoad", name="_csv_load")
 * @Template()
 */
public function cvsLoadAction(){
    //setlocale(LC_ALL, 'es_ES.UTF-8');
    $reader = new Reader($this->get('kernel')->getRootDir().'/../web/uploads/documents/question_images/2/41/masiva.csv');

    $i = 1;
    $r = array("hhh" => $reader -> getAll());

    return new Response(json_encode($r, 200));
}

As you can see, I have tried also to use a setlocale to es_ES.UTF-8. But nothing get it working.

The read part comes here:

public function getRow()
{
    if (($row = fgetcsv($this->_handle, 10000, $this->_delimiter)) !== false) {
        $this->_line++;
        return $this->_headers ? array_combine($this->_headers, $row) : $row;
    } else {
        return false;
    }
}

See what I get in the $row variable after each row reading:

enter image description here

Those ? characters are supposed to be vowels with graphic accents on them.

Any clue over there? Would it work if I used MS Excel for Windows? How can I know in run time the exact encoding of the file and set it before reading it?

(For those spanish speakers, don't get frightened with such awful medical stuff in those texts ;)).

ElPiter
  • 4,046
  • 9
  • 51
  • 80
  • 1
    Same problem. A UTF8 encoded CSV file imports fine on one server but not the other. Ended up writing my own CSV reader. – Salman A Nov 08 '12 at 21:49
  • FWIW, you can't really *know* the encoding of a file without being told. You can guess when you read it, and convert accordingly, but nothing is as reliable as being told the encoding. – cmbuckley Nov 08 '12 at 22:06
  • Thanks cbuckley. What do you mean with "convert accordingly"sort of try to guess it and ask the user if he approves the import? And if not, keep trying other encodings for the origin? – ElPiter Nov 08 '12 at 22:55

2 Answers2

39

Try this:

function convert( $str ) {
    return iconv( "Windows-1252", "UTF-8", $str );
}

public function getRow()
{
    if (($row = fgetcsv($this->_handle, 10000, $this->_delimiter)) !== false) {
        $row = array_map( "convert", $row );
        $this->_line++;
        return $this->_headers ? array_combine($this->_headers, $row) : $row;
    } else {
        return false;
    }
}
Esailija
  • 138,174
  • 23
  • 272
  • 326
  • 2
    +1. Note on the [docs](http://php.net/manual/en/function.fgetcsv.php): if the file is single-byte encoded (such as CP1252), but the locale is multibyte, then fgetcsv does not work as expected. – cmbuckley Nov 08 '12 at 22:07
  • This was the one!! :) Thanks a lot. Just some comments: first, need to declare as static the function convert and map it in array_map as 'self:convert'; second, in my case it was iconv("macintosh", "UTF-8", $str), as MS Excel for Mac OS exports to CSV using Mac OS Roman. Finally, although this is a great answer and really helped me, still is not all what solves my needs, as I won't know if my users will upload a file from a Mac or a PC or whatever... any other clues on how to detect which is the encoding of the uploaded file? Thanks again!! – ElPiter Nov 08 '12 at 22:53
  • You need to use heuristics .. first see if it is valid UTF-8 or UTF-16, if it's not, determine PC/MAC (From user agent header) and use Windows-1252 for PC and Macintosh for mac. Of course if the user is not using latin script, you would use Windows-1251 (Cyrillic for Windows) and so on. Someone must have written a library for this :D – Esailija Nov 08 '12 at 23:00
  • @ElPiter If you can expect your users to use chrome or firefox with javascript enabled, you can access the file's contents in javascript. You could then represent a dropdown with different encodings and let the user pick the one that shows characters correctly, and send that info along with the file when uploading – Esailija Nov 08 '12 at 23:10
  • This worked for me. But I suspect that there is a better way (to set the encoding before fopen() is called). So I'm not going to up-vote. – Dalin Jan 24 '14 at 15:57
  • Awesome. I was scratching my head since yesterday. – Kiran Dash Feb 23 '17 at 08:36
0

This is likely to do with the way excel encodes the file when saving.

Try uploading the .xls file to google docs and downloading as a .csv

joedixon
  • 69
  • 2
  • I also tried, but it goes even worse I think. Nevertheless, I'll try again. Thanks for the fast response. :) – ElPiter Nov 08 '12 at 21:48