0

I ran into bugs while importing a csv file in our database. the csv file contained lines with a big text field and upon inspection sometimes quoted text ("like this") was present in the fields.

So i wrote this little test function

function testcsv($fields, $delimiter = ";", $enclosure = '"', $escape_char = "\"")
{
    $buffer = fopen('php://temp', 'r+');
    $buffer2 = fopen('php://output', 'w');
    fputcsv($buffer, $fields, $delimiter, $enclosure, $escape_char);
    print("HOW THIS ARRAY IS REPRESENTED IN A CSV FILE:");
    fputcsv($buffer2, $fields, $delimiter, $enclosure, $escape_char);
    print("\n\n");
    rewind($buffer);
    $array = fgetcsv($buffer, 0, $delimiter, $enclosure, $escape_char);
    fclose($buffer);
    fclose($buffer2);
    print("equals?". ($fields == $array ? "yes" : "no"). "\n");
    print( "original:\n");
    print_r($fields);
    print( "read value:\n");
    print_r($array);

}

here is the transcript of me running the code in tinker (Laravels command line interface for PHP):

>>> testcsv(["a","b","c"])
HOW THIS ARRAY IS REPRESENTED IN A CSV FILE:a;b;c


equals?yes
original:
Array
(
    [0] => a
    [1] => b
    [2] => c
)
read value:
Array
(
    [0] => a
    [1] => b
    [2] => c
)
=> null
>>> testcsv(["a","a \"quoted\" string","c"])
HOW THIS ARRAY IS REPRESENTED IN A CSV FILE:a;"a "quoted" string";c


equals?no
original:
Array
(
    [0] => a
    [1] => a "quoted" string
    [2] => c
)
read value:
Array
(
    [0] => a
    [1] => a quoted" string"
    [2] => c
)
=> null
>>> testcsv(["a","a \"quoted\" string followed by the separator ; somewhere in the text","c"])
HOW THIS ARRAY IS REPRESENTED IN A CSV FILE:a;"a "quoted" string followed by the separator ; somewhere in the text";c


equals?no
original:
Array
(
    [0] => a
    [1] => a "quoted" string followed by the separator ; somewhere in the text
    [2] => c
)
read value:
Array
(
    [0] => a
    [1] => a quoted" string followed by the separator 
    [2] =>  somewhere in the text"
    [3] => c
)
=> null

As you can see the parsed array is not identical to the original array once quotes are used. So, I wondered, is there a newer function in PHP 8 or is this just never fixed?

(this because the same issue was already on stack overflow 7 years ago see fgetcsv/fputcsv $escape parameter fundamentally broken)

edit: clarification about what is actually broken here.

if you supply "" or "\\" (empty string or backslash) as parameter to $escape_char the csv parsing will actually work and read back its own input. However: the escaping will still be double quotes for quotes:

HOW THIS ARRAY IS REPRESENTED IN A CSV FILE:a;"a ""quoted"" string followed by the separator ; somewhere in the text and here is an escape char \ but does it respect that?";c


equals?yes
original:
Array
(
    [0] => a
    [1] => a "quoted" string followed by the separator ; somewhere in the text and here is an escape char \ but does it respect that?
    [2] => c
)
read value:
Array
(
    [0] => a
    [1] => a "quoted" string followed by the separator ; somewhere in the text and here is an escape char \ but does it respect that?
    [2] => c
)

so the contents of $escape_char is irrelevant and if you supply it with the actual used $escape_char """ it will break the csv parsing.

Jan
  • 1,582
  • 1
  • 13
  • 19
  • 1
    Using `$escape = '"'` doesn't work, but it works with the default escape. – Barmar May 23 '22 at 15:40
  • Can you elaborate what you mean with `$enclosure = '"', $escape_char = "\""`? That looks like a typo. – Ulrich Eckhardt May 23 '22 at 16:33
  • if you use an empty escape "" or a backslash "\\" it at least can parse its own output, however it will still escape quotes and quotes only (using double quotes). so the $escape_char parameter seems to not work and if you actually specify the char it uses ("\""), it breaks. – Jan May 24 '22 at 07:19

0 Answers0