I ran into bugs while importing a csv file in our database. the csv file contained lines with a big text field and upon inspection sometimes quoted text ("like this") was present in the fields.
So i wrote this little test function
function testcsv($fields, $delimiter = ";", $enclosure = '"', $escape_char = "\"")
{
$buffer = fopen('php://temp', 'r+');
$buffer2 = fopen('php://output', 'w');
fputcsv($buffer, $fields, $delimiter, $enclosure, $escape_char);
print("HOW THIS ARRAY IS REPRESENTED IN A CSV FILE:");
fputcsv($buffer2, $fields, $delimiter, $enclosure, $escape_char);
print("\n\n");
rewind($buffer);
$array = fgetcsv($buffer, 0, $delimiter, $enclosure, $escape_char);
fclose($buffer);
fclose($buffer2);
print("equals?". ($fields == $array ? "yes" : "no"). "\n");
print( "original:\n");
print_r($fields);
print( "read value:\n");
print_r($array);
}
here is the transcript of me running the code in tinker (Laravels command line interface for PHP):
>>> testcsv(["a","b","c"])
HOW THIS ARRAY IS REPRESENTED IN A CSV FILE:a;b;c
equals?yes
original:
Array
(
[0] => a
[1] => b
[2] => c
)
read value:
Array
(
[0] => a
[1] => b
[2] => c
)
=> null
>>> testcsv(["a","a \"quoted\" string","c"])
HOW THIS ARRAY IS REPRESENTED IN A CSV FILE:a;"a "quoted" string";c
equals?no
original:
Array
(
[0] => a
[1] => a "quoted" string
[2] => c
)
read value:
Array
(
[0] => a
[1] => a quoted" string"
[2] => c
)
=> null
>>> testcsv(["a","a \"quoted\" string followed by the separator ; somewhere in the text","c"])
HOW THIS ARRAY IS REPRESENTED IN A CSV FILE:a;"a "quoted" string followed by the separator ; somewhere in the text";c
equals?no
original:
Array
(
[0] => a
[1] => a "quoted" string followed by the separator ; somewhere in the text
[2] => c
)
read value:
Array
(
[0] => a
[1] => a quoted" string followed by the separator
[2] => somewhere in the text"
[3] => c
)
=> null
As you can see the parsed array is not identical to the original array once quotes are used. So, I wondered, is there a newer function in PHP 8 or is this just never fixed?
(this because the same issue was already on stack overflow 7 years ago see fgetcsv/fputcsv $escape parameter fundamentally broken)
edit: clarification about what is actually broken here.
if you supply "" or "\\" (empty string or backslash) as parameter to $escape_char the csv parsing will actually work and read back its own input. However: the escaping will still be double quotes for quotes:
HOW THIS ARRAY IS REPRESENTED IN A CSV FILE:a;"a ""quoted"" string followed by the separator ; somewhere in the text and here is an escape char \ but does it respect that?";c
equals?yes
original:
Array
(
[0] => a
[1] => a "quoted" string followed by the separator ; somewhere in the text and here is an escape char \ but does it respect that?
[2] => c
)
read value:
Array
(
[0] => a
[1] => a "quoted" string followed by the separator ; somewhere in the text and here is an escape char \ but does it respect that?
[2] => c
)
so the contents of $escape_char is irrelevant and if you supply it with the actual used $escape_char """ it will break the csv parsing.