I am building a CSV export tool. It's been working well for months, but we've recently encountered a couple of cases where there are extra lines caused by a cell "exploding".
I've narrowed the problem down to a bug in PHP (7.2.21 and others). I need to work around this. Below is the PHP script to reproduce the problem. It includes commas and linebreaks in each cell:
<?php
$data = array(
array( 'ID', 'Name', 'Content', 'Date' ),
array( 34, 'Radley', 'This is <strong>bold text</strong>, and' . "\r\n" . 'a second line, the first time', '2019-08-23' ),
array( 47, 'John', 'This a <a href=\"http://example.org/\">link</a>' . "\r\n" . 'a second line, again', '2019-08-24' ),
array( 65, 'Bob', 'This plain text, with no html, and lots of commas'. "\r\n" . 'and a third extra row', '2019-08-25' ),
);
$fh = fopen('php://output', 'w');
foreach( $data as $row ) fputcsv( $fh, $row );
echo stream_get_contents( $fh );
fclose( $fh );
?>
If we print_r
the $data
we can see this is only four values:
[2] => Array (
(
[0] => 47
[1] => John
[2] => This a <a href=\"http://example.org/\">link</a>
a second line, again
[3] => 2019-08-24
)
)
So I would expect this code to generate four rows with four columns each, but the third item by "John" is only 3 columns and adds an extra row with three columns as well.
The problem is not just that \"
is already escaped. Having that in the string disables fputcsv from escaping EVERYTHING ELSE.
This malformed data can be seen in Excel for Windows and Mac, and also Google Sheets in Windows Chrome, and probably everything else. But if you load it back in to PHP with fgetcsv
it works, so PHP must do this intentionally?
I need to correct this bug so that \"
gets escaped properly into \""
, and so all other commas and double-quotes get escaped as well. The plugin I am building exports content generated by other plugins, and those might have escaped data that needs to stay escaped with their own mechanisms.
Is there any way to ensure that each cell gets escaped, even if the cell has backslashes and quotes?