Some CSV files that we import to our server cannot be parsed correctly.
We are reading the CSV file with PHP's fgetcsv():
while (($line = fgetcsv($file)) !== false) { ... }
However, when a whole CSV line is wrapped in double quotes and contains doubled double quotes inside, for example:
"first entry,"""","""",Data Chunk,2022-05-30"
fgetcsv() does not split this line as we expect. It treats first entry,"""","""",Data Chunk,2022-05-30
as a single entry.
How can we make sure the function treats first entry
as a separate entry, and also interprets the other parts """"
as empty entries?
After more research I found:
Fields containing double quotes ("), Line Break (CRLF) and Comma must be enclosed with double quotes.
If Fields enclosed by double quotes (") contain double quotes character then the double quotes inside the field must be preceded with another double quote as an escape sequence. Source
This is likely the issue that we face here.
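To illustrate the suspicion, here is a minimal sketch that assumes each physical line was CSV-encoded a second time as one quoted field. It reads the row with fgetcsv() and, if exactly one field comes back, parses that field again with str_getcsv(). The helper name parseWrappedRow() is hypothetical, the empty escape argument assumes PHP 7.4 or later, and the sketch only covers the single-line example above, not bodies that span several lines.

```php
<?php
// Hypothetical helper (not part of PHP): undo one extra layer of CSV quoting.
function parseWrappedRow(array $row): array
{
    // When the whole line was wrapped in quotes, fgetcsv() returns one field
    // whose content is itself a comma-separated line.
    if (count($row) === 1 && is_string($row[0]) && strpos($row[0], ',') !== false) {
        // Empty escape string (PHP 7.4+): only "" counts as an escaped quote.
        return str_getcsv($row[0], ',', '"', '');
    }
    return $row; // Already split into several fields, or a blank line.
}

// Demo on the example line from the question, using an in-memory stream.
$file = fopen('php://memory', 'w+');
fwrite($file, '"first entry,"""","""",Data Chunk,2022-05-30"' . "\r\n");
rewind($file);

while (($line = fgetcsv($file, 0, ',', '"', '')) !== false) {
    print_r(parseWrappedRow($line));
}
fclose($file);
```

With the example line, the first pass yields the single field first entry,"","",Data Chunk,2022-05-30 and the second pass splits it into five entries, two of them empty. Rows that fgetcsv() already splits into several fields are passed through unchanged.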
A more complete example of the CSV data:
Allgemeines
Subject,Body,Attachment,Author,Created At,Updated At
"Hello everyone, this is a sample. Kind regards,"""","""",Author name (X),2022-05-30 14:54:32 UTC,2022-05-30 14:54:37 UTC"
","""",https://padlet-uploads.storage.googleapis.com/456456456/testfile.docx,Author name (X),2022-05-15 13:53:04 UTC,2022-05-15 13:54:40 UTC"
",""Hello everyone!"
This is some fun text.
More to come.
Another sentence.
And more text.
Even more text
See you soon.
","",Author name (X),2021-07-22 09:41:06 UTC,2021-07-23 16:12:42 UTC
""
Important Things to Know in 2022
Subject,Body,Attachment,Author,Created At,Updated At
"","
01.01.2022 First day of new year
02.02.2202 Second day of new year
Please plan ahead.
","",Author name (X),2021-07-22 09:58:19 UTC,2022-03-24 14:16:50 UTC
""
Note: Each line starts with a double quote and ends with a double quote followed by a carriage return and line feed (CRLF).