12

I am using PHP to import a CSV file, which originates from an excel spreadsheet. Some of the fields contain line breaks in them, so when I reopen the csv in excel / open office spreadsheet, it misinterprets where the line breaks should happen.

Also in my script, using fgetcsv to go through each line, it is incorrectly line breaking where it shouldn't be.

I could manually cleanse the data but a) that would take ages as its a 10k line file, and b) the data is exported from a clients existing piece of software

Any ideas on how to automatically solve this on the import process? I would have thought delimiting the fields would have sorted it but it does not.

Horse
  • 3,023
  • 5
  • 38
  • 65
  • 1
    as long the fields with line breaks are quoted `"foo\nbar"` the column alignments shouldn't be affected – xzyfer Mar 29 '11 at 10:22
  • 1
    LF should be encoded as `\n` in CSV fields. You will need a custom reader to undo that export damage. Look through http://php.net/fgetcsv, but you will probably have to search or write one. – mario Mar 29 '11 at 10:23
  • 2
    possible duplicate (http://stackoverflow.com/questions/3278375/how-can-you-parse-excel-csv-data-that-contains-linebreaks-in-the-data ) – xzyfer Mar 29 '11 at 10:24
  • xzyfer, that solution wont work as I need the ability to start at a particular line from the file pointer, as loading an entire 15mb csv into a string on a production server is not going to work :( thanks anyways though – Horse Mar 29 '11 at 11:14

6 Answers6

20

The accepted answer didn't solve the problem for me, but I eventually found this CSV parser library on google code that works well for multiline fields in CSV's.

parsecsv-for-php:
https://github.com/parsecsv/parsecsv-for-php


For historical purposes, the original project home was:
http://code.google.com/p/parsecsv-for-php/

musicin3d
  • 1,028
  • 1
  • 12
  • 22
danieltalsky
  • 7,752
  • 5
  • 39
  • 60
  • 2
    +1 That is so far the best parser around! better than str_getcsv or alike. My data contained HTML and new-lines within fields. This is the only parser that got it right! Thanks! – lepe Sep 06 '13 at 07:26
  • 2
    It's kind of sad that it's so hard to find an accurate parser for one of the most common formats out there for one of the most widely used programming languages. I guess it's partially since there's no official standard, but line breaks inside quoted values is DEFINITELY something a CSV parser should be able to handle. – danieltalsky Sep 06 '13 at 12:31
  • 1
    This should be the approved answer. Thank you – Seb Barre Jun 27 '17 at 19:39
  • I searched for weeks and this answer brought my search to an end. This still works in 2021! – Professorval Jul 24 '21 at 00:14
3

It's an old thread but i encountered this problem and i solved it with a regex so you can avoid a library just for that. Here the code is in PHP but it can be adapted to other language.

$parsedCSV = preg_replace('/(,|\n|^)"(?:([^\n"]*)\n([^\n"]*))*"/', '$1"$2 $3"', $parsedCSV);

This solutions supposes the fields containing a linebreak are enclosed by double quotes, which seems to be a valid assumption, at least for what i have seen so far. Also, the double quotes should follow a , or be placed at the start of a new line (or first line).

Example:

field1,"field2-part1\nfield2-part2",field3

Here the \n is replaced by a whitespace so the result would be:

field1,"field2-part1 field2-part2",field3

The regex should handle multiple linebreaks as well.

This might not be efficient if the content is too large, but it can help for many cases and the idea can be reused, maybe optimized by doing this for smaller chunks (but you'd need to handle the cuts with fix-sized buffered).

V. Högman
  • 86
  • 5
3

I had that problem too and did not find an way to read the data correctly.

In my case it was an one-time-import, so i made an script that searched for all line-breaks within an column and replaced it with something like #####. Then I imported the data and replaced that by linebreaks.

If you need an regular import you could write you own CSV-Parser, that handles the problem. If the text-columns are within "" you could treat everything between two "" as one columns (with check for escaped " within the content).

MacGucky
  • 2,494
  • 17
  • 17
1

My solution is the following:

nl2br(string);

http://php.net/manual/en/function.nl2br.php

Once you get to the individual cell (string) level, run it on the string and it will convert the linebreaks to html breaks for you.

Bart
  • 19,692
  • 7
  • 68
  • 77
Mike Wilding
  • 49
  • 1
  • 5
1

Although it is old question the answer might be still relevant to ppl. There is currently new library (framework independent) http://csv.thephpleague.com/ which supports NL chars in fields as well as some filtering.

ghispi
  • 46
  • 5
-1

Yes you needs to find that comma and replace by some special characters like combination of {()} and finally replace them with , that you are originally looking for.

Hope that helps you.

Aditya P Bhatt
  • 21,431
  • 18
  • 85
  • 104
  • 5
    sorry but this does not help at all, please actually read the full question before answering – Horse Mar 29 '11 at 11:15