12

Given a CSV file that has newline/return characters in certain fields, how do we parse the data without splitting a field into multiple rows.

Example CSV Data:

ID;Name;Country;ISO-2;Address;Latitude;Longitude
022wje3;Europa;Italy;IT;"Viale Kennedy 3
34073 Grado";"45,67960";"13,40070"
024oua5;Hiberia;Italy;IT;"Via XXIV Maggio 8
00187 Rome";"41,89720";"12,48680"
028gupn;Regal Riverside;Hong Kong;HK;"34-36 Tai Chung Kiu Road
Shatin
Hong Kong";"22,38260";"114,19600"
02j7qry;Okaliptus Holiday Villas Apart;Turkey;TR;"Sevket Sabanci Caddesi No. 70
Bahçelievler Mevkii
Turgutreis";"37,02130";"27,25120"
02pc99z;California Apartementos;Spain;ES;"Prat d'en Carbó
43840 Salou";"41,07620";"1,14667"
02tu1jz;Elvis Presley's Heartbreak;United States;US;"3677 Elvis Presley Blvd.
Memphis
Tennessee 38116";"35,04850";"-90,02710"

Note: fields are actually separated by semicolon ; because Address can contain commas

Each row has 7 fields but we don't want to mistakenly parse data in a field containing newline characters as multiple rows...

We found a few Perl-focussed answers on StackOverflow:

but I'm a bit rusty on Perl and have not discovered a JS-focussed answer.

Community
  • 1
  • 1
nelsonic
  • 31,111
  • 21
  • 89
  • 120
  • 1
    Newlines wrapped in double quotes are not the end of record. Did you get that from any of those Perl links? And, if you don't understand how that Ben Nadel did it, you won't know how to do a regex for non-standard cases. It's better to search SO for csv parsing regexes. I've posted quite a few myself. –  Mar 29 '16 at 16:45
  • Exactly. new lines which are inside the double quotes are still the same row/field. I think I've understood the `CSVToArray` and it solved our immediate problem. I will read through a few of your RegEx answers to brush up. Thanks! – nelsonic Mar 29 '16 at 17:13

2 Answers2

25

Have you tried CSVToArray by Ben Nadel?

/**
 * CSVToArray parses any String of Data including '\r' '\n' characters,
 * and returns an array with the rows of data.
 * @param {String} CSV_string - the CSV string you need to parse
 * @param {String} delimiter - the delimeter used to separate fields of data
 * @returns {Array} rows - rows of CSV where first row are column headers
 */
function CSVToArray (CSV_string, delimiter) {
   delimiter = (delimiter || ","); // user-supplied delimeter or default comma

   var pattern = new RegExp( // regular expression to parse the CSV values.
     ( // Delimiters:
       "(\\" + delimiter + "|\\r?\\n|\\r|^)" +
       // Quoted fields.
       "(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|" +
       // Standard fields.
       "([^\"\\" + delimiter + "\\r\\n]*))"
     ), "gi"
   );

   var rows = [[]];  // array to hold our data. First row is column headers.
   // array to hold our individual pattern matching groups:
   var matches = false; // false if we don't find any matches
   // Loop until we no longer find a regular expression match
   while (matches = pattern.exec( CSV_string )) {
       var matched_delimiter = matches[1]; // Get the matched delimiter
       // Check if the delimiter has a length (and is not the start of string)
       // and if it matches field delimiter. If not, it is a row delimiter.
       if (matched_delimiter.length && matched_delimiter !== delimiter) {
         // Since this is a new row of data, add an empty row to the array.
         rows.push( [] );
       }
       var matched_value;
       // Once we have eliminated the delimiter, check to see
       // what kind of value was captured (quoted or unquoted):
       if (matches[2]) { // found quoted value. unescape any double quotes.
        matched_value = matches[2].replace(
          new RegExp( "\"\"", "g" ), "\""
        );
       } else { // found a non-quoted value
         matched_value = matches[3];
       }
       // Now that we have our value string, let's add
       // it to the data array.
       rows[rows.length - 1].push(matched_value);
   }
   return rows; // Return the parsed data Array
}

in your case invoke it with:

var rows = CSVToArray(CSV_string, ';');

where CSV_string is your string of CSV data.

iteles
  • 861
  • 1
  • 7
  • 15
0

A bit late but I hope it helps someone.

Some time ago even I faced a similar problem and I had used a library csvtojson in my angular project.

You can read the CSV file as a string using the following code and then pass that string to the csvtojson library and it will give you a list of JSON.

Sample Code:

const csv = require('csvtojson');
if (files && files.length > 0) {
      const file: File = files.item(0);
      const reader: FileReader = new FileReader();
      reader.readAsText(file);
      reader.onload = (e) => {
        const csvs: string = reader.result as string;
        csv({
          output: "json",
          noheader: false
        }).fromString(csvs)
          .preFileLine((fileLine, idx) => {
            //Convert csv header row to lowercase before parse csv file to json
            if (idx === 0) { return fileLine.toLowerCase() }
            return fileLine;
          })
          .then((result) => {
            // list of json in result
          });
      }
    }
  }
bhavya_karia
  • 760
  • 1
  • 6
  • 12