5

I'm reading a CSV file in C++ and the row format is as such:

"Primary, Secondary, Third", "Primary", , "Secondary", 18, 4, 0, 0, 0

(notice the empty value)

When I do:

while (std::getline(ss, csvElement, ',')) {
   csvColumn.push_back(csvElement);
}

This splits the first quoted string into pieces, which isn't correct.

How do I preserve the quoted string when iterating? I tried combining the above with also grabbing the text between double quotes, but I got wild results.

Henry Ecker
dimxasnewfrozen
    Does anyone have a good solution for this using `std::quoted`? The standard specifically mentions CSV files when describing `std::quoted` but I can't come up with an elegant way to use it. – user2093113 Feb 25 '16 at 22:34

3 Answers

5

Using std::quoted allows you to read quoted strings from input streams.

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::stringstream ss;
    ss << "\"Primary, Secondary, Third\", \"Primary\", , \"Secondary\", 18, 4, 0, 0, 0";

    while (ss >> std::ws) {
        std::string csvElement;

        if (ss.peek() == '"') {
            ss >> std::quoted(csvElement);
            std::string discard;
            std::getline(ss, discard, ',');
        }
        else {
            std::getline(ss, csvElement, ',');
        }

        std::cout << csvElement << "\n";
    }
}


The caveat is that quoted strings are only extracted if the first non-whitespace character of a value is a double-quote. Additionally, any characters after the quoted strings will be discarded up until the next comma.
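Both caveats are easier to see with the loop wrapped in a function. The following is a minimal sketch, assuming a helper named splitRow (a hypothetical name, not part of the original answer) that runs the same loop and collects the fields into a vector instead of printing them:

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: the loop from the answer, collecting fields into a vector.
std::vector<std::string> splitRow(const std::string& row) {
    std::vector<std::string> fields;
    std::stringstream ss(row);

    while (ss >> std::ws) {                  // skip leading whitespace; fails once input is exhausted
        std::string field;
        if (ss.peek() == '"') {
            ss >> std::quoted(field);        // read up to the closing quote; quotes are removed
            std::string discard;
            std::getline(ss, discard, ','); // drop everything up to and including the next comma
        } else {
            std::getline(ss, field, ',');   // unquoted value: read up to the next comma
        }
        fields.push_back(field);
    }
    return fields;
}
```

With this sketch, the input `"abc"def, x` should yield the two fields `abc` and `x` (the trailing `def` is discarded), while `ab"c,d"` is split at the comma into `ab"c` and `d"`, because its first non-whitespace character is not a double quote.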

user2093113
  • As per my comment on the question, I feel like there should be a good way to utilise `std::quoted` for CSV parsing. Let me know if you have any improvement upon this. – user2093113 Feb 25 '16 at 22:47
  • According to http://en.cppreference.com/w/cpp/io/manip/quoted, cin skips whitespace by default, and `quoted` extracts the first character via `stream >> c`, so it works even if there's whitespace between the comma and the opening quote. As such, you don't need the peek. – Mooing Duck Feb 25 '16 at 22:56
  • @MooingDuck If whitespace is skipped by default then always calling `std::quoted` will split un-quoted strings containing whitespace, potentially before the comma separator. I'm not sure if that will fail to parse typical CSV formats? Or maybe I have misunderstood your suggestion. – user2093113 Feb 25 '16 at 23:05
  • Yup, you're right, I'm wrong. Hmm. It's weird that it seems so inept at solving its own task – Mooing Duck Feb 25 '16 at 23:08
  • @MooingDuck I agree, it feels like I'm missing something! I suppose it is handy for writing CSV files at least. – user2093113 Feb 25 '16 at 23:10
  • RFC4180-compliant CSV files double their double-quotes to escape them, so the call to `quoted` above should look like `std::quoted(csvElement, '"', '"')`. Also regarding the `discard` bit, according to that same RFC a field is either fully enclosed in double-quotes or doesn't contain a double quote at all, so if you want to be strict and reject non-RFC-compliant CSV files you could just expect a comma after a quoted field. – adl Jul 18 '18 at 15:30
  • I believe instead of doing `std::getline` with the `discard` string you could do `ss.ignore(std::numeric_limits<std::streamsize>::max(), ',');` – Lia Stratopoulos Jul 18 '22 at 17:48
2

You need to interpret each comma depending on whether you're between double quotes or not. This is too complex for getline().

The solution is to read the full line with getline() and parse it by iterating through the string character by character, maintaining an indicator of whether you're between double quotes or not.

Here is a first "raw" example (double quotes are not removed in the fields and escape characters are not interpreted):

std::string line;
while (std::getline(std::cin, line)) {       // read the full line
    const char* mystart = line.c_str();      // start of the current field
    bool instring{false};                    // true while between double quotes
    for (const char* p = mystart; *p; p++) { // iterate through the line
        if (*p == '"')                       // toggle the flag at each double quote
            instring = !instring;
        else if (*p == ',' && !instring) {   // comma OUTSIDE double quotes
            csvColumn.push_back(std::string(mystart, p - mystart));  // keep the field
            mystart = p + 1;                 // and start parsing the next one
        }
    }
    csvColumn.push_back(std::string(mystart));  // last field ends at end of line, not at a comma
}

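The raw example keeps the double quotes and does not interpret escapes. As a rough sketch of how the same character-by-character idea could be extended, the function below strips the enclosing quotes and treats a doubled quote inside a quoted field as one literal quote. The name parseCsvLine is hypothetical, and the sketch assumes RFC 4180-style doubling, drops spaces that directly follow a separator, and does not handle fields that span several lines:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split one CSV line into fields.
// Assumptions: "" inside a quoted field means a literal ", spaces right
// after a separator are dropped, and a field never spans multiple lines.
std::vector<std::string> parseCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c != '"') {
                field += c;                   // ordinary character inside quotes
            } else if (i + 1 < line.size() && line[i + 1] == '"') {
                field += '"';                 // doubled quote becomes one literal quote
                ++i;
            } else {
                inQuotes = false;             // closing quote (not stored)
            }
        } else if (c == '"') {
            inQuotes = true;                  // opening quote (not stored)
        } else if (c == ',') {
            fields.push_back(field);          // comma outside quotes ends the field
            field.clear();
        } else if (c == ' ' && field.empty()) {
            // skip spaces between a separator and the start of the field
        } else {
            field += c;
        }
    }
    fields.push_back(field);                  // last field ends at end of line
    return fields;
}
```

For the row in the question this should produce nine fields, with the first one being `Primary, Secondary, Third` (without quotes) and the third one empty.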

Christophe
0

How do I preserve the string when iterating?

Here is the C++ approach I have used.

I noticed that you have only 3 field types: string, null, and int.

The following approach handles these field types (in the method "void init()") in the order in which each row presents them, sometimes using string::find() (instead of getline()) to locate the end of a field.

Each of the 3 methods consumes characters from the string with erase. I know erase is slow, but I chose it for convenience (erasing is easier to test: just add a cout after each extract). The erase calls can be removed or replaced by appropriate handling (where needed) of a start-of-search index.

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

#include <cassert>


class CSV_t
{
   typedef std::vector<int>  IntVec_t;

   // private nested class -- holds contents of 1 csv record
   class CSVRec_t
   {
   public:
      std::string primary;
      std::string secondary;
      std::string nullary;
      std::string thirdary;
      IntVec_t    i5;

      std::string show()
         {
            std::stringstream ss;
            ss <<            std::setw(25) << primary
               << "     " << std::setw(10) << secondary
               << "     " << std::setw(12)<< thirdary << "     ";

            for (size_t i=0;
                 i<i5.size(); ++i) ss << std::setw(5) << i5[i];

            ss << std::endl;
            return (ss.str());
         }

   }; // class CSVRec_t


   typedef std::vector<CSVRec_t> CSVRecVec_t;

   CSVRecVec_t csvRecVec;  // holds all csv record

public:

   CSV_t() { };

   void init(std::istream& ss)
      {
         do  // read all rows of file
         {
            CSVRec_t csvRec;

            std::string s;
            (void)std::getline(ss, s);

            if(0 == s.size()) break;

            assert(s.size()); extractQuotedField(s, csvRec.primary);   // 1st quoted substring
            assert(s.size()); extractQuotedField(s, csvRec.secondary); // 2nd quoted substring
            assert(s.size()); confirmEmptyField(s, csvRec.nullary);    // null field
            assert(s.size()); extractQuotedField(s, csvRec.thirdary);  // 3rd quoted substring
            assert(s.size()); extract5ints(s, csvRec.i5);              // handle 5 int fields

            csvRecVec.push_back(csvRec);  // capture

            if(ss.eof()) break;

         }while(1);
      }

   void show()
      {
         std::cout << std::endl;

         for (size_t i = 0; i < csvRecVec.size(); ++i)
            std::cout << std::setw(5) << i+1 << "   " << csvRecVec[i].show();

         std::cout << std::endl;
      }

private:

   void extractQuotedField(std::string& s, std::string& s2)
      {
         size_t indx1 = s.find('"', 0);
         assert(indx1 != std::string::npos);

         size_t indx2 = s.find('"', indx1+1);
         assert(indx2 != std::string::npos);

         size_t rng1 = indx2 - indx1 + 1;

         s2 = s.substr(indx1, rng1);

         s.erase(indx1, rng1+1);
      }

   void confirmEmptyField(std::string& s, std::string& nullary)  // by reference, so the caller's field is set
      {
         size_t indx1 = s.find('"');

         nullary = s.substr(0, indx1);

         // tbd - confirm only spaces and commas in this substr()

         s.erase(0, indx1);
      }

   void extract5ints(std::string& s, IntVec_t& i5)
      {
         std::stringstream ss(s);

         int t = 0;
         for (int i=0; i<5; ++i)
         {
            ss >> t;
            ss.ignore(1); // skip ','
            assert(!ss.bad()); // confirm ok
            i5.push_back(t);
         }
         s.erase(0, std::string::npos);
      }

};  // class CSV_t



int t288(void) // test 288
{
   std::stringstream ss;
   ss << "\"Primary, Secondary, Third\", \"Primary\", , \"Secondary\", 18, 4, 0, 0, 0\n"
      << "\"Pramiry, Secandory, Thrid\", \"Pramiry\", , \"Secandory\", 19, 5, 1, 1, 1\n"
      << "\"Pri-mary, Sec-ondary, Trd\", \"Pri-mary\", , \"Sec-ondary\", 20, 6, 2, 3, 4\n"
      << std::endl;

   CSV_t csv;

   csv.init(ss);

   csv.show(); // results

   return (0);
}

int main() { return t288(); }  // entry point so the test can be run directly
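The answer notes that the erase calls can be replaced by a start-of-search index. A minimal sketch of that idea for the quoted-field case might look like the following; extractQuotedFieldAt and its pos parameter are hypothetical names, not part of the class above, and the function reports failure through its return value rather than through assert:

```cpp
#include <string>

// Hypothetical variant of extractQuotedField: instead of erasing consumed
// characters from s, it advances a start-of-search index (pos).
// Returns false if no complete quoted field is found at or after pos.
bool extractQuotedFieldAt(const std::string& s, std::size_t& pos, std::string& out) {
    const std::size_t open = s.find('"', pos);
    if (open == std::string::npos) return false;

    const std::size_t close = s.find('"', open + 1);
    if (close == std::string::npos) return false;

    out = s.substr(open, close - open + 1);      // keep the quotes, as the class above does
    pos = close + 1;                             // continue after the closing quote
    if (pos < s.size() && s[pos] == ',') ++pos;  // also step over the separator
    return true;
}
```

Called repeatedly with the same pos variable, this walks through the quoted fields of a row without modifying the row itself. Note that it skips over anything between quoted fields, so the empty field would still need its own check.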
2785528