I am having trouble parsing a CSV file. Some of the values in the data rows can be blank, and my code fails when any data row contains a blank entry. Without blank entries, the program prints the following:
Symbol: GOOG
Name: Googl Inc.
Price: $570.25
High Today: $570.25
Low Today: $560.35
Symbol: APPL
Name: Apple Inc.
Price: $123.25
High Today: $124.25
Low Today: $125.35
If I run the same program with the following CSV string, it stops with an assertion failure. The parser skips over adjacent ,, delimiters, so the number of columns in the data row no longer matches the number of columns in the header.
std::stringstream ifs(
"Symbol,Name,Price,High Today,Low Today\n"
"GOOG,Googl Inc.,$570.25 ,$570.25 ,$560.35\n"
"APPL,Apple Inc.,$123.25 ,,$125.35\n");
Here is my code:
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <fstream>
#include <algorithm>
#include <iterator>
#include <cassert>
#include <cstddef>
#include <locale>
// This ctype facet classifies commas and endlines as whitespace
struct csv_whitespace : std::ctype<char> {
    static const mask* make_table() {
        // make a copy of the "C" locale table
        static std::vector<mask> v(classic_table(), classic_table() + table_size);
        v[','] |= space; // comma will be classified as whitespace
        v[' '] &= ~space; // space will not be classified as whitespace
        return &v[0];
    }
    csv_whitespace(std::size_t refs = 0)
        : ctype(make_table(), false, refs)
    {}
};
static int row_end = std::ios_base::xalloc();
std::istream& record(std::istream& is) {
    while (std::isspace(is.peek(), is.getloc())) {
        int c(is.peek());
        is.ignore();
        if (c == '\n') {
            is.iword(row_end) = 1;
            is.setstate(std::ios_base::failbit);
        }
    }
    return is;
}
template<class Iter1, class Iter2, class Function>
void for_each_binary_range(Iter1 first1, Iter1 last1,
                           Iter2 first2, Iter2 last2, Function f)
{
    assert(std::distance(first1, last1) <=
           std::distance(first2, last2));
    while (first1 != last1) {
        f(*first1++, *first2++);
    }
}
int main(int argc, char *argv[])
{
    std::stringstream ifs(
        "Symbol,Name,Price,High Today,Low Today\n"
        "GOOG,Googl Inc.,$570.25 ,$570.25 ,$560.35\n"
        "APPL,Apple Inc.,$123.25 ,$124.25 ,$125.35\n");
    //std::ifstream ifs("c:\\temp\\csvfile.csv", std::ios::in);
    std::vector<std::string> keys, values;
    ifs.imbue(std::locale(ifs.getloc(), new csv_whitespace));
    bool bHeaderProcessed = false;
    for (std::string item;;) {
        if (ifs >> record >> item) {
            if (!bHeaderProcessed) {
                keys.push_back(item);
            } else {
                values.push_back(item);
            }
        } else if (ifs.eof()) {
            // catch case where last line does not have trailing \n
            if (!values.empty()) {
                for_each_binary_range(std::begin(keys), std::end(keys),
                    std::begin(values), std::end(values),
                    [&](std::string const& key, std::string const& value) {
                        std::cout << key << ": " << value << std::endl;
                        std::cout << std::endl;
                    });
                values.clear();
            }
            break;
        } else if (ifs.iword(row_end)) {
            // reset eol flag & clear stream state
            ifs.iword(row_end) = 0;
            // clear the fail-bit so we can stream more values
            ifs.clear();
            bHeaderProcessed = true;
            if (!values.empty()) {
                for_each_binary_range(std::begin(keys), std::end(keys),
                    std::begin(values), std::end(values),
                    [&](std::string const& key, std::string const& value) {
                        std::cout << key << ": " << value << std::endl;
                    });
                values.clear();
                std::cout << std::endl;
            }
        } else {
            break;
        }
    }
    return -1;
}
The original code on which I based mine is documented well here. Unfortunately, the answer to that question (with a live demo here) does not seem to handle multiple rows, and I cannot get it to handle empty tokens.
My version prints each row as a series of name/value pairs, and it also handles multiple rows and a final row that does not end with a newline.
The logic is described very well in the linked answer above.
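The skipping of adjacent delimiters described earlier can be reproduced in isolation. The following is a minimal sketch, separate from the program above: comma_is_space and count_tokens are hypothetical names introduced only for this illustration, and the facet simply mirrors csv_whitespace. Because operator>> discards any run of whitespace before a token, two adjacent commas act as a single separator and the empty field never reaches the caller.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Mirrors csv_whitespace: commas count as whitespace, spaces do not.
// (Hypothetical name, used only in this illustration.)
struct comma_is_space : std::ctype<char> {
    static const mask* make_table() {
        static std::vector<mask> v(classic_table(), classic_table() + table_size);
        v[','] |= space;   // ',' separates tokens
        v[' '] &= ~space;  // ' ' stays inside a token
        return &v[0];
    }
    explicit comma_is_space(std::size_t refs = 0)
        : std::ctype<char>(make_table(), false, refs) {}
};

// Hypothetical helper: counts the tokens operator>> extracts from one row.
std::size_t count_tokens(const std::string& row) {
    std::istringstream in(row);
    in.imbue(std::locale(in.getloc(), new comma_is_space));
    std::size_t n = 0;
    for (std::string tok; in >> tok;) {
        ++n;  // operator>> skips every leading ',' before reading a token
    }
    return n;
}
```

Calling count_tokens on the row with all five values gives 5, while the row containing ,, gives 4, which is why the column-count assertion fires.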
Could someone point out how to handle adjacent delimiters in the data lines of the CSV?
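One possible direction, sketched here under assumptions rather than offered as a tested fix for the program above: read each row as a line and split it on ',' with std::getline, which returns an empty string between adjacent delimiters instead of skipping them. split_fields is a hypothetical helper name, and the sketch assumes there are no quoted fields and no commas or newlines embedded in a value.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV row on ',' and keeps empty fields.
// Assumes no quoting and no embedded commas or newlines.
std::vector<std::string> split_fields(const std::string& row) {
    std::vector<std::string> fields;
    std::istringstream in(row);
    std::string field;
    // getline consumes the delimiter, so ",," produces an empty string.
    while (std::getline(in, field, ',')) {
        fields.push_back(field);
    }
    // Nothing is reported after a trailing comma, so add that empty field.
    if (!row.empty() && row.back() == ',') {
        fields.push_back("");
    }
    return fields;
}
```

For the row APPL,Apple Inc.,$123.25 ,,$125.35 this yields five fields, the fourth of which is empty, so the count matches the header and each value can still be paired with its key.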