
I'm trying to find the indexes of certain header values in a CSV file so I can use them to extract the data at those positions in the rest of the file. I'm adding the header values to a std::map<std::string, int> so I can retain the indexes.

I had working code until I noticed that a header doesn't match if it is the last value in the row. The last header string appears empty when printed inside my nested loop, but not in the outer loop.

const int columnCount = 2;
std::string columns[columnCount] = { "column1", "column2" };

std::map<std::string, int> columnMap;

std::vector<std::string> cols(columns, columns + columnCount);
std::vector<std::string> cells;

boost::tokenizer<boost::escaped_list_separator<char> > tok(header_row);
cells.assign(tok.begin(), tok.end());

std::vector<std::string>::iterator iter_cells;
std::vector<std::string>::iterator iter_cols;

for (iter_cells = cells.begin(); iter_cells != cells.end(); ++iter_cells) {
    std::string cell = *iter_cells; 
    for(iter_cols = cols.begin(); iter_cols != cols.end(); ++iter_cols) {
        std::string col = *iter_cols;
        cout << cell << "=" << col;
        if(col.compare(cell) == 0) {
            cout << " MATCH" << endl;
            columnMap.insert(std::make_pair(*iter_cols,iter_cells-cells.begin()));
            break;
        }
        cout << endl;
    }
}

When tok(header_row) is the equivalent of tok("column0,column1,column2"), I get this output:

column0=column1
column0=column2
column1=column1 MATCH
=column1
=column2

Whereas if it's tok("column0,column1,column2,column3"), I get:

column0=column1
column0=column2
column1=column1 MATCH
column2=column1
column2=column2 MATCH
=column1
=column2

When I print the cell with cout << cell in the outer loop, the value is shown correctly.

Why do I lose the value of cell in the inner loop?

EDIT

The code and test files on GitHub are compiled with:

gcc parse_csv.cpp -o parse_csv -lboost_filesystem -lmysqlpp

and executed with:

./parse_csv /home/dave/SO_Q/

I get this output:

Process File: /home/dave/SO_Q/test_2.csv
metTime
metTime=metTime MATCH
Ta
=metTime
=Ta
=Ua
=Th
Process File: /home/dave/SO_Q/test_1.csv
DATE_TIME_UTC
DATE_TIME_UTC=metTime
DATE_TIME_UTC=Ta
DATE_TIME_UTC=Ua
DATE_TIME_UTC=Th
Ta
Ta=metTime
Ta=Ta MATCH
metTime
=metTime
=TaTime
=UaTime
=ThTime
Dave Anderson
  • As Nik demonstrates, this code should work. I can't see anything wrong with it either. So the problem is in something that is missing from your question. Do you think you could post a *complete* program with this problem, preferably as small as possible. – john Nov 22 '13 at 07:24
  • @john, Thanks, I think I've added a more complete version of the code and the two test files I'm using here; https://github.com/davecanderson/stackoverflow_20138787 – Dave Anderson Nov 22 '13 at 23:06

2 Answers


I'm not sure how you are populating the variable "header_row", but the code below works for me. I get this output:

column0=column1
column0=column2
column1=column1 MATCH
column2=column1
column2=column2 MATCH
column3=column1
column3=column2

#include <boost/tokenizer.hpp>
#include <cassert>   // assert
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using namespace std;

int main()
{
  //create csv
  ofstream csvFile ("data.csv");
  std::string row = "column0,column1,column2,column3";

  csvFile << row;

  csvFile.close();

  const int columnCount = 2;
  std::string columns[columnCount] = { "column1", "column2" };

  map<std::string, int> columnMap;

  std::vector<std::string> cols(columns, columns + columnCount);
  std::vector<std::string> cells;

  //open csv file
  std::string header_row;
  ifstream csvRead("data.csv");
  assert(csvRead.is_open());
  getline(csvRead,header_row);

  boost::tokenizer<boost::escaped_list_separator<char> > tok(header_row);
  cells.assign(tok.begin(), tok.end());
  //close file
  csvRead.close();

  std::vector<std::string>::iterator iter_cells;
  std::vector<std::string>::iterator iter_cols;
  
  // original loops as provided in the question
  for (iter_cells = cells.begin(); iter_cells != cells.end(); ++iter_cells) {
    std::string cell = *iter_cells;
    for(iter_cols = cols.begin(); iter_cols != cols.end(); ++iter_cols) {
      std::string col = *iter_cols;
      cout << cell << "=" << col;
      if(col.compare(cell) == 0) {
        cout << " MATCH" << endl;
        columnMap.insert(std::make_pair(*iter_cols,iter_cells-cells.begin()));
        break;
      }
      cout << endl;
    }
  }

}
Nik
  • This is not really an answer, it should be a comment. You haven't explained why the OP's code is not working. – john Nov 22 '13 at 07:22
  • @john Point taken. There is no issue in the way loops are run. Problem probably lies in how "header_row" is filled. – Nik Nov 22 '13 at 07:27
  • Thanks @Nik, the header comes in by reading the first line of the CSV file using `std::string header_row; std::ifstream data(filename.c_str()); std::getline(data, header_row);` I've posted the URL to a github repository in the question comments with some test data and the code I have. – Dave Anderson Nov 22 '13 at 23:10

The problem was with the input for the header line. It contained a carriage return ('\r') at the end, so the last header didn't match any of the items in the array. Removing that character fixed the problem.

I was working on a Windows PC and then transferring the file to a CentOS machine to run the code. The difference in line endings between the two platforms is what caused the issue.

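This also explains the misleading debug output. Using cout << cell on its own as a debug statement shows the string, and the trailing carriage return is invisible. Using something like cout << cell << " CELL" doesn't show the string: on a terminal, the carriage return moves the cursor back to the start of the line, so whatever is printed next overwrites the cell value. That is also why the output in the question contains lines such as =TaTime, where the shorter =Ta only partly overwrites metTime.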
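One way to make a hidden character like this visible while debugging is to print an escaped copy of the string. This is only a minimal sketch: show_control_chars is a hypothetical helper name, not something from my code or from Boost.

```cpp
#include <string>

// Hypothetical debugging helper: returns a copy of `s` in which control
// characters are written as visible escape sequences.
std::string show_control_chars(const std::string& s) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '\r') {
            out += "\\r";
        } else if (c == '\n') {
            out += "\\n";
        } else if (c == '\t') {
            out += "\\t";
        } else if (c < 0x20 || c == 0x7f) {
            // Any other control character is shown as \xNN.
            out += "\\x";
            out += hex[c >> 4];
            out += hex[c & 0x0f];
        } else {
            out += s[i];
        }
    }
    return out;
}
```

With this, cout << show_control_chars(cell) << "=" << col would print the last header as metTime\r=metTime instead of appearing to lose the value.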

I have now added this to my code to handle the difference in line endings:

// remove a possible Windows line ending so the last item matches
// (the empty() check avoids indexing past the start of an empty line)
if(!header_row.empty() && header_row[header_row.size()-1] == '\r') {
    header_row.erase(header_row.size()-1);
}
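
For reference, the whole header lookup with this fix applied could be sketched as below. The function names (strip_trailing_cr, split_csv_line, index_headers) are hypothetical, and the comma splitter is a deliberately simplified stand-in for boost::tokenizer that assumes no quoted or escaped fields.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Drop one trailing '\r', as left behind when a CRLF file is read with
// std::getline on a platform where only '\n' ends a line.
void strip_trailing_cr(std::string& line) {
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
}

// Simplified splitter on ',' (assumption: no quoted or escaped fields).
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> cells;
    std::istringstream in(line);
    std::string cell;
    while (std::getline(in, cell, ',')) {
        cells.push_back(cell);
    }
    return cells;
}

// Map each wanted header name to its column index in the header row.
std::map<std::string, int> index_headers(std::string header_row,
                                         const std::vector<std::string>& wanted) {
    strip_trailing_cr(header_row);
    const std::vector<std::string> cells = split_csv_line(header_row);
    std::map<std::string, int> result;
    for (std::size_t i = 0; i < cells.size(); ++i) {
        for (std::size_t j = 0; j < wanted.size(); ++j) {
            if (cells[i] == wanted[j]) {
                result.insert(std::make_pair(wanted[j], static_cast<int>(i)));
                break;
            }
        }
    }
    return result;
}
```

Calling index_headers("column0,column1,column2\r", wanted) with wanted holding "column1" and "column2" maps them to indexes 1 and 2, so the last header matches even when the line came from a Windows file.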
Dave Anderson