I need to read a .csv file, put the information into a struct, then insert the struct into a binary file. Each column means:

(int)Year; (int)Rank; (char*)University's Name; (float)Score; (char*)City; (char*)Country 

My .csv file looks like:

2018,1,Harvard University,97.7,Cambridge,United States
2018,2,University of Cambridge,94.6,Cambridge,United Kingdom
2018,3,University of Oxford,94.6,Oxford,United Kingdom
2018,4,Massachusetts Institute of Technology (MIT),92.5,Cambridge,United States
2018,5,Johns Hopkins University,92.1,Baltimore,United States

As you can see, it contains the five best universities in the world.

The issue is my code can read neither the integers (perhaps due to commas in the file) nor the char[30].

struct Data
{
  int year;
  int rank;
  char name[30];
  float score;
  char city[30];
  char country[30];
};


Data *universitie = new Data [5];
ifstream read ( "myFile.csv" );
int i = 0;

while ( !read.eof() ) {
  read >> universitie[i].year;
  read >> universitie[i].rank;
  read.getline (universitie[i].name, 30, ','); //here a segmentation fault happened
  read >> universitie[i].score;
  read.getline (universitie[i].city, 30, ','); //here a segmentation fault happened
  read.getline (universitie[i].country, 30, '\n'); //here a segmentation fault happened
  i++;
}
read.close ();

I'm actually using char[30] for name, city and country instead of string because then I'll write this struct array into a binary file.

How can I properly read the integers up to the comma? And how can I read a character array from a file using getline() with a delimiter such as ','?

This is for my Computer Science course's task.

Dúthomhas
  • This question has been asked many times, just look around on SO a bit longer. For example [here](https://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c). And don't use `char[30]` for strings, use [`std::string`](https://en.cppreference.com/w/cpp/string/basic_string) – Pepijn Kramer Feb 04 '23 at 20:00
  • Every attempt to read input that consists of lines of text, irrespective of the actual format of each line, in any way other than a single call to `std::getline` that reads the entire line into a `std::string` before parsing it any further will always end in tears. – Sam Varshavchik Feb 04 '23 at 20:02
  • Is there _any_ flexibility in your `Data` struct? As it stands it cannot even hold the entire length of all the University names. – Dúthomhas Feb 05 '23 at 04:11
  • Longest country name in the world is “The United Kingdom of Great Britain and Northern Ireland” at 56 letters. Next is “Independent and Sovereign Republic of Kiribati” at 46. – Dúthomhas Feb 05 '23 at 04:21
  • LOL, “Knowsley Park Centre for Learning - serving Prescot, Whiston and the wider community” — and that’s in English. What about UTF-8-encoded names? Do you get input as “Universität Wuppertal” or ”University of Wuppertal”? What about “ETH Zürich”? That’s one of the top Universities in the world. – Dúthomhas Feb 05 '23 at 04:32
  • So... I was thinking about posting an answer to help you with the binary file format... and there really isn’t any advantage to storing as binary with this data. Does your task _require_ you to use a binary file format? (And, as before, is there any flexibility in the structure?) – Dúthomhas Feb 05 '23 at 04:39
  • For this task, storing into a binary file is required. Omegalul professor – Ítalo Alves Rabelo Feb 05 '23 at 17:49

2 Answers

The usual technique with CSV files is to model a record with a class, then overload operator>> to input a record:

struct Record
{
  int year;
  int rank;
  std::string name;
  double      score;
  std::string city;
  std::string country;
  friend std::istream& operator>>(std::istream& input, Record& r);
};

std::istream& operator>>(std::istream& input, Record& r)
{
  char comma;
  input >> r.year;
  input >> comma;
  input >> r.rank;
  input >> comma;
  std::getline(input, r.name, ',');     // reads up to and consumes the ','
  input >> r.score;
  input >> comma;                       // skip the ',' after the score
  std::getline(input, r.city, ',');
  std::getline(input, r.country, '\n');
  return input;
}

A use case for the above could look like:

std::vector<Record> database;
std::ifstream data_file ( "myFile.csv" );
Record r;
while (data_file >> r)
{
  database.push_back(r);
}

Note: I have changed your char[] to std::string for easier handling. Also, I changed the while loop condition.
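
On the loop condition: eof() only becomes true after a read has already run into the end of the file, so a loop written as while ( !read.eof() ) typically runs one extra time with a failed read. A minimal sketch of the difference, using plain integers instead of the CSV records to keep it short (the function names count_with_eof_test and count_with_read_test are made up for this illustration):

```cpp
#include <istream>
#include <sstream>
#include <string>

// Pattern from the question: tests eof() before attempting the read.
int count_with_eof_test(const std::string& text)
{
  std::istringstream in(text);
  int value = 0;
  int count = 0;
  while (!in.eof()) {
    in >> value;   // may fail, but the iteration is counted anyway
    ++count;
  }
  return count;
}

// Pattern from the answer: the read itself is the loop condition.
int count_with_read_test(const std::string& text)
{
  std::istringstream in(text);
  int value = 0;
  int count = 0;
  while (in >> value)
    ++count;
  return count;
}
```

For the input "1 2 3\n" the first function counts one iteration more than there are values, because the trailing newline keeps eof() false after the last successful read. With an array of exactly 5 elements, that extra iteration would write past the end of the array.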

Thomas Matthews
  • Does this handle intra-quote commas correctly? – Casey Feb 04 '23 at 21:06
  • You need to be explicit about (potential) whitespace before any `comma`s: `input >> std::ws >> comma;` at minimum. (I would also check that `comma` actually _is_ a `,`. For myself, I personally prefer to [`expect`](https://stackoverflow.com/a/75025150/2706707) literals.) – Dúthomhas Feb 05 '23 at 05:23
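
A rough sketch of the separator check suggested in the last comment. The helper expect_char and the function read_year_and_rank are hypothetical names for this illustration; this is not the code behind the linked answer, and it does not deal with quoted fields that contain commas.

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: skips leading whitespace, then requires `expected`.
// On a mismatch it sets failbit so the caller's stream test reports the error.
std::istream& expect_char(std::istream& input, char expected)
{
  char c = '\0';
  if (input >> std::ws >> c && c != expected)
    input.setstate(std::ios::failbit);
  return input;
}

// Reads "year , rank ," and reports whether both separators were commas.
bool read_year_and_rank(std::istream& input, int& year, int& rank)
{
  input >> year;
  expect_char(input, ',');
  input >> rank;
  expect_char(input, ',');
  return static_cast<bool>(input);
}
```

Once failbit is set, every later extraction on the same stream is skipped, so a single test of the stream at the end is enough to detect a malformed line.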

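// Open the file
std::ifstream file("myFile.csv");

// Assumes name, city and country in Data are declared as std::string
Data *universitie = new Data [5];

int i = 0;
std::string line;
while (i < 5 && std::getline(file, line))
{
  std::stringstream ss(line);

  ss >> universitie[i].year;
  ss.ignore();                                  // skip the ',' after year
  ss >> universitie[i].rank;
  ss.ignore();                                  // skip the ',' after rank
  std::getline(ss, universitie[i].name, ',');   // reads up to and consumes the ','
  ss >> universitie[i].score;
  ss.ignore();                                  // skip the ',' after score
  std::getline(ss, universitie[i].city, ',');
  std::getline(ss, universitie[i].country);

  i++;
}

This is a solution with std::stringstream. Each line is read whole with std::getline and then split. The ignore() function skips the ',' after each numeric field, while std::getline with ',' as the delimiter reads a text field and consumes its trailing comma. Plain operator>> cannot be used for the text fields because it stops at the first space, so it would read only "Harvard". In my opinion it is better to use the C++ std::string class instead of char[30]; the code above assumes name, city and country are declared that way. If at some point you need a C string, you can use the c_str() function.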
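
Since the task requires a binary file, the parsed text eventually has to go into the fixed-size struct from the question. A minimal sketch of that last step, assuming the Data layout from the question; copy_field, write_binary and read_binary are hypothetical helper names. Writing the raw struct bytes like this depends on the compiler's padding and the machine's byte order, so the file is only meant to be read back by the same program on the same platform.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

struct Data
{
  int year;
  int rank;
  char name[30];
  float score;
  char city[30];
  char country[30];
};

// Copies src into a fixed buffer, truncating to N-1 characters.
// The buffer is zero-filled first, so the result is always NUL-terminated.
template <std::size_t N>
void copy_field(char (&dest)[N], const std::string& src)
{
  std::memset(dest, 0, N);
  std::strncpy(dest, src.c_str(), N - 1);
}

// Writes all records as raw bytes; returns false if the file could not be written.
bool write_binary(const std::string& path, const std::vector<Data>& rows)
{
  std::ofstream out(path, std::ios::binary);
  if (!out)
    return false;
  out.write(reinterpret_cast<const char*>(rows.data()),
            static_cast<std::streamsize>(rows.size() * sizeof(Data)));
  return static_cast<bool>(out);
}

// Reads records back until a full record can no longer be read.
std::vector<Data> read_binary(const std::string& path)
{
  std::vector<Data> rows;
  std::ifstream in(path, std::ios::binary);
  Data d;
  while (in.read(reinterpret_cast<char*>(&d), sizeof d))
    rows.push_back(d);
  return rows;
}
```

Note that with char[30] a name such as "Massachusetts Institute of Technology (MIT)" is cut off after 29 characters. If the struct may be changed, larger buffers avoid that.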