I'm fairly new to C++ and would like suggestions on whether there is a better or more efficient way to write a function that uses std::ifstream and std::stringstream.

I have a file with 150 lines and 8 columns (a small subset, with values simplified):

5.43e-08    0.0013  0.0105  0.013   0.026   0.068   0.216   0.663
6.98e-08    0.0004  0.0188  0.022   0.103   0.854   0   0
7.31e-08    0.0004  0.0125  0.017   0.074   0.895   0   0
5.82e-08    0.0006  0.0596  0.075   0.150   0.713   0   0

Each line number represents a position (pos 1 ... pos 150), and each column is the probability of a quality (Qual1 .. Qual8). My goal is to sample from each line, each of which represents a quality distribution, to create a string of qualities for all 150 positions. I have written a function that does this:

std::string Qual(std::ifstream &infile){
  
  std::string line;
  double Q_1,Q_2,Q_3,Q_4,Q_5,Q_6,Q_7,Q_8;
  char Qualities[] = {'1', '2', '3', '4' ,'5', '6', '7','8','\0'};
  std::string Read_qual;

  while (std::getline(infile, line)){
    std::stringstream ss(line);
    ss >> Q_1 >> Q_2 >> Q_3 >> Q_4 >> Q_5 >> Q_6 >> Q_7 >> Q_8;
    
    std::srand(std::time(nullptr));
    std::random_device rd;
    std::default_random_engine gen(rd());
    std::discrete_distribution<> d({Q_1,Q_2,Q_3,Q_4,Q_5,Q_6,Q_7,Q_8});

    Read_qual += Qualities[d(gen)];
  }
  return Read_qual;
}

The problem is that I have to call this function repeatedly to create many of these quality strings based on some other input. As far as I can tell from reading here on Stack Overflow, I have to call .clear() and seekg() to keep the file open and read it again:

int main(int argc,char **argv){
  std::ifstream infile("Freq.txt");
  std::cout << Qual(infile) << std::endl;
  infile.clear();
  infile.seekg(0);
  std::cout << "-------" << std::endl;
  std::cout << Qual(infile);
  return 0;
}

My question is: is there a better way to accomplish this in C++, for example functions that are faster? Does anyone have suggestions? Is it better to keep opening and closing the file?

pjs
RAHenriksen
  • Nothing wrong with what you have, IMO, keep it simple. – Paul Sanders Nov 05 '20 at 17:19
  • Read the data into a collection first, then use that data multiple times. – molbdnilo Nov 05 '20 at 17:20
  • Thanks guys! So with a container such as a vector (similar to below), would you suggest storing the values parsed from the stringstream and then sampling randomly from them, or just putting the random samples straight into the container? – RAHenriksen Nov 05 '20 at 17:52
  • Why do you have to use infile.clear() and infile.seekg(0)? I suppose you will repeat this process 150 times, each time picking up the 8 doubles of the next line for sampling. If you issue infile.seekg(0) before each call of Qual, you would always be reading the same 8 numbers from the first line. – ytlu Nov 05 '20 at 18:56
  • @ytlu, thanks for your question! To be clear, the file itself is 150 lines, each with 8 columns. For each line I create a random distribution from those 8 columns and then select a single character with "Read_qual += Qualities[d(gen)];", giving me 150 quality values (one value per line). I have to repeat this process, let's say, 2000 times, but in order to repeat it I have to call infile.clear() between the function calls, which I felt was not ideal, hence the question :-) – RAHenriksen Nov 05 '20 at 19:39
  • @RAHenriksen Understood. I suggest using a double array, double all_a[150][8], to store all 1200 doubles in main(), and passing each row of 8 doubles to Qual(double *a) to draw the random sample for each of the 150 rows. You can then reuse this array for each of the 2000 runs. That saves a lot of time compared with re-reading the file. 150 x 8 = 1,200 doubles, so the array is about 9K in size, not big. – ytlu Nov 06 '20 at 05:33
  • @ytlu Thanks that clarified things.! thanks for your help and comments – RAHenriksen Nov 06 '20 at 07:11

2 Answers

Let's try caching.

Totally untested, incomplete code:

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct row { // your type that goes into the distribution
  double Q_1,Q_2,Q_3,Q_4,Q_5,Q_6,Q_7,Q_8;
};
using QualData = std::vector<row>;  // alias declaration, same as a typedef

// Read the whole file once and cache every row in memory.
QualData ReadData(std::ifstream &infile) {
  std::string line;
  double Q_1,Q_2,Q_3,Q_4,Q_5,Q_6,Q_7,Q_8;
  QualData qual;

  while (std::getline(infile, line)){
    std::stringstream ss(line);
    ss >> Q_1 >> Q_2 >> Q_3 >> Q_4 >> Q_5 >> Q_6 >> Q_7 >> Q_8;

    // row is an aggregate, so build it with braces; passing the eight
    // values straight to emplace_back only compiles from C++20 onwards.
    qual.emplace_back(row{Q_1,Q_2,Q_3,Q_4,Q_5,Q_6,Q_7,Q_8});
  }
  return qual;
}

... do qual

int main(int argc,char **argv){
  std::ifstream infile("Freq.txt");
  auto qualData = ReadData(infile);

  std::cout << Qual(qualData) << std::endl;
  std::cout << "-------" << std::endl;
  std::cout << Qual(qualData);
  return 0;
}

You can imagine what else needs to change.
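
For illustration, here is one way the missing Qual overload might look once the rows are cached. This is a minimal sketch, not the answerer's code: it repeats the row and QualData definitions from above so that it compiles on its own, and the function-local static engine is an assumption made so that the one-argument Qual(qualData) call in main() works unchanged.

```cpp
#include <random>
#include <string>
#include <vector>

// Repeated from the answer so the sketch is self-contained.
struct row {
  double Q_1, Q_2, Q_3, Q_4, Q_5, Q_6, Q_7, Q_8;
};
using QualData = std::vector<row>;

// Draw one quality character per cached row; no file access happens here.
std::string Qual(const QualData &data) {
  // Assumption: a single engine, seeded once, shared by every call.
  static std::mt19937 gen(std::random_device{}());
  static const char Qualities[] = "12345678";

  std::string read_qual;
  read_qual.reserve(data.size());
  for (const row &r : data) {
    // The eight weights of this row define the sampling distribution.
    std::discrete_distribution<> d(
        {r.Q_1, r.Q_2, r.Q_3, r.Q_4, r.Q_5, r.Q_6, r.Q_7, r.Q_8});
    read_qual += Qualities[d(gen)];
  }
  return read_qual;
}
```

With this shape the file is parsed once by ReadData, and each later call to Qual only touches memory, so infile.clear() and infile.seekg(0) are no longer needed.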

Surt
  • Thanks for the suggestion. Since I'm new to C++, I have to ask: so you're creating a vector of structs to hold the rows, and then with "QualData ReadData(std::ifstream &infile)" you fill the vector, inserting each row at the end using emplace_back. So ReadData is a function that returns the vector type? I've never seen it written before as using QualData = std::vector<row>; – RAHenriksen Nov 05 '20 at 17:46
  • @RAHenriksen it is the new form of typedef, much easier to read. – Surt Nov 05 '20 at 18:15
  • I suggest reading the 8 doubles of each line in main() into an array[8] or vector(8), and passing that array to the function Qual. – ytlu Nov 05 '20 at 19:00

My suggestion:

std::string Qual(double *a)
{
  char Qualities[] = {'1', '2', '3', '4' ,'5', '6', '7','8','\0'};
  std::string Read_qual;

  // Seed the engine once; re-creating it on every call is slow.
  static std::random_device rd;
  static std::default_random_engine gen(rd());
  std::discrete_distribution<> d({a[0],a[1],a[2],a[3],a[4],a[5],a[6],a[7]});
  Read_qual += Qualities[d(gen)];
  return Read_qual;
}

and the main()

int main()
{
  std::ifstream infile("Freq.txt");
  double alldata[150][8];
  // Read all 150 x 8 values once.
  for (int i = 0; i < 150; i++)
    for (int j = 0; j < 8; j++) infile >> alldata[i][j];
  infile.close();

  // Reuse the in-memory table for every run.
  for (int idx = 0; idx < 2000; idx++)
  {
    for (int row = 0; row < 150; row++)
      std::cout << Qual(alldata[row]) << std::endl;
  }
  return 0;
}
ytlu
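
A possible variation on the two answers above, offered as a sketch rather than as either answerer's method: since the weights never change between runs, the std::discrete_distribution objects themselves can be built once per line and reused, together with a single engine supplied by the caller. The names LoadDistributions and SampleRead are hypothetical, and the sketch assumes every line holds the 8 whitespace-separated weights from the question, with at least one of them non-zero.

```cpp
#include <istream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

using Dist = std::discrete_distribution<int>;

// Build one distribution per input line, reading the stream only once.
std::vector<Dist> LoadDistributions(std::istream &in) {
  std::vector<Dist> dists;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream ss(line);
    std::vector<double> weights;
    for (double w; ss >> w;) weights.push_back(w);
    // Blank lines produce no weights and are skipped.
    if (!weights.empty()) dists.emplace_back(weights.begin(), weights.end());
  }
  return dists;
}

// Sample one quality character per position, reusing the caller's engine.
// Assumes 8 columns, so the sampled index 0..7 maps onto '1'..'8'.
std::string SampleRead(std::vector<Dist> &dists, std::mt19937 &gen) {
  std::string read;
  read.reserve(dists.size());
  for (Dist &d : dists) read += static_cast<char>('1' + d(gen));
  return read;
}
```

A caller would open Freq.txt once, pass the stream to LoadDistributions, seed one std::mt19937 from std::random_device, and then call SampleRead as many times as needed (2000 in the question) without touching the file again.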