std::thread - read from file line by line

Question

I would like to read in parallel line by line from output file. Every thread read one line then work with data. In the mean time next thread have to read the next line.

std::ifstream infile("test.txt");
std::mutex mtx;

void read(int id_thread){
   while(infile.good()){
     mtx.lock();
     std::string sLine;
     getline(infile, sLine);
     std::cout << "Read by thread: " << id_thread;
     std::cout << sLine << std::endl;
     mtx.unlock();
   }
}

void main(){
  std::vector<std::thread> threads;
  for(int i = 0; i < num; i++){
     threads.push_back(std::thread(parallelFun, i));
  }

  for(auto& thread : threads){
      thread.join();
  }
  return 0;
}

When i run this code i get this: First thread read all lines. How can i make it happen that every thread read one line?

enter image description here

EDIT

As mentioned in the comments all i needed to do was bigger test file. Thanks guys!

What's the problem? Anyway reads are serialized by the mutex. — Anton Savin, Nov 06 '14 at 13:03
That's how threads works, they run for a while, then let another thread run for a while, and so on. If a single thread runs for long enough to read all lines in a short file, then you have to figure out some other way to just read a single line at a time (like signaling a condition variable?). — Some programmer dude, Nov 06 '14 at 13:06
If you want reading to happen in parallel to the processing, you should unlock the mutex as soon as you've read the line, and not until after you've processed the data you've read. — Angew is no longer proud of SO, Nov 06 '14 at 13:07
There's a possible problem where a thread is scheduled out right after checking that the file is still good, only to return when the file is done reading. That's not a deadlock situation, but it may yield an IO error on large files and many threads. The fix is to move the `infile.good()` in the critical section, and update a `bool` with its result which is used for the while loop. — didierc, Nov 06 '14 at 13:11
What you're asking for is basically thread abuse. Saying: "[I] want: first thread to read first line, second thread second line, ... , n-th thread n-th line" is pretty much saying: "I want serial execution." If you want serial execution, don't use multiple threads. — Jerry Coffin, Nov 06 '14 at 14:48
@Surt no, not really. If there was a huge amount and/or blocking operations directed by each line, (maybe they're uri's), only one thread should be reading lines and pushing them onto a queue for other threads/pools to do the work. This pratting around with locks is just silly. — Martin James, Nov 06 '14 at 15:08
@MartinJames I will try both and see which one is faster. Thank you! — dropky, Nov 06 '14 at 15:14
@MartinJames, they wont be totally serialized if there is some more (none-blocking) work for each, but it is not very efficient. — Surt, Nov 06 '14 at 15:23

score 6 · Accepted Answer · answered Nov 06 '14 at 13:07

6

I would change the loop into

while(infile.good()){
     mtx.lock();
     std::string sLine;
     getline(infile, sLine);
     mtx.unlock();
     std::cout << "Read by thread: " << id_thread;
     std::cout << sLine << std::endl;
   }

Your std::cout stuff is the busy part of your test loop which you want to exchange for real code later. This gives the other thread time to kick in. Furthermore make your test file large. It is not uncommon that thread initialization takes some time in which the first thread eats all the data.

answered Nov 06 '14 at 13:07

Oncaphillis

1,888
13
15

1

It looks to me like this is taking poor code and making it worse. The original has a broken input loop, which you've left equally broken. By moving the output outside the mutex section, you've allowed multiple threads to write concurrently, so the output might no longer be coherent. – Jerry Coffin Nov 06 '14 at 14:44
3

This is also not exception safe please use [`std::lock_guard`](http://en.cppreference.com/w/cpp/thread/lock_guard) or one of the other RAII mutex handlers to ensure that the lock always gets released before the next loop iteration or function exit – Mgetz Nov 06 '14 at 14:46
1

@JerryCoffin My assumption was that this was just a minimalistic sample code and that the concurrent threads are supposed to do some really heavy stuff in the real code not necessarily related to synchronized I/O. I agree that this should be mentioned if the question would have been "How do I synchronize I/O between threads" – Oncaphillis Nov 06 '14 at 16:30

score 2 · Answer 2 · answered Nov 06 '14 at 15:22

If you want your 5 threads to read exactly every 5th line you must synchronize the reads, so each thread must know that the previous has finished reading its part. This requirement potentially impose a huge inefficiency as some threads could be waiting a long time for the previous in order to run.

Concept code, untested use at own risk.

Lets first make a default class to handle atomic locks. We align it to avoid false sharing and the associated cache ping-pong.

constexpr size_t CACHELINESIZE = 64; // could differ on your architecture
template<class dType>
class alignas(CACHELINESIZE) lockstep {
  std::atomic<dType> lock = dType(0);

public:
  // spinlock spins until the previous value is prev and then tries to set lock to value
  // until success, restart the spin if prev changes.
  dType Spinlock(dType prev = dType(0), dType next = dType(1)) {
     dType expected = prev;
     while (!lock.compare_exchange_weak(expected, next)) { // request for locked-exclusiv ~100 cycles?
       expected = prev;  // we wish to continue to wait for expected
       do {
         pause(); // on intel waits roughly one L2 latency time.
       } while(lock.load(std::memory_order_relaxed) != prev);  // only one cache miss per change
     }
     return expected;
  }

  void store(dType value) {
    lock.store(value);
  }
};

lockstep<int> lock { 0 };

constexpr int NoThreads = 5;

std::ifstream infile("test.txt");

void read(int id_thread) {
   locks[id_thread].lock = id_thread;
   bool izNoGood = false;
   int next = id_thread;

   while(!izNoGood){
     // get lock for next iteration
     lock.spinlock(next, next); // wait on our number

     // moved file check into locked region     
     izNoGood = !infile.good();
     if (izNoGood) {
       lock.store(next+1); // release next thread to end run.
       return;
     }

     std::string sLine;
     getline(infile, sLine);

     // release next thread
     lock.store(next+1);

     // do work asynchronous
     // ...

     // debug log, hopefully the whole line gets written in one go (atomic)
     // but can be in "random" order relative to other lines.
     std::cout << "Read by thread: " << id_thread << " line no. " << next
               << " text:" << sLine << std::endl;  // endl flushes cout, implicit sync?
     next += NoThreads;  // our next expected line to process
   }
}

void main() {
  std::vector<std::thread> threads;
  for(int i = 0; i < NoThreads; i++) {
     threads.push_back(std::thread(parallelFun, i));
  }

  for(auto& thread : threads){
      thread.join();
  }
  return 0;
}

ravi · Answer 3 · 2014-11-06T13:10:09.640

1

Just in case you want each thread to read a single line ( which is obvious from your description ) , remove the while loop and then you need to make sure that you have same number of threads as number of lines in file.

To get rid of above constraint you can use boost threadpool.

edited Nov 06 '14 at 13:10

answered Nov 06 '14 at 13:04

ravi

10,994
1
18
36

std::thread - read from file line by line

3 Answers3