I'm writing a program that iterates over all chromosomes of a FASTA file and splits each one into pieces of 10 bp. The function that does this is called chrdata, and all fragments are saved into a single file:

    chrdata(faidx_t *seq_ref, int chr_no, FILE *fp)

Each chromosome can be fragmented completely independently of the others, so I'm trying to use threads.
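To make the fragmentation step concrete, here is a minimal sketch of splitting one sequence into 10 bp pieces. `split_fragments` is a hypothetical stand-in that works on a plain `std::string`; it is not the actual `chrdata`, and it assumes the pieces are consecutive and non-overlapping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not the real chrdata(): cut one sequence into
// consecutive, non-overlapping pieces of frag_len bases. The last piece
// is shorter when the length is not a multiple of frag_len.
std::vector<std::string> split_fragments(const std::string &seq,
                                         std::size_t frag_len = 10) {
    assert(frag_len > 0);
    std::vector<std::string> fragments;
    fragments.reserve((seq.size() + frag_len - 1) / frag_len);
    for (std::size_t pos = 0; pos < seq.size(); pos += frag_len) {
        // substr() clamps the count at the end of the string
        fragments.push_back(seq.substr(pos, frag_len));
    }
    return fragments;
}
```

Because each call only reads its own input and returns its own vector, calls for different chromosomes do not depend on each other.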
My goal is to make this process faster. To achieve this I have tried multi-threading with std::thread.
I have tried two approaches:
1. First I created a thread for the first chromosome, called thread.join(), then created the next thread for the next chromosome, and so on.
2. Then I tried creating multiple threads at once, as explained in "Simultaneous Threads in C++ using <thread>". This is the version shown in the code below.
However, as far as I understand from what I have read, I always need to call join(), otherwise I end up with "terminate called without an active exception". The issue is that there is no difference in execution time between approaches (1) and (2).
My understanding is that this is because, despite creating a vector of thread objects, they still have to be joined, and so I wait for all the threads to finish. This would mean the execution is concurrent and not parallel.
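The two approaches can be compared in isolation with a small timing sketch. `fake_chrdata` is a hypothetical stand-in that only sleeps for 100 ms, so this shows whether joined threads overlap in time; it says nothing about how the real `chrdata` behaves with a shared `faidx_t` and `FILE*`.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for chrdata(): pretends to work for 100 ms.
void fake_chrdata(int /*chr_no*/) {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}

// Approach (1): start one thread and join it before starting the next.
// Returns the elapsed wall-clock time in seconds.
double run_one_at_a_time(int n_chr) {
    assert(n_chr > 0);
    auto start = Clock::now();
    for (int chr_no = 0; chr_no < n_chr; ++chr_no) {
        std::thread t(fake_chrdata, chr_no);
        t.join();
    }
    return std::chrono::duration<double>(Clock::now() - start).count();
}

// Approach (2): start all threads first, then join them all.
// Returns the elapsed wall-clock time in seconds.
double run_all_then_join(int n_chr) {
    assert(n_chr > 0);
    auto start = Clock::now();
    std::vector<std::thread> threads;
    for (int chr_no = 0; chr_no < n_chr; ++chr_no) {
        threads.emplace_back(fake_chrdata, chr_no);
    }
    for (auto &th : threads) {
        th.join();
    }
    return std::chrono::duration<double>(Clock::now() - start).count();
}
```

Calling both with the same `n_chr` and printing the two results gives a baseline: if approach (2) is clearly faster here but not with the real `chrdata`, that would suggest the lack of speedup comes from something inside `chrdata` or the shared file rather than from `join()` itself. That interpretation is an assumption, not something verified against the actual code.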
So my question is: can anyone suggest what I might change in function() below to make it faster through parallel execution?
Or is my understanding of join and concurrency wrong in this instance? I'm not completely sure why we cannot just skip the join part. If all the threads are done, why can't we just use detach()?
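For reference, the difference between `join()` and `detach()` can be sketched as follows. The function names are hypothetical and the workers do trivial work; the point is only what state the `std::thread` object is in afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>

// join(): the caller blocks until the worker has finished, so whatever the
// worker did is complete once join() returns.
bool join_waits_for_worker() {
    std::atomic<bool> done{false};
    std::thread t([&done] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        done.store(true);
    });
    assert(t.joinable());   // destroying t in this state calls std::terminate
    t.join();
    assert(!t.joinable());  // now safe to destroy
    return done.load();
}

// detach(): the thread keeps running on its own, but the std::thread object
// no longer refers to it, so there is no handle left to wait on.
bool detach_gives_up_handle() {
    // shared_ptr keeps the flag alive for as long as the worker needs it
    auto done = std::make_shared<std::atomic<bool>>(false);
    std::thread t([done] { done->store(true); });
    t.detach();
    return !t.joinable();
}
```

The "terminate called without an active exception" message comes from destroying a `std::thread` that is still joinable. With `detach()` that error goes away, but the program then needs some other way to know that every chromosome has been written before `fp` is closed or `main()` returns.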

    void function(const char *fastafile, FILE *fp, int thread_no) {
        std::vector<std::thread> threads;

        // load the FASTA index for the reference file
        faidx_t *seq_ref = NULL;
        seq_ref = fai_load(fastafile);
        assert(seq_ref != NULL);

        int chr_total = 10; // just the first 10 chromosomes
        int chr_idx = 0;
        int chr_no = 0;

        while (chr_idx < chr_total) {
            // start a batch of up to thread_no threads, one per chromosome
            for (; chr_no < std::min(chr_idx + thread_no, chr_total); chr_no++) {
                threads.push_back(std::thread(chrdata, seq_ref, chr_no, fp));
            }
            // wait for the whole batch before starting the next one
            for (auto &th : threads) { th.join(); }
            threads.clear();
            chr_idx = chr_idx + thread_no;
        }
    }

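One possible variation on the batching loop above, given only as a sketch under stated assumptions: a fixed number of workers take the next chromosome index from a shared atomic counter and store each result in its own slot, so nothing is written to a shared file while the threads run. `run_pool` and `process_chr` are hypothetical names; `process_chr` stands in for a version of `chrdata` that returns its fragments as a string instead of writing to `fp`. Whether the real `chrdata` and a shared `faidx_t` can be used from several threads at once is not addressed here.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical worker pool: thread_no workers pull chromosome indices from
// an atomic counter until none are left. Each index has its own result
// slot, so workers never write to the same element.
std::vector<std::string> run_pool(
        int chr_total, int thread_no,
        const std::function<std::string(int)> &process_chr) {
    assert(chr_total >= 0 && thread_no > 0);
    std::vector<std::string> results(chr_total);
    std::atomic<int> next_chr{0};

    auto worker = [&] {
        for (;;) {
            int chr_no = next_chr.fetch_add(1);  // claim the next index
            if (chr_no >= chr_total) break;      // nothing left to do
            results[chr_no] = process_chr(chr_no);
        }
    };

    // never start more workers than there are chromosomes
    int n_workers = std::min(thread_no, std::max(chr_total, 1));
    std::vector<std::thread> threads;
    for (int i = 0; i < n_workers; ++i) {
        threads.emplace_back(worker);
    }
    for (auto &th : threads) {
        th.join();
    }
    return results;  // in chromosome order, ready to be written to one file
}
```

After `run_pool` returns, the caller could loop over the results and write them to `fp` from a single thread, which keeps the output in chromosome order. The trade-off is that all fragments are held in memory until the end.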
I haven't attached main() or chrdata(), to keep the code and question clear.
pastebin.com/iY6u9CbH