I need to crawl parent web pages and its children web pages and I followed the producer/consumer concept from http://www.albahari.com/threading/part4.aspx#%5FWait%5Fand%5FPulse. Also, I used 5 threads which enqueue and dequeue links.
Any recommendations on how will I end/join all the threads once all of them have finished processing the queue, given that the length of queue is unknown?
Below is the idea on how I coded it.
static void Main(string[] args)
{
//enqueue parent links here
...
//then start crawling via threading
...
}
public void Crawl()
{
//dequeue
//get child links
//enqueue child links
}