
I'm writing a downloader in C# and got stuck on the following problem: what approach should I use to parallelize my downloads and update my GUI?

In my first attempt, I used 4 threads and started a new one each time one of them completed. The main problem was that my CPU went to 100% every time a new thread started.

Googling around, I found out about BackgroundWorker and ThreadPool. Given that I want to update my GUI with the progress of each link I'm downloading, which is the better solution?

1) Create 4 different BackgroundWorkers and attach to each one's ProgressChanged event a delegate to a function in my GUI that updates the progress?

2) Use ThreadPool and set the max and min number of threads to the same value?

If I choose #2, when there are no more threads in the queue, does it stop the 4 working threads? Does it suspend them? Since I have to download different lists of links (20 links each of them) and move from one to another when one is completed, does the ThreadPool start and stop threads between each list?

If I want to change the number of working threads at run time and decide to use ThreadPool, changing from 10 threads to 6, does it throw an exception and stop 4 random threads?

This is the only part that is giving me a headache. Thanks in advance for your answers.

CodingWithSpike
DDB
  • Why don't you use threads from the Threadpool? http://msdn.microsoft.com/en-us/library/3dasc8as%28v=vs.80%29.aspx#Y23 – Stormenet Aug 02 '11 at 14:16

4 Answers


I would suggest using WebClient.DownloadFileAsync for this. You can have multiple downloads going, each raising the DownloadProgressChanged event as it goes along, and DownloadFileCompleted when done.

You can control the concurrency by using a queue with a semaphore or, if you're using .NET 4.0, a BlockingCollection. For example:

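using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.ComponentModel;
using System.Net;

// Information used in callbacks.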
class DownloadArgs
{
    public readonly string Url;
    public readonly string Filename;
    public readonly WebClient Client;
    public DownloadArgs(string u, string f, WebClient c)
    {
        Url = u;
        Filename = f;
        Client = c;
    }
}

const int MaxClients = 4;

// Bounded queue of idle clients: its capacity (MaxClients) caps the number of concurrent downloads
BlockingCollection<WebClient> ClientQueue = new BlockingCollection<WebClient>(MaxClients);

// queue of urls to be downloaded (unbounded)
Queue<string> UrlQueue = new Queue<string>();

// create four WebClient instances and put them into the queue
for (int i = 0; i < MaxClients; ++i)
{
    var cli = new WebClient();
    cli.DownloadProgressChanged += DownloadProgressChanged;
    cli.DownloadFileCompleted += DownloadFileCompleted;
    ClientQueue.Add(cli);
}

// Fill the UrlQueue here

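// Now go until the UrlQueue is empty.
// Take() blocks while every client is busy, so run this loop on a background
// thread. Blocking the UI thread here would freeze the form and can also stall
// the completion callbacks, which WebClient posts back to the context that
// started the download.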
while (UrlQueue.Count > 0)
{
    WebClient cli = ClientQueue.Take(); // blocks if there is no client available
    string url = UrlQueue.Dequeue();
    string fname = CreateOutputFilename(url);  // or however you get the output file name
    cli.DownloadFileAsync(new Uri(url), fname, 
        new DownloadArgs(url, fname, cli));
}


void DownloadProgressChanged(object sender, DownloadProgressChangedEventArgs e)
{
    DownloadArgs args = (DownloadArgs)e.UserState;
    // Do status updates for this download
}

void DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
{
    DownloadArgs args = (DownloadArgs)e.UserState;
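    // Do whatever UI updates are needed. If this handler runs on a worker thread,
    // marshal them to the UI thread with Control.Invoke or BeginInvoke.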

    // now put this client back into the queue
    ClientQueue.Add(args.Client);
}

There's no need for explicitly managing threads or going to the TPL.
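One thing the handlers above leave out is failure handling. `AsyncCompletedEventArgs` exposes `Cancelled` and `Error`, so the completion handler can inspect them before treating the file as good. A minimal sketch, where `DownloadOutcome` and `Describe` are hypothetical names introduced only for illustration:

```csharp
using System;
using System.ComponentModel;

static class DownloadOutcome
{
    // Hypothetical helper: turns the event args passed to DownloadFileCompleted
    // into a short status string that the UI can display.
    public static string Describe(AsyncCompletedEventArgs e)
    {
        if (e.Cancelled)
            return "cancelled";                     // the operation was cancelled
        if (e.Error != null)
            return "failed: " + e.Error.Message;    // e.g. a WebException
        return "completed";
    }
}
```

Inside `DownloadFileCompleted` this would be called as `DownloadOutcome.Describe(e)`. Whatever the outcome, the client should still go back into `ClientQueue`, otherwise every failed download permanently removes one concurrent slot.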

Automatico
Jim Mischel
  • I think the line ClientQueue.Add(new WebClient()); is wrong and should be ClientQueue.Add(cli). Anyway, I think there are 2 problems with this method: 1) I have to specify the file name before downloading, but I don't know it beforehand. I usually take the name either from the link or from the "Content-Disposition" header of the HTTP response. 2) The first time I wrote my app, WebClient was among the available choices, but if I remember correctly it opens Internet Explorer in the background for each link! I still remember that pop-up that came from nowhere... – DDB Aug 02 '11 at 16:52
  • I fixed the bug you identified. `WebClient` does not open IE. You might be thinking of the `WebBrowser` control. `WebClient` is a wrapper around `HttpWebRequest` and `HttpWebResponse`. If you need information from the headers, you can get them from the `ResponseHeaders` property. The above is just an example. Your requirements are easily met by making some minor changes. – Jim Mischel Aug 02 '11 at 17:34
  • Whereas using explicit threads could work, it's incredibly wasteful to dedicate one thread for each download. And the TPL might not do a very good job. On a good connection, you can have a dozen or more concurrent downloads running, each of which will be an individual thread that spends most of its time waiting for data. Contrast that to using `DownloadFileAsync`, which will allocate only as many threads as are needed to handle the data as it's downloaded. – Jim Mischel Aug 02 '11 at 17:57
  • If I wanted resume capability, how could I do that? Also, the `ResponseHeaders` property is only available after I run `DownloadFileAsync`, so I can't specify the name up front (maybe I can rename the file at completion). What worries me more is the method `DownloadDataAsync`: if I read correctly, it saves the data in an internal array, but what happens if the file to download is 1 GB? (What I have to download rarely exceeds 1 MB; most of the time it is around 200 KB.) – DDB Aug 03 '11 at 10:08
  • If you want resume capability, then you'll have to create a derived class from `WebClient` and override the `GetWebRequest` method so that you can modify the request. Yes, you'd want to rename the file after completion. As far as `DownloadDataAsync` is concerned, it's going to fail if the downloaded file is too large to fit in memory. – Jim Mischel Aug 03 '11 at 13:41
  • Last question: being an async method, in case `DownloadFileAsync` throws an exception, how and where should I catch it? It throws `WebException` and `InvalidOperationException` but I don't have any idea how to manage them. Thanks. Regards. – DDB Aug 05 '11 at 11:04
  • Exceptions that happen before the actual download starts can be caught by putting a `try/catch` around the call to `DownloadFileAsync`. I don't know what happens to exceptions that occur while in the middle of downloading the file (i.e. a connection dropped, for example). I *think* the error is reported in the `Error` property of the `AsyncCompletedEventArgs` object that is passed to the `DownloadFileCompleted` event handler. – Jim Mischel Aug 05 '11 at 22:47
  • I was also wondering, a bit unrelated: if I use an `HttpWebRequest` and set some cookies, and the `HttpWebResponse` contains a Set-Cookie header, does the `CookieCollection` I get from the response contain both sets of cookies? Anyway, your suggestion about `WebClient` is the best one. I will accept your answer, but not yet, since I don't know whether other answers can be added once I accept one. – DDB Aug 05 '11 at 22:55
  • @DDB: Other answers can be added after you select one. And if you decide that another answer is better, you can change the selected answer. – Jim Mischel Aug 09 '11 at 16:07
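
The resume suggestion in the comments above (derive from `WebClient` and override `GetWebRequest`) could look roughly like the following. This is only a sketch: `ResumableWebClient` and `ResumeFrom` are names made up for the example, and it assumes the server honours HTTP range requests.

```csharp
using System;
using System.Net;

// Hypothetical WebClient subclass that asks the server to skip the bytes
// already downloaded by adding a Range header to each HTTP request.
class ResumableWebClient : WebClient
{
    // Number of bytes already on disk; 0 means "start from the beginning".
    public long ResumeFrom { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null && ResumeFrom > 0)
        {
            // Sends "Range: bytes=<ResumeFrom>-"
            http.AddRange(ResumeFrom);
        }
        return request;
    }
}
```

Because `DownloadFileAsync` overwrites its target file, the remaining bytes would have to go to a separate file and be appended to the partial one afterwards. A server that ignores the Range header answers with 200 and the whole file instead of 206, so the status is worth checking before appending.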

I think you should look into using the Task Parallel Library, which is new in .NET 4 and is designed for solving these types of problems.

Jason
  • Might I amend this with another solution along the same route? A `BackgroundWorker` with a `Parallel.ForEach(urls, url => { /* do action */ });` in it. It's easier to read (like a foreach) and allows the logic to continue while the BGW is running. – Jeremy Boyd Aug 02 '11 at 15:12
  • It seems that MaxDegreeOfParallelism allows me to set the max number of threads/tasks but not the min number. Not only that, but it seems that I can't change this value at run time. Good suggestion though. – DDB Aug 02 '11 at 17:05
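
For reference, the `Parallel.ForEach` variant mentioned in the comments might be sketched as below. `BoundedParallel.Run` and the `work` delegate are hypothetical: `work` stands in for a blocking download of one URL, and the peak counter is there only to show the limit being respected. `MaxDegreeOfParallelism` is an upper bound only, which matches the observation above that no minimum can be set.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class BoundedParallel
{
    // Runs 'work' once per URL with at most 'maxParallel' calls in flight.
    // Returns the highest number of simultaneous calls actually observed.
    public static int Run(IEnumerable<string> urls, int maxParallel, Action<string> work)
    {
        int running = 0, peak = 0;
        object gate = new object();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(urls, options, url =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            try { work(url); }                       // blocking download goes here
            finally { lock (gate) { running--; } }
        });

        return peak;
    }
}
```

`Parallel.ForEach` blocks until every item has been processed, so it would be called from a `BackgroundWorker`'s `DoWork` handler or another non-UI thread. Each concurrent download also occupies a thread while it waits for data.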

A 100% CPU load has nothing to do with the download itself, as the network is practically always the bottleneck. I would check the logic you use to wait for each download to complete.

Can you post the code of the thread you start multiple times?

thekip
  • It has nothing to do with the code per se, but with the fact that creating a new thread uses CPU. (I lied about the 100% CPU: it is more like 40-50%, and only for the instant it takes to create the thread, then it goes back to normal. I'm on an old single-core 64-bit Turion at 1.8 GHz, so I notice this kind of CPU use.) Creating and destroying threads is a waste of CPU and RAM since they can be reused, so I'd like to know what the "best" solution would be. – DDB Aug 02 '11 at 17:15

By creating 4 different BackgroundWorkers you will be creating separate threads that will no longer interfere with your GUI. BackgroundWorkers are simple to implement and, from what I understand, will do exactly what you need them to do.

Personally I would do this and simply allow the others to not start until the previous one is finished. (Or maybe just one, and allow it to execute one method at a time in the correct order.)

FYI - BackgroundWorker

sealz
  • Using `BackgroundWorker` does not create a new process, but rather executes on a thread pool thread. He will not be creating separate processes, but rather separate threads. – Jim Mischel Aug 02 '11 at 14:43