I have a list of URLs of pages I want to download concurrently using HttpClient. The list of URLs can be large (100 or more!).

I currently have this code:

var urls = new List<string>
{
    "http://www.amazon.com",
    "http://www.bing.com",
    "http://www.facebook.com",
    "http://www.twitter.com",
    "http://www.google.com"
};

var client = new HttpClient();

var contents = urls
    .ToObservable()
    .SelectMany(uri => client.GetStringAsync(new Uri(uri, UriKind.Absolute)));

contents.Subscribe(Console.WriteLine);

The problem: because of SelectMany, a large number of Tasks are created almost at the same time. If the list of URLs is big enough, a lot of Tasks time out (I'm getting "A Task was cancelled" exceptions).

So, I thought there should be a way, maybe using some kind of Scheduler, to limit the number of concurrent Tasks, not allowing more than 5 or 6 at a given time.

This way I could get concurrent downloads without launching too many Tasks that may stall, like they do right now.

How can I do that so I don't end up with lots of timed-out Tasks?

halfer
SuperJMN
  • You might want to consider using the [DataFlow](https://msdn.microsoft.com/en-us/library/hh228603%28v=vs.110%29.aspx) API. – Yacoub Massad May 20 '16 at 11:33
  • Could you integrate it using my code? I don't know how to do it using DataFlow. TBH, I have never used it, but looking at a sample would help a lot. – SuperJMN May 20 '16 at 12:03

3 Answers

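Remember that SelectMany() is effectively Select().Merge(). While SelectMany() does not have a maxConcurrent parameter, Merge() does, so you can use that.

Applied to your example, it looks like this. Observable.FromAsync matters here: it defers each request until Merge() subscribes to it, so no more than the given number of downloads are in flight at once: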

var urls = new List<string>
{
    "http://www.amazon.com",
    "http://www.bing.com",
    "http://www.facebook.com",
    "http://www.twitter.com",
    "http://www.google.com"
};

var client = new HttpClient();

var contents = urls
    .ToObservable()
    .Select(uri => Observable.FromAsync(() => client.GetStringAsync(uri)))
    .Merge(2); // 2 maximum concurrent requests!

contents.Subscribe(Console.WriteLine);
Dorus

Here is an example of how you can do it with the DataFlow API:

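// Requires the System.Threading.Tasks.Dataflow package, plus:
// using System.Net.Http;
// using System.Threading.Tasks;
// using System.Threading.Tasks.Dataflow;
private static Task DoIt()
{
    var urls = new List<string>
    {
        "http://www.amazon.com",
        "http://www.bing.com",
        "http://www.facebook.com",
        "http://www.twitter.com",
        "http://www.google.com"
    };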

    var client = new HttpClient();

    //Create a block that takes a URL as input
    //and produces the download result as output
    TransformBlock<string, string> downloadBlock =
        new TransformBlock<string, string>(
            uri => client.GetStringAsync(new Uri(uri, UriKind.Absolute)),
            new ExecutionDataflowBlockOptions
            {
                //At most 2 download operations execute at the same time
                MaxDegreeOfParallelism = 2
            });

    //Create a block that prints out the result
    ActionBlock<string> doneBlock =
        new ActionBlock<string>(x => Console.WriteLine(x));

    //Link the output of the first block to the input of the second one
    downloadBlock.LinkTo(
        doneBlock,
        new DataflowLinkOptions { PropagateCompletion = true });

    //input the urls into the first block
    foreach (var url in urls)
    {
        downloadBlock.Post(url);
    }

    downloadBlock.Complete(); //Mark completion of input

    //Allows consumer to wait for the whole operation to complete
    return doneBlock.Completion;
}

static void Main(string[] args)
{
    DoIt().Wait();
    Console.WriteLine("Done");
    Console.ReadLine();
}
SuperJMN
Yacoub Massad
  • Wow. It looks really nice, but I would like to know how to do the equivalent thing using Rx. Thanks in advance! – SuperJMN May 20 '16 at 12:23

Can you see if this helps?

var urls = new List<string>
{
    "http://www.amazon.com",
    "http://www.bing.com",
    "http://www.google.com",
    "http://www.twitter.com",
    "http://www.google.com"
};

var contents =
    urls
        .ToObservable()
        .SelectMany(uri =>
            Observable
                .Using(
                    () => new System.Net.Http.HttpClient(),
                    client =>
                        client
                            .GetStringAsync(new Uri(uri, UriKind.Absolute))
                            .ToObservable()));
Enigmativity