I have a WinForms application. When button2 is clicked, it starts downloading a large number of images whose URLs are stored in the list ThousandsOfImageURLs. The first 40 or so downloads work fine. After that, the task created by Task.Factory.StartNew in ParallelRunBrowser no longer runs RunBrowserThread: I can see Task.Factory.StartNew being hit every 10 seconds, but the breakpoint inside RunBrowserThread of class DownLoadImages is never hit again. The application has hung like this for a few hours without throwing any exception.

What is the possible issue?

Thanks,

    private void button2_Click(object sender, EventArgs e)
    {
        var sta = new StaTaskScheduler(numberOfThreads: 4);
        var doImages = new ThreadLocal<DownLoadImages>(() => new DownLoadImages());

        foreach (string url in ThousandsOfImageURLs)
        {
            ParallelRunBrowser(strSiteRoot + url, sta, doImages);
        }
    }

    private void ParallelRunBrowser(string url, StaTaskScheduler sta, ThreadLocal<DownLoadImages> doImages)
    {
        Thread.Sleep(10000);

        Task.Factory.StartNew(() =>
            {
                doImages.Value.RunBrowserThread(new Uri(url));
            },
            CancellationToken.None,
            TaskCreationOptions.None,
            sta);
    }


    class DownLoadImages
    {
        public void RunBrowserThread(Uri url)
        {
            var br = new WebBrowser();
            br.DocumentCompleted += Browser_DocumentCompleted;
            br.Navigate(url);
            Application.Run();
        }

        void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            var br = sender as WebBrowser;
            if (br.Url == e.Url)
            {
                FinishDownLoadHere(sender, e);
                Application.ExitThread();
                ((WebBrowser)sender).Dispose();
                GC.Collect();
            }
        }
    }

Don

1 Answer

Using WebBrowser controls, Application.Run, and GC.Collect seems like overkill.

Why not use a WebClient to get the images?

From: Download image from the site in .NET/C#

    string localFilename = @"c:\localpath\tofile.jpg";
    using (WebClient client = new WebClient())
    {
        // Fetches the resource at the URL and writes it to the local file.
        client.DownloadFile("http://www.example.com/image.jpg", localFilename);
    }

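
WebClient requests whatever the server returns for a URL, so the address does not have to end in a file extension. The following is a minimal sketch, assuming the image endpoint needs no JavaScript or login cookies (the question does not confirm this). `ImageFetcher`, `FileNameFor`, and the `.jpg` extension are hypothetical choices for illustration. The file name is derived from the URL path, ignoring any query string, so that each saved file corresponds to its URL:

```csharp
using System;
using System.IO;
using System.Net;

static class ImageFetcher
{
    // Hypothetical helper: maps "http://www.example.com/image/42" to "image_42.jpg".
    // The ".jpg" extension is an assumption; the real type is whatever the server sends.
    public static string FileNameFor(Uri url)
    {
        string name = url.AbsolutePath.Trim('/').Replace('/', '_');
        foreach (char c in Path.GetInvalidFileNameChars())
        {
            name = name.Replace(c, '_');
        }
        return (name.Length == 0 ? "index" : name) + ".jpg";
    }

    // Downloads one URL into targetDirectory under the derived file name.
    public static void Download(Uri url, string targetDirectory)
    {
        using (var client = new WebClient())
        {
            client.DownloadFile(url, Path.Combine(targetDirectory, FileNameFor(url)));
        }
    }
}
```

If the server only produces the image after scripts run or a session is established, this approach will not work and a browser-based download may still be needed.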
Emond
  • Thanks. I tried WebClient, but the images do not have a static link like http://www.example.com/image.jpg. They are generated on the fly by an MVC web site, so the URL looks more like http://www.example.com/image. The only way I found to download the image was to launch a client browser. – Don Jan 03 '14 at 14:10
  • If so, you might be best off using a tool such as HTTrack. – Emond Jan 03 '14 at 14:16
  • That is a good idea. Using an existing tool is better than building my own. I will see whether the tool lets me download images only. Thanks – Don Jan 03 '14 at 14:30
  • Even if it doesn't, you can easily filter them out after downloading. – Emond Jan 03 '14 at 14:31
  • I tried it. It can download images, but because the images are generated on the fly by the web site, the files are named randomly. That is a big problem with the tool: I need each name to correspond to its URL in a specific way. It looks like I still need to build my own tool. – Don Jan 03 '14 at 14:57
  • Download the site using HTTrack and then parse the downloaded pages for img tags to find the names of the images. – Emond Jan 03 '14 at 15:13
  • BTW: the site you are scraping might be protected against more than x concurrent connections from one IP address. Be nice. That is another reason to download (not too aggressively) and parse instead of connecting with more than 40 sessions. – Emond Jan 04 '14 at 08:02
  • Thanks. I thought HTTrack used many concurrent connections too (maybe it uses different IP addresses). Its download and parse are very fast. – Don Jan 04 '14 at 16:42
  • I believe that is a part I was missing. Thanks. – Don Jan 04 '14 at 19:39
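
The advice in the comments about not opening too many connections at once can be sketched with a semaphore that caps the number of downloads in flight. This is only an illustration under stated assumptions: it uses HttpClient (.NET 4.5 or later) rather than the WebClient from the answer, and `ThrottledDownloader`, `ForEachThrottledAsync`, `DownloadAllAsync`, and the caller-supplied `fileNameFor` function are hypothetical names that appear nowhere in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledDownloader
{
    private static readonly HttpClient Http = new HttpClient();

    // Runs 'work' for every item, with at most maxConcurrent calls in flight.
    public static async Task ForEachThrottledAsync<T>(
        IEnumerable<T> items, int maxConcurrent, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            List<Task> tasks = items.Select(async item =>
            {
                await gate.WaitAsync();          // wait for a free slot
                try { await work(item); }
                finally { gate.Release(); }      // free the slot even on failure
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }

    // Hypothetical usage: save each URL under a name chosen by the caller.
    public static Task DownloadAllAsync(
        IEnumerable<Uri> urls, string directory, int maxConcurrent,
        Func<Uri, string> fileNameFor)
    {
        return ForEachThrottledAsync(urls, maxConcurrent, async url =>
        {
            byte[] bytes = await Http.GetByteArrayAsync(url);
            // Synchronous write, kept simple for the sketch.
            File.WriteAllBytes(Path.Combine(directory, fileNameFor(url)), bytes);
        });
    }
}
```

A small limit such as 2 to 4 is a polite starting point; what the target site tolerates is not known from the discussion above.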