
I am experimenting with parallelism and LINQ. Look at the code below; it works, but it's just to show the idea:

private void LoadImages(string path)
{
    images =
        Directory.GetFiles(path)
        .Select(f => GetImage(f))
        .ToList();
}

private Image GetImage(string path)
{
    return Image.FromFile(path);
}

So I am basically loading an image from each file found in the specified directory. The question is: how do I make this parallel? Right now it iterates over the files one by one. I'd like to parallelize it somehow, but I'm too inexperienced to come up with an approach myself, so I'm asking you, guys, counting on some help to make this faster :)

ebvtrnog

2 Answers


Using PLINQ:

var images = (from file in Directory.EnumerateFiles(path).AsParallel()
              select GetImage(file)).ToList();

Reading the images isn't CPU bound, so you can specify a higher degree of parallelism:

var images = (from file in Directory.EnumerateFiles(path)
                                    .AsParallel()
                                    .WithDegreeOfParallelism(16)
              select GetImage(file)).ToList();
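
One caveat: by default `AsParallel` does not preserve source order, so the resulting list may come back shuffled relative to the file listing. If order matters, a sketch along the same lines using `AsOrdered`:

```csharp
// A sketch: AsOrdered makes PLINQ emit results in the original
// enumeration order, at a small cost in buffering.
var images = Directory.EnumerateFiles(path)
                      .AsParallel()
                      .AsOrdered()                    // keep the source order
                      .WithDegreeOfParallelism(16)
                      .Select(GetImage)
                      .ToList();
```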
Panagiotis Kanavos
  • I'm a bit confused by this answer. "Reading the images isn't CPU bound, so you can specify a higher degree of parallelism" How would this help? Disk access is synchronized anyway, right? So, using a higher degree of parallelism should not help unless I'm missing something. – Thash Aug 24 '18 at 11:18
  • 1
    @Thash even if disk access was synchronous (it isn't) there are *multiple* levels of caching at the disk, controller, OS level which means that the data you need may already be loaded in one of the caches. Disks batch IO commands too, to improve throughput. Finally, IO in Windows is *a*sychronous since the NT days. Synchronous API calls are *emulated* to make programming easier – Panagiotis Kanavos Sep 03 '18 at 08:37
  • 1
    @Thash that said, it doesn't mean that a DOP of 16 will result in 16x better performance. The actual performance will depend on the type of files, their size, etc. The aim is to use the disk's IO to its maximum. By reading multiple files in parallel the disk is busy loading one file while the OS handles the administrative overhead of finding and loading another one. That's why disk benchmarks use different tests for small and large files. Reading small files benefits from a high DOP while large files require a *smaller* one – Panagiotis Kanavos Sep 03 '18 at 08:47
  • thanks for clarifying! I've measured a couple of approaches. So far, I have been loading raw encoded image data synchronously and processing it on background threads using tasks. You are right that this way, it doesn't utilize the disk to its maximum. In my situation, the difference is not that big though (the processing usually takes as long as loading the file if not longer) so I think I'll leave it this way for now. – Thash Sep 03 '18 at 12:31

You could do something like

var images = new ConcurrentBag<Image>();

Parallel.ForEach(Directory.GetFiles(path), f => images.Add(GetImage(f)));
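
Note that `ConcurrentBag<T>` makes no ordering guarantees, so the images won't necessarily match the file order. If that matters, a sketch that writes into a pre-sized array by index instead (assuming the same `GetImage` helper from the question):

```csharp
// A sketch: each loop iteration owns exactly one array slot,
// so no concurrent collection is needed and order is preserved.
var files = Directory.GetFiles(path);
var images = new Image[files.Length];

Parallel.For(0, files.Length, i =>
{
    images[i] = GetImage(files[i]);
});
```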
Eric J.
  • @PanagiotisKanavos: I'm not in front of a compiler right now. Feel free to edit if you find a mistake. The method signature accepts an `IEnumerable` as the first parameter and an `Action` as the second parameter. – Eric J. Mar 05 '15 at 17:42