
I have this little piece of code that simulates a flow that uses large objects (that huge byte[]). For each item in the sequence, an async method is invoked to get some result. The problem? As it is, it throws OutOfMemoryException.

Code compatible with LINQPad (C# Program):

void Main()
{
    var selectMany = Enumerable.Range(1, 100)
                   .Select(i => new LargeObject(i))
                   .ToObservable()
                   .SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)));

    selectMany
        .Subscribe(r => Console.WriteLine(r));
}


private static async Task<int> DoSomethingAsync(LargeObject lo)
{
    await Task.Delay(10000);
    return lo.Id;
}

internal class LargeObject
{
    public int Id { get; }

    public LargeObject(int id)
    {
        this.Id = id;
    }

    public byte[] Data { get; } = new byte[10000000];
}

It seems that it creates all the objects at the same time. How can I do it the right way?

The underlying idea is to invoke DoSomethingAsync in order to get some result for each object, so that's why I use SelectMany. To simplify, I have just introduced a Task.Delay, but in real life it is a service that can process some items concurrently, so I want to introduce some concurrency mechanism to take advantage of it.

Please notice that, theoretically, processing a small number of items at a time shouldn't fill the memory. In fact, we only need each "large object" to get the result of the DoSomethingAsync method. After that point, the large object isn't used anymore.

SuperJMN
  • I can't tell if your problem is with your test code (in which `Enumerable.Range` creates all the large objects eagerly), or if you are seeing this in production. Either way, if some sequence creates many LargeObjects and they are still in use, so they can't be GC'ed, then yes, you will get an OOM exception. – Lee Campbell Jul 27 '16 at 08:11

4 Answers


I feel like I'm repeating myself. Similar to your last question and my last answer, what you need to do is limit the number of bigObjects™ that are created concurrently.

To do so, you need to combine object creation and processing and put them on the same thread pool. The problem is that we use async methods to allow threads to do other things while our async methods run: since your slow network call is async, your (fast) object-creation code will keep producing large objects faster than they can be consumed.

Instead, we can use Rx to keep track of the number of concurrently running observables by combining the object creation with the async call and using .Merge(maxConcurrent) to limit concurrency.

As a bonus, we can also enforce a minimum time per query: just Zip each query with something that takes a minimum delay.

static void Main()
{
    var selectMany = Enumerable.Range(1, 100)
                        .ToObservable()
                        // Defer creating each LargeObject until its inner observable is actually subscribed.
                        .Select(i => Observable.Defer(() => Observable.Return(new LargeObject(i)))
                            .SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)))
                            // Enforce a minimum of 400 ms per item.
                            .Zip(Observable.Timer(TimeSpan.FromMilliseconds(400)), (el, _) => el)
                        ).Merge(4); // subscribe to at most 4 inner observables (and thus 4 LargeObjects) at a time

    selectMany
        .Subscribe(r => Console.WriteLine(r));

    Console.ReadLine();
}


private static async Task<int> DoSomethingAsync(LargeObject lo)
{
    await Task.Delay(10000);
    return lo.Id;
}

internal class LargeObject
{
    public int Id { get; }

    public LargeObject(int id)
    {
        this.Id = id;
        Console.WriteLine(id + "!");
    }

    public byte[] Data { get; } = new byte[10000000];
}
Dorus

It seems that it creates all the objects at the same time.

Yes, because you are creating them all at once.

If I simplify your code I can show you why:

void Main()
{
    var selectMany =
        Enumerable
            .Range(1, 5)
            .Do(x => Console.WriteLine($"{x}!"))
            .ToObservable()
            .SelectMany(i => Observable.FromAsync(() => DoSomethingAsync(i)));

    selectMany
        .Subscribe(r => Console.WriteLine(r));
}

private static async Task<int> DoSomethingAsync(int i)
{
    await Task.Delay(1);
    return i;
}

Running this produces:

1!
2!
3!
4!
5!
4
3
5
2
1

Because of the Observable.FromAsync, you are allowing the source to run to completion before any of the results come back. In other words, you are quickly building all of the large objects, but slowly processing them.

You should allow Rx to run synchronously, but on the default scheduler so that your main thread is not blocked. The code will then run without any memory issues and your program will remain responsive on the main thread.

Here's the code for this:

var selectMany =
    Observable
        .Range(1, 100, Scheduler.Default)
        .Select(i => new LargeObject(i))
        .Select(o => DoSomethingAsync(o))
        .Select(t => t.Result); // deliberately block on each task so items are created and processed one at a time

(I've effectively replaced Enumerable.Range(1, 100).ToObservable() with Observable.Range(1, 100) as that will also help with some issues.)

I've tried testing other options, but so far anything that allows DoSomethingAsync to run asynchronously runs into the out of memory error.

Enigmativity
  • Thanks for your answer, @Enigmativity, but I think I missed something. The async method I'm calling is a remote service that can process items concurrently. It wouldn't be optimal to wait for one item to be processed before processing another. Do you think I can process more than one item concurrently (3 or 4) to take advantage of concurrency without running into memory issues? – SuperJMN Jul 27 '16 at 06:32
  • @SuperJMN, if you try to create more `LargeObject`s than you can allocate, you will get an OOM exception. This is unrelated to Rx. If they need to be created independently of `DoSomethingAsync` being executed, then you are in trouble. It seems to me that you actually want to persist these to a queue and break out of Rx. – Lee Campbell Jul 27 '16 at 07:02
  • @SuperJMN - I tried a bunch of things to get it to throttle the processing, but it just didn't work. It's not computationally efficient to process the objects one at a time, but it is memory efficient. It depends what efficiencies you're going for. Shlomo's answer is even worse for computational efficiency. If I think of anything I'll let you know. – Enigmativity Jul 27 '16 at 07:19
  • Thank you! Outside Stack Overflow it has been suggested that I use TPL/Workflows, but I don't know how to approach this. – SuperJMN Jul 27 '16 at 07:46
  • @LeeCampbell Do you mean loading/unloading large objects inside the DoSomethingAsync method? About the queue, I don't see where to use one. Could you post some example? – SuperJMN Jul 27 '16 at 07:49
  • @SuperJMN - Running these in series on a background thread (which is what my solution does) is probably the way to go. You're bumping in to a memory constraint and that's a hard constraint - you don't want to ever do that - so you should program defensively. – Enigmativity Jul 27 '16 at 10:15
  • I added a TPL version to my answer. Frankly I prefer @Dorus answer though. – Shlomo Jul 27 '16 at 14:04

ConcatMap supports this out of the box. I know this operator is not available in .NET, but you can achieve the same thing using the Concat operator, which defers subscribing to each inner source until the previous one completes.
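
For example, a minimal sketch of that idea, assuming the question's DoSomethingAsync and LargeObject, might look like this:

var results = Enumerable.Range(1, 100)
    // Defer: the LargeObject is not created until Concat subscribes to this inner source.
    .Select(i => Observable.Defer(() =>
        Observable.FromAsync(() => DoSomethingAsync(new LargeObject(i)))))
    // Concat subscribes to each deferred source only after the previous one completes,
    // so only one LargeObject is alive at a time.
    .Concat();

results.Subscribe(r => Console.WriteLine(r));

The trade-off is that this removes all concurrency; combining Defer with Merge(n), as in Dorus's answer, keeps the same laziness while allowing a few items in flight at once.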

Rohit Sharma

You can introduce a time interval delay this way:

var source = Enumerable.Range(1, 100)
   .ToObservable()
   .Zip(Observable.Interval(TimeSpan.FromSeconds(1)), (i, ts) => i)
   .Select(i => new LargeObject(i))
   .SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)));

So instead of pulling all 100 integers at once, immediately converting them to LargeObjects and then calling DoSomethingAsync on all 100, it drips the integers out one by one, spaced one second apart.


This is what a TPL+Rx solution would look like. Needless to say it is less elegant than Rx alone, or TPL alone. However, I don't think this problem is well suited for Rx:

void Main()
{
    var source = Observable.Range(1, 100);

    const int MaxParallelism = 5;
    // TransformBlock (System.Threading.Tasks.Dataflow) creates and processes at most
    // MaxParallelism LargeObjects at a time, since each object is created inside the delegate.
    var transformBlock = new TransformBlock<int, int>(async i => await DoSomethingAsync(new LargeObject(i)),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = MaxParallelism });
    // AsObserver/AsObservable bridge between Rx and TPL Dataflow.
    source.Subscribe(transformBlock.AsObserver());
    var selectMany = transformBlock.AsObservable();

    selectMany
        .Subscribe(r => Console.WriteLine(r));
}
Shlomo
  • I can appreciate that this might work in practice, but the choice of a one second delay is arbitrary and can still allow out of memory errors or it can significantly slow down the computation. It's not a robust solution. – Enigmativity Jul 27 '16 at 01:07
  • Edited to add TPL answer. Rx doesn't shine here. – Shlomo Jul 27 '16 at 14:05