
I am working on an audio processing tool that I would like to build using TPL Dataflow. The dataflow itself will consist of audio samples being passed between sound-processing blocks. Those samples will typically be a few KB in size (float[]/byte[] with 1k to 4k elements). The dataflow is thus a simple pipeline which looks like this:

SourceBlock<byte[]> 
    -> TransformBlock<byte[], float[]> 
        -> TransformBlock<float[], float[]>
            -> ...

Some of the blocks can work purely "in place", i.e. by mutating the input data, while others have to create new samples. The processing time of each block can vary depending on the input data.

I don't want to allocate new arrays all the time and rely on the garbage collector to take care of object recycling. I want to benefit from concurrent execution of the blocks, so I don't want to restrict the chain to processing data sequentially (in which case I wouldn't need TPL Dataflow anyway). I don't need individual blocks to process multiple items concurrently (I am fine with at most one item being processed per block at any given time).

What would be the best scheme to control the number of samples in the pipeline at any given time and to recycle samples/arrays that are no longer used?
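To make the setup concrete, here is a minimal sketch of the kind of pipeline I have in mind (the block bodies, the BoundedCapacity value, and the direct Post calls in place of a real SourceBlock are placeholders, not my actual processing code):

```csharp
using System;
using System.Threading.Tasks.Dataflow;

class PipelineSketch
{
    static void Main()
    {
        // One item per block at a time, plus a cap on queued samples per block.
        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 1,
            BoundedCapacity = 4
        };

        // byte[] -> float[] conversion stage (placeholder logic).
        var decode = new TransformBlock<byte[], float[]>(bytes =>
        {
            var samples = new float[bytes.Length];
            for (int i = 0; i < bytes.Length; i++)
                samples[i] = bytes[i] / 255f;
            return samples;
        }, options);

        // An "in place" stage that mutates and forwards the same array.
        var gain = new TransformBlock<float[], float[]>(samples =>
        {
            for (int i = 0; i < samples.Length; i++)
                samples[i] *= 0.5f;
            return samples;
        }, options);

        var sink = new ActionBlock<float[]>(
            samples => Console.WriteLine("Processed {0} samples", samples.Length),
            options);

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        decode.LinkTo(gain, link);
        gain.LinkTo(sink, link);

        decode.Post(new byte[1024]);
        decode.Complete();
        sink.Completion.Wait();
    }
}
```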

    "I don't want to allocate new arrays all the time and rely on the garbage collector to take care of object recycling." Why is this a requirement of your solution? Is garbage collection measurably impacting your performance and if so, what sort of performance are you aiming for? – Jeroen Mostert Dec 08 '14 at 12:23
  • I want this code to be able to run on phones using Xamarin, so the memory footprint matters. It is difficult to test all scenarios (especially hardware), so I have to preemptively identify and address likely issues. – Lau Lu Dec 08 '14 at 14:52
  • My initial thought was for the processors to produce two outputs: one processing result and one "garbage" buffer to be recycled. The latter would then be piped to processors upstream (effectively creating a loop in the mesh). Not sure this is a viable option. – Lau Lu Dec 08 '14 at 15:00

1 Answer


If your goal is to reuse your arrays instead of always creating new ones and having the GC collect them, you need to use an object pool:

The object pool pattern is a software creational design pattern that uses a set of initialized objects kept ready to use – a "pool" – rather than allocating and destroying them on demand. A client of the pool will request an object from the pool and perform operations on the returned object. When the client has finished, it returns the object to the pool rather than destroying it; this can be done manually or automatically.

Unfortunately you would probably need to implement that yourself and make it thread-safe.
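For example, here is a minimal sketch of such a pool built on ConcurrentBag<T>; the class name, the fixed buffer size, and the pre-allocation count are illustrative choices, not a prescribed design:

```csharp
using System.Collections.Concurrent;

// Illustrative sketch of a thread-safe pool of fixed-size float[] buffers.
// ConcurrentBag prefers giving a thread back items it added itself, which
// suits a pipeline where the same blocks rent and return buffers repeatedly.
public class FloatBufferPool
{
    private readonly ConcurrentBag<float[]> _buffers = new ConcurrentBag<float[]>();
    private readonly int _bufferSize;

    public FloatBufferPool(int bufferSize, int preallocate)
    {
        _bufferSize = bufferSize;
        for (int i = 0; i < preallocate; i++)
            _buffers.Add(new float[bufferSize]);
    }

    // Take a buffer from the pool, or allocate a new one if the pool is empty.
    public float[] Rent()
    {
        float[] buffer;
        return _buffers.TryTake(out buffer) ? buffer : new float[_bufferSize];
    }

    // Hand a buffer back once the last block that touched it is done with it.
    public void Return(float[] buffer)
    {
        if (buffer != null && buffer.Length == _bufferSize)
            _buffers.Add(buffer);
    }
}
```

Each block would then call Rent() when it needs a fresh output array and Return() once the array it received from upstream is no longer needed.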

  • This sounds like a good solution. I think I should try to optimize recycling based on the requesting thread (like what ConcurrentBag does). What do you think? – Lau Lu Dec 08 '14 at 14:57
  • @LauLu You're entering the territory of micro-optimization... do you even know whether a TPL Dataflow block uses a dedicated thread for each item? I think a BlockingCollection around a ConcurrentQueue is just fine. – i3arnon Dec 08 '14 at 15:09
  • I tried the ObjectPool and - once I got it right - it's doing the job, thanks. Performance-wise there is little to no impact, but I'm using much less memory now. – Lau Lu Dec 10 '14 at 00:43
  • Hah! TPL Dataflow gives us the thread-safe object pool for free. A `BufferBlock` would be perfect for the job. Post a few buffers into it, with `blk.Post(new float[4096])`, then you can `blk.Receive()`, use it and post it back when you're finished with it. Other methods on `BufferBlock` can lead to an even fancier object pool where you can wait asynchronously for an item to become available. What a shame you didn't spot its usefulness for this :( – spender Oct 21 '15 at 03:23
  • @spender You only need a `BufferBlock` for blocking asynchronously (which you usually don't do in an object pool, you create new items). `ConcurrentQueue`/`Stack` are great for that, and that was my suggestion. In any case... using `BufferBlock` or another collection is exactly what I meant by building an object pool (no one expects you to write your own primitive collection for that). What you can't (or shouldn't) do is reuse your existing TPL Dataflow pipeline as your pool by controlling the number of items in the pipeline as the OP asked. – i3arnon Oct 21 '15 at 06:07
  • Indeed, I was suggesting that the BufferBlock was used in isolation, not as part of the Dataflow graph. The reason I think that BufferBlock is so handy is that in the case that the BufferBlock is empty, you can wait asynchronously for something to appear in it. This can be very useful for constraining execution when resources run low. – spender Oct 21 '15 at 11:23
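
To illustrate the BufferBlock approach discussed in the last two comments, here is a minimal sketch of a `BufferBlock<float[]>` used on its own as a pool, not linked into the processing graph; the buffer size and count are arbitrary:

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BufferBlockPool
{
    public static async Task DemoAsync()
    {
        // A standalone BufferBlock acting as a thread-safe pool of buffers.
        var pool = new BufferBlock<float[]>();

        // Seed the pool with a fixed number of pre-allocated arrays.
        for (int i = 0; i < 8; i++)
            pool.Post(new float[4096]);

        // Take a buffer out; ReceiveAsync waits asynchronously if every
        // buffer is currently in use, which throttles the producer for free.
        float[] buffer = await pool.ReceiveAsync();

        // ... fill the buffer and push it through the pipeline ...

        // Return the buffer to the pool when the last block is done with it.
        pool.Post(buffer);
    }
}
```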