18

I'm having a bit of a mental block using the iOS Combine framework.

I'm converting some code from "manual" fetching from a remote API to using Combine. Basically, the API is SQL and REST (in actual fact it's Salesforce, but that's irrelevant to the question). What the code used to do is call a REST query method that takes a completion handler. What I'm doing is replacing this everywhere with a Combine Future. So far, so good.
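
For context, the conversion itself looks something like this minimal sketch (RESTQuery, the Object initializer, and the exact shape of RESTClient.performQuery are stand-ins for the real client, and error handling is elided):

import Combine

// Sketch only: RESTQuery and Object(record:) are placeholders; error handling elided.
func fetchObjectsPublisher(query: RESTQuery) -> Future<[Object], Error> {
    Future { promise in
        RESTClient.performQuery(query) { results in
            // turn the raw REST results into (partially populated) model objects
            let objects = results.map { Object(record: $0) }
            promise(.success(objects))
        }
    }
}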

The problem arises when the following scenario happens (and it happens a lot):

  1. We do a REST query and get back an array of "objects".

  2. But these "objects" are not completely populated. Each one of them needs additional data from some related object. So for each "object", we do another REST query using information from that "object", thus giving us another array of "objects".

  3. This might or might not allow us to finish populating the first "objects"; if not, we might have to do another REST query using information from each of the second "objects", and so on.

The result was a lot of code structured like this (this is pseudocode):

func fetchObjects(completion: @escaping ([Object]) -> Void) {
    let restQuery = ...
    RESTClient.performQuery(restQuery) { results in
        let partialObjects = results.map { ... }
        let group = DispatchGroup()
        for partialObject in partialObjects {
            let restQuery = ... // something based on partialObject
            group.enter()
            RESTClient.performQuery(restQuery) { results in
                let partialObjects2 = results.map { ... }
                partialObject.property1 = // something from partialObjects2
                partialObject.property2 = // something from partialObjects2
                // and we could go down yet _another_ level in some cases
                group.leave()
            }
        }
        group.notify(queue: .main) {
            completion(partialObjects)
        }
    }
}

Every time I say `results in` in the pseudocode, that's the completion handler of an asynchronous networking call.

Okay, well, I see well enough how to chain asynchronous calls in Combine, for example by using Futures and flatMap (pseudocode again):

let future1 = Future...
future1.map {
    // do something
}.flatMap {
    let future2 = Future...
    return future2.map {
        // do something
    }
}
// ...

In that code, the way we form future2 can depend upon the value we received from the execution of future1, and in the map on future2 we can modify what we received from upstream before it gets passed on down the pipeline. No problem. It's all quite beautiful.
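
Concretely, such a chain might look like this (getUser, getAvatarData, and the User type are hypothetical, just to show how the second Future is built from the first one's value, and how that value is modified on the way down):

// getUser and getAvatarData are hypothetical Future-returning functions; User is a hypothetical struct
let cancellable = getUser(id: 1)             // Future<User, Error>
    .flatMap { user in
        // the second future is constructed from the value the first one produced
        getAvatarData(url: user.avatarURL)   // Future<Data, Error>
            .map { data -> User in
                // modify the upstream value before passing it on down the pipeline
                var updated = user
                updated.avatarData = data
                return updated
            }
    }
    .sink(receiveCompletion: { _ in },
          receiveValue: { user in print(user) })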

But that doesn't give me what I was doing in the pre-Combine code, namely the loop. Here I was, doing multiple asynchronous calls in a loop, held in place by a DispatchGroup before proceeding. The question is:

What is the Combine pattern for doing that?

Remember the situation. I've got an array of some object. I want to loop through that array, doing an asynchronous call for each object in the loop, fetching new info asynchronously and modifying that object on that basis, before proceeding on down the pipeline. And each loop might involve a further nested loop gathering even more information asynchronously:

Fetch info from online database, it's an array
   |
   V
For each element in the array, fetch _more_ info, _that's_ an array
   |
   V
For each element in _that_ array, fetch _more_ info
   |
   V
Loop thru the accumulated info and populate that element of the original array 

The old code for doing this was horrible-looking, full of nested completion handlers and loops held in place by DispatchGroup enter/leave/notify. But it worked. I can't get my Combine code to work the same way. How do I do it? Basically my pipeline output is an array of something, I feel like I need to split up that array into individual elements, do something asynchronously to each element, and put the elements back together into an array. How?


The way I've been solving this works, but doesn't scale, especially when an asynchronous call needs information that arrived several steps back in the pipeline chain. I've been doing something like this (I got this idea from https://stackoverflow.com/a/58708381/341994; a rough sketch follows the numbered steps):

  1. An array of objects arrives from upstream.

  2. I enter a flatMap and map the array to an array of publishers, each headed by a Future that fetches further online stuff related to one object, and followed by a pipeline that produces the modified object.

  3. Now I have an array of pipelines, each producing a single object. I merge that array of pipelines into a single publisher (a MergeMany) and return it from the flatMap.

  4. I collect the resulting values back into an array.
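
In rough code, that looks something like this (fetchDetails(for:) and populated(_:with:) are placeholders for the per-object Future and for the step that produces the modified object):

// Sketch only: fetchDetails(for:) and populated(_:with:) are placeholders
upstreamArrayOfObjects                    // AnyPublisher<[Object], Error>
    .flatMap { objects -> Publishers.MergeMany<AnyPublisher<Object, Error>> in
        // 2. map the array to an array of per-object pipelines
        let pipelines = objects.map { object in
            fetchDetails(for: object)     // Future<Details, Error>
                .map { details in populated(object, with: details) }
                .eraseToAnyPublisher()
        }
        // 3. merge them into a single publisher
        return Publishers.MergeMany(pipelines)
    }
    // 4. collect the modified objects back into an array
    .collect()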

But this still seems like a lot of work, and even worse, it doesn't scale when each sub-pipeline itself needs to spawn an array of sub-pipelines. It all becomes incomprehensible, and information that used to arrive easily into a completion block (because of Swift's scoping rules) no longer arrives into a subsequent step in the main pipeline (or arrives only with difficulty because I pass bigger and bigger tuples down the pipeline).

There must be some simple Combine pattern for doing this, but I'm completely missing it. Please tell me what it is.

matt
  • Just out of curiosity, is the entity array needed? With just flatMap, you will get each entity one at a time as they complete. Entities can be updated as they complete instead of waiting until everything is done. – Jeffery Thomas May 17 '20 at 18:24
  • @JefferyThomas Well, I suppose that depends what you mean by "needed". The upstream API returns me an array, and the downstream view controller expects an array. So the endpoints of the pipeline are not exactly up to me, if you see what I mean. – matt May 17 '20 at 20:38
  • @JefferyThomas Also I do not know what you mean by "with just `flatMap`". Merely using `flatMap` does not flatten an array. – matt May 18 '20 at 15:02
  • Oh yeah, I used MergeMany to combine the array of publishers in the flatMap. That was an important detail. – Jeffery Thomas May 19 '20 at 12:35
  • @JefferyThomas So you are referring to what I’m already doing. But that is what I don’t want to be doing. – matt May 19 '20 at 12:39
  • The difference is `MergeMany` calls receive value one at a time as the values are available, whereas `collect()` waits until all values are available before calling receive value. – Jeffery Thomas May 19 '20 at 13:14
  • @JefferyThomas I am using both MergeMany and collect, as I have explained (and see the link). I receive an array at the beginning of the pipeline, and I have to produce an array at the end of the pipeline. – matt May 19 '20 at 14:01

2 Answers

15

With your latest edit and this comment below:

I literally am asking is there a Combine equivalent of "don't proceed to the next step until this step, involving multiple asynchronous steps, has finished"

I think this pattern can be achieved with .flatMap to an array publisher (Publishers.Sequence), which emits the elements one by one and then completes, followed by whatever per-element async processing is needed, and finalized with a .collect, which waits for all elements to complete before proceeding.

So, in code, assuming we have these functions:

func getFoos() -> AnyPublisher<[Foo], Error>
func getPartials(for foo: Foo) -> AnyPublisher<[Partial], Error>
func getMoreInfo(for partial: Partial, of foo: Foo) -> AnyPublisher<MoreInfo, Error>

We can do the following:

getFoos()
    .flatMap { fooArr in
        fooArr.publisher.setFailureType(to: Error.self)
    }

    // per-foo element async processing
    .flatMap { foo in

        getPartials(for: foo)
            .flatMap { partialArr in
                partialArr.publisher.setFailureType(to: Error.self)
            }

            // per-partial of foo async processing
            .flatMap { partial in

                getMoreInfo(for: partial, of: foo)
                    // build completed partial with more info
                    .map { moreInfo in
                        var newPartial = partial
                        newPartial.moreInfo = moreInfo
                        return newPartial
                    }
            }
            .collect()
            // build completed foo with all partials
            .map { partialArr in
                var newFoo = foo
                newFoo.partials = partialArr
                return newFoo
            }
    }
    .collect()

(Deleted the old answer)

New Dev
  • Thanks. It seems to me that this is no different from what I'm already doing. I too had to resort to tuples to keep all the information flowing together down the pipeline. Your code loses the Error information by starting with a Subject whose Failure type is Never and using `arrOfFoos.publisher` which _requires_ that the Failure type be Never; I'm not willing to do that, which is why I use `.map` on the array instead. – matt May 23 '20 at 13:58
  • Maybe I'm not following. This pipeline starts with a partial object - it doesn't care how you got it. But you can always handle the error of the original request however you want (with `Catch`, etc.), and only in the success case feed the objects to the `fooSubject`. The big difference between this and what you did is that it creates a single Foo pipeline, and sequentially feeds Foo objects into it, instead of creating parallel pipelines. It's also a linear composition (no nesting) that propagates upstream values with `.Zip` like you needed – New Dev May 23 '20 at 14:08
  • Which `.map` do you mean? `future1.map` just maps the one result - an array - into another array of objects. You can modify your partial objects however you need before sending them down their own pipeline, which though it starts with a Failure type of Never (which makes sense, since they are real objects), does not need to end without a Failure type. Indeed it allows you to handle each individual object's error however you need, or not handle it, as the case may be, and return `Result` or something // @matt – New Dev May 23 '20 at 14:23
  • Let's not quibble over the details. The point (I think!) is that you're telling me (I think!): "You're right, there's no way to stop and loop over the contents of the array before proceeding, as you could do with `for...in` and DispatchGroup. You'll just have to split the array into elements and pass each element down the pipeline, process it, and collect them later. And you're probably going to need to use tuples so that each element and the ancillary information needed to modify it arrive together into the processing routine (what would have been the body of the `for` loop)." – matt May 23 '20 at 14:46
  • The details of _how_ to do that are probably just a matter of taste. Whether we use a sequence publisher or a map to split up the array, whether we use zip or flatMap is neither here nor there. The point is that you're confirming that the architecture I'm using is what one has to do, one way or another. – matt May 23 '20 at 14:50
  • Hmm, I'm saying: "Of course you can loop, this doesn't preclude that at all", but at some point you'll want to act on each object in the array, and then the pattern I'm proposing is to send each element down a pipeline sequentially and collect into an array at the very last step. Of course, this only works if each request per element is independent of the other elements in the array (which is how I understood your question). @matt – New Dev May 23 '20 at 15:08
  • Yes, actually that raises another issue with your code and my code: the copy of the Foo that arrives together with partial1 and partial2 is different for each partial2 (or whatever the finest grained division is). So we don't actually `collect` the original Foo objects, modified; we collect a whole bunch of copies of each original Foo, each one of them modified in a different way, and we have to reassemble them into the original "unified" Foos. This is another reason I miss simple looping. – matt May 23 '20 at 15:21
  • That's what the loop is _for_, to act on each object in the array. I see that I can disassemble the array into a stream of objects, process each one asynchronously further down the pipeline by means of zip or flatMap, and put them all back together again. It's just ugly, and as you rightly say, you need to pass all the needed info down the pipeline, which I find makes for some pretty nasty-looking tuples or even arrays of tuples. And when the Future _itself_ returns an array to be looped over asynchronously to apply to the original object, it gets really hairy. Maybe I asked badly... – matt May 23 '20 at 20:38
  • @matt saw your question edit - didn't realize that there was a loop within a loop. It is an "ugly" problem - not sure if there is a non-ugly solution. Probably the closest to non-ugly you can hope to get is something that looks similar to synchronous code. I think, maybe, if you had `async/await` or promise pseudo-code, it would clarify exactly what you're looking for and what you consider too ugly – New Dev May 24 '20 at 14:27
  • Well the question started with the pseudo-code showing what I used to have. It worked fine. We keep moving rightward a level after each database query or group of queries (for an array). We make the call, we emerge from a completion handler, and if there's a loop we don't proceed further until all the loops have completed because there's a DispatchGroup that lets us wait (quite literally). And info gathered earlier remains in scope, so we have _all_ of it when we reach the end of inmost loop and are ready to populate our objects and produce them. – matt May 24 '20 at 15:33
  • Sure I could "straighten" that out by dividing it into multiple methods, but the architecture remains the same. And it's a lot better than tuples and too many copies of my objects, which is what you're implying (and what I'm now actually doing in Combine). – matt May 24 '20 at 15:35
  • I literally am asking is there a Combine equivalent of "don't proceed to the next step until this step, involving multiple asynchronous steps, has finished". And your answer, I'm afraid, does not encourage me to think that there is. – matt May 24 '20 at 15:36
  • I think it could be achieved with a `flatMap` to an array publisher (which emits one-by-one, and sends a completion after the last element), followed by some async processing of each item, followed by a `collect`, which waits (i.e. doesn't emit) until completion signal. I amended the answer... It's still ugly-ish, but at least there are no tuples // @matt – New Dev May 24 '20 at 20:29
  • Right, I think I like this. I believe we're still doing the same thing: flatMap and split out the array with Publishers.Sequence, and then process each item and `collect` them at the end. But by nesting those, we can run it as deep as we need to without the use of the tuple, because we now have the same Swift scoping rules I was using for the DispatchGroup. Let me try converting my code to this structure and see...! – matt May 24 '20 at 20:48
  • Yup, that's the pattern I needed all right. This basic idea — `flatMap` to a Publishers.Sequence, process each item starting with a Future based on it, and end up with `collect` — and then nest those as necessary, is the clean look I was hoping for. Basically that _is_ the same nest I was making with `for...in` and a DispatchGroup, and it does the same thing: at each level, we don't pass `collect` until what precedes has transformed each element of the original array. And that is what my question asked for! – matt May 25 '20 at 02:14
  • @matt, thanks for bounty. One thing to be conscious of is that the order of elements in the `collect`ed array is not necessarily the same as in the original sequence due to the async step. I wonder if there is a way to ensure the order too. – New Dev May 25 '20 at 02:47
  • Yes, I found that out but it doesn't really worry me in this case. We can serialize using MergeMany if order is important. The same thing was true, after all, of my original code! – matt May 25 '20 at 02:52
6

Using the accepted answer, I wound up with this structure:

head // [Entity]
    .flatMap { entities -> AnyPublisher<Entity, Error> in
        Publishers.Sequence(sequence: entities).eraseToAnyPublisher()
    }
    .flatMap { entity -> AnyPublisher<Entity, Error> in
        self.makeFuture(for: entity) // [Derivative]
            .flatMap { derivatives -> AnyPublisher<Derivative, Error> in
                Publishers.Sequence(sequence: derivatives).eraseToAnyPublisher()
            }
            .flatMap { derivative -> AnyPublisher<Derivative2, Error> in
                self.makeFuture(for: derivative).eraseToAnyPublisher() // Derivative2
            }
            .collect()
            .map { derivative2s -> Entity in
                self.configuredEntity(entity, from: derivative2s)
            }
            .eraseToAnyPublisher()
    }
    .collect()

That has exactly the elegant tightness I was looking for! So the idea is:

We receive an array of something, and we need to process each element asynchronously. The old way would have been a DispatchGroup and a for...in loop. The Combine equivalent is (a skeleton sketch follows the list):

  • The equivalent of the for...in line is flatMap and Publishers.Sequence.

  • The equivalent of the DispatchGroup (dealing with asynchronousness) is a further flatMap (on the individual element) and some publisher. In my case I start with a Future based on the individual element we just received.

  • The equivalent of the right curly brace at the end is collect(), waiting for all elements to be processed and putting the array back together again.
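
In skeleton form (processElement(_:) standing in for whatever per-element asynchronous publisher is needed, and assuming the failure type is Error throughout):

// Sketch only: arrayPublisher and processElement(_:) are placeholders
arrayPublisher                                           // emits [Element], fails with Error
    .flatMap { array in
        array.publisher.setFailureType(to: Error.self)   // the for...in: emit the elements one by one
    }
    .flatMap { element in
        processElement(element)                          // the body of the loop: async work per element
    }
    .collect()                                           // the closing brace / DispatchGroup.notify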

So to sum up, the pattern is:

  1. flatMap the array to a Sequence.
  2. flatMap the individual element to a publisher that launches the asynchronous operation on that element.
  3. Continue the chain from that publisher as needed.
  4. collect back into an array.

By nesting that pattern, we can take advantage of Swift scoping rules to keep the thing we need to process in scope until we have acquired enough information to produce the processed object.

matt
  • Hi @matt this seems like a very interesting method that can help me! Are you able to share example code via a GitHub gist and some dummy data to test with? In my use case, I'm also fetching an array of data that has 2 nested arrays, each with an image to download with their own pertinent nested arrays inside them. Thank you in advance! – Paul D. Jul 12 '20 at 00:01
  • Ok I will try it out tomm...It would really help if there was dummy data (ie array structure similar to your use case) to test these nested publishers to really understand what was going on. Thx – Paul D. Jul 12 '20 at 02:59
  • Hi @matt, I tried to apply your solution to my use case (with some tweaks) and did have success in getting the expected response back from the API server. However, I'm running into an issue where I don't know how I would go about assigning the returned array of Future objects to be associated array entities. I have posted my question in detail here: https://stackoverflow.com/questions/62895539/swift-combine-urlsession-retrieving-dataset-photos-using-2x-publishers-and-zip. Any help would be greatly appreciated! Thx – Paul D. Jul 14 '20 at 12:53
  • can't we use head.publisher instead of the Publishers.Sequence? I'm relatively new to Combine, so I'm not sure if this is a recent addition. – Aswath Sep 09 '20 at 15:52
  • @Aswath The thing to be published as a sequence is `derivatives`, not `head`. The notation is up to you; I find this way clearer. – matt Sep 09 '20 at 16:09