
I have a sub-orchestration that calls a couple of activities. One of the activities is called ~150 times; each call is added to a List of tasks, and then I await Task.WhenAll(list). Each of these tasks returns a base64-encoded image, so the messages are on the larger side.

The orchestration aggregates the results from these activities and returns them to the parent orchestration. When stepping through with the debugger, the orchestration finishes correctly and returns the appropriate results.

I have a breakpoint in the parent orchestration on the next step after receiving the results from the sub-orchestration, but it never gets hit. The results never return to the parent.

Could this have to do with the message size being returned from the sub-orchestration?
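
To gauge how large the aggregated result could get, here is a rough back-of-the-envelope sketch. It relies only on the fact that base64 turns every 3 raw bytes into 4 characters. The function names and the per-image size are hypothetical; only the count of 150 comes from the description above.

```javascript
// Rough estimate of the aggregated sub-orchestration output for base64 images.
// The per-image size below is an assumed input, not a measurement.

// Base64 encodes every 3 input bytes as 4 output characters (padding included).
function base64Length(rawBytes) {
  return 4 * Math.ceil(rawBytes / 3);
}

// Total characters returned when `count` images of `rawBytesEach` bytes are aggregated.
function aggregatedPayloadChars(count, rawBytesEach) {
  return count * base64Length(rawBytesEach);
}

const images = 150;               // activity count from the question
const rawBytesEach = 400 * 1024;  // assumed raw image size (hypothetical)
const totalChars = aggregatedPayloadChars(images, rawBytesEach);
console.log(`~${(totalChars / (1024 * 1024)).toFixed(1)} MiB of base64 text`);
```

This only estimates the size of the encoded strings themselves; any serialization overhead added by the framework is not modeled here.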

If I inline the sub-orchestration code within the parent orchestration instead of calling it as a sub-orchestration, it works fine.

Dexterity
  • Can you post some code so that it is possible to get a more detailed picture? – Sebastian Achatz Nov 08 '18 at 08:24
  • It sounds like a bug. Does this reproduce in Azure? If so, can you share the parent orchestration ID and the region your app is running in? – Chris Gillum Nov 08 '18 at 15:18
  • I had deleted the instance for this particular question, but I am setting up another to try to replicate exactly this one. On another VM though I have something similar, where the parent is in running and three sub orchestrations are just sitting in a running state, and therefore the whole orchestration has been stuck for 30 min. Parent execution id is 25114a61c21c406a8a27cb4b6f8be8aa, children are: 100e21285a8549bb91dbab571220c639, 85d9b7bfd3cf4420ba2aae66e42ce67d, 06a214ccbb884b44bf8b2e64ba841c37, all Canada East. – Dexterity Nov 09 '18 at 16:10
  • I've replicated this exactly as the question on another server Canada East. Parent Partition Key/Execution ID/App Insights ID: ad051bbe8514493194c138f40e11ddb2, 50863b66b1ca4df183d71c897970d182, ad051bbe8514493194c138f40e11ddb2. – Dexterity Nov 09 '18 at 20:16
  • Submitted by accident; here is the rest: Child Partition Key/Execution ID/App Insights ID: 50863b66b1ca4df183d71c897970d182, 19887ac47871429ca73f0de0d6552f9a, 6cadae6e-92c5-4dc5-8b23-784bb401eb18. The child orchestration has a status of Completed with an output blob: c1147d6b-56fc-4a28-acd4-2137792e2f46 (81.5 MB). The parent just has a status of Running. App Insights shows no movement but ~2 GB of committed memory. The last App Insights trace was the execution of the child orchestration. – Dexterity Nov 09 '18 at 20:23
  • Any update here? I've left these apps in this state for review, but I'll need to delete them soon. The first instance is still in the exact same state (last activity 3 days ago). The second picked up where it left off 12 hours later, but then stalled again and is still sitting there (last activity 2 1/2 days ago). – Dexterity Nov 12 '18 at 15:44
  • I have the same problem calling a small number of activities from the parent orchestration. The parent just seems to cancel out after calling "CallSubOrchestratorAsync". All further activities don't get executed. – jhoefnagels Jul 03 '19 at 13:12

1 Answer


This seems to be a bug in the Durable Functions framework. I faced the same issue with a JavaScript orchestrator that would exit as soon as the sub-orchestration finished, without executing the code after the sub-orchestration call. The issue seems to stem from a bug where the framework cannot retrieve the sub-orchestration's output from the saved state if the sub-orchestration was not given an explicit instanceId. Specifying an instanceId makes the code execute fine.

My orchestrator code that was failing looked like this:

var reboot_result = yield context.df.callSubOrchestrator('reboot_orchestrator',reboot_input);
context.log('this is the next line after subOrch call which will not get called');

The context.log call was never reached. Manually specifying an instanceId in the callSubOrchestrator call fixed the issue :)

const child_id = context.df.instanceId + ":0"; //create instanceId
var reboot_result = yield context.df.callSubOrchestrator('reboot_orchestrator',reboot_input,child_id);
context.log('this is the next line after subOrch call and now it gets called properly');
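
If a parent calls more than one sub-orchestration, the same idea can be extended by giving each call its own suffix. The sketch below is only an illustration: `childInstanceId` is a hypothetical helper, and `parentId` stands in for `context.df.instanceId`. Deriving the ID from the parent ID and a call index (rather than from a random value or the current time) keeps it identical each time the orchestrator code is replayed.

```javascript
// Hypothetical helper: derive a stable child instance ID from the parent's ID
// and the position of the sub-orchestration call within the parent.
function childInstanceId(parentInstanceId, callIndex) {
  return `${parentInstanceId}:${callIndex}`;
}

// Stand-in for context.df.instanceId; in a real orchestrator the framework supplies it.
const parentId = "parent-abc";

// One distinct, repeatable ID per sub-orchestration call.
const ids = [0, 1, 2].map((i) => childInstanceId(parentId, i));
console.log(ids);
```

Inside an orchestrator, each generated ID would be passed as the third argument to `context.df.callSubOrchestrator`, as in the snippet above.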

Here's the link to the GitHub bug report: https://github.com/Azure/azure-functions-durable-js/issues/54