
I am having an issue returning a large list of objects from an activity function to an orchestrator function. I have a function that downloads a 180 MB file and parses it. This file will produce a list of objects with over 962K entries. Each object has about 70 properties but only about 20% of them are populated. When I run the function, the code successfully downloads and parses the file into the list, but when the list is returned, an exception is raised with the following information:

Exception: "Exception while executing function: #######" - Source: "System.Private.CoreLib"

Inner exception: "Error while handling parameter $return after function returned." - Source: "Microsoft.Azure.WebJobs.Host"

Inner / Inner exception: "Exception of type 'System.OutOfMemoryException' was thrown." - Source: "System.Private.CoreLib"

The innermost exception shows that the call triggering the out-of-memory error originates in the Newtonsoft.Json package. I am including the full stack trace for that exception at the end.

I understand that I could serialize the list of objects, store it in an Azure blob, and then pick it up again in the next function that needs to process it, but I thought the whole idea behind Durable Functions was to avoid this and keep the workflow leaner. I also based the design on the "Large Message Support #26" GitHub issue, which states that the Durable Functions extension will automatically store the function payload in a blob if its size exceeds the queue message limit (see: https://github.com/Azure/azure-functions-durable-extension/issues/26).

Is there anything I need to do to get this working? The code is pretty simple:

[FunctionName("GetDataFromSource")]
public static IEnumerable<DataDetail> GetDataFromSource([ActivityTrigger]ISource source, ILogger logger)
{
    try
    {
        string importSettings = Environment.GetEnvironmentVariable(source.SettingsKey);
        if (string.IsNullOrWhiteSpace(importSettings))
        {
            logger.LogError($"No settings key information found for the {source.SourceId} data source");                    }
        else
        {
            List<DataDetail> _Data = source.GetVinData().Distinct().ToList();
            return vinData;
         }
     }
     catch (Exception ex)
     {
         logger.LogCritical($"Error processing the {source.SourceId} Vin data source. *** Exception: {ex}");
     }

      return new List<DataDetail>();
}

This is the stack trace for the most inner exception:

   at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
   at System.Text.StringBuilder.Append(Char value, Int32 repeatCount)
   at System.Text.StringBuilder.Append(Char value)
   at System.IO.StringWriter.Write(Char value)
   at Newtonsoft.Json.JsonTextWriter.WritePropertyName(String name, Boolean escape)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeObject(JsonWriter writer, Object value, JsonObjectContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeValue(JsonWriter writer, Object value, JsonContract valueContract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeList(JsonWriter writer, IEnumerable values, JsonArrayContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeValue(JsonWriter writer, Object value, JsonContract valueContract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerProperty)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.Serialize(JsonWriter jsonWriter, Object value, Type objectType)
   at Newtonsoft.Json.JsonSerializer.SerializeInternal(JsonWriter jsonWriter, Object value, Type objectType)
   at DurableTask.Core.Serializing.JsonDataConverter.Serialize(Object value, Boolean formatted)
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.MessagePayloadDataConverter.Serialize(Object value, Int32 maxSizeInKB) in C:\projects\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\MessagePayloadDataConverter.cs:line 55
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.MessagePayloadDataConverter.Serialize(Object value) in C:\projects\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\MessagePayloadDataConverter.cs:line 43
   at Microsoft.Azure.WebJobs.DurableActivityContext.SetOutput(Object output) in C:\projects\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\DurableActivityContext.cs:line 136
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.ActivityTriggerAttributeBindingProvider.ActivityTriggerBinding.ActivityTriggerReturnValueBinder.SetValueAsync(Object value, CancellationToken cancellationToken) in C:\projects\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Bindings\ActivityTriggerAttributeBindingProvider.cs:line 213
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ParameterHelper.ProcessOutputParameters(CancellationToken cancellationToken) in C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 972

2 Answers


I came across a similar issue when working with Durable Functions.

There are a couple of solutions / workarounds to this. As you say, you could store the function payload in blob storage and retrieve it when you need it. This works, but there is a performance hit, and retrieval can take a while depending on how big your file is. A rough sketch of that approach is below.
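
This sketch assumes the Azure.Storage.Blobs SDK; the container name "vin-data", the use of the AzureWebJobsStorage connection string, and the function name are illustrative, not from the question. The key idea is that the activity returns only a small blob reference instead of the 962K-item list:

using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;
using Microsoft.Azure.WebJobs;
using Newtonsoft.Json;

public static class BlobBackedActivity
{
    [FunctionName("GetDataFromSourceToBlob")]
    public static async Task<string> GetDataFromSourceToBlob([ActivityTrigger] ISource source)
    {
        var container = new BlobContainerClient(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"), // assumed connection string
            "vin-data");                                               // assumed container name
        await container.CreateIfNotExistsAsync();

        var blob = container.GetBlockBlobClient($"{source.SourceId}-{Guid.NewGuid():N}.json");

        // Stream the JSON directly into blob storage so the full list is
        // never serialized into one giant string in memory.
        using (var stream = await blob.OpenWriteAsync(overwrite: true))
        using (var writer = new StreamWriter(stream))
        {
            new JsonSerializer().Serialize(writer, source.GetVinData().Distinct());
        }

        // The orchestrator receives only this short blob name; the next
        // activity re-hydrates the list from the blob.
        return blob.Name;
    }
}

The trade-off is the extra round trip to storage on both ends, which is the performance hit mentioned above.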

The other option would be to batch your calls. I'm not entirely sure what your GetVinData() method does, but you could modify it so you only retrieve 50,000 (or some other number of) items at a time. Your orchestrator could call your activity function multiple times and build up the list in the orchestrator:

[FunctionName(nameof(OrchestratorAsync))]
public async Task OrchestratorAsync([OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var dataDetailList = new List<DataDetail>();
    var batches = BuildBatchesHere();

    foreach (var batch in batches)
    {
        dataDetailList.AddRange(
            await context.CallActivityAsync<List<DataDetail>>(
                nameof(GetDataFromSource), batch));
    }

    // Do whatever you need with dataDetailList
}
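
If you go the batching route, the activity side might look something like the sketch below. BatchRequest and the paged GetVinData(offset, count) overload are assumptions, since the question doesn't show the ISource interface:

using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

// Hypothetical batch descriptor. In practice its properties should be plain
// serializable data (connection info, offsets), because activity inputs
// must round-trip through JSON.
public class BatchRequest
{
    public ISource Source { get; set; }
    public int Offset { get; set; }
    public int Count { get; set; }   // e.g. 50,000 items per call
}

public static class BatchedActivity
{
    [FunctionName("GetDataFromSourceBatch")]
    public static List<DataDetail> GetDataFromSourceBatch(
        [ActivityTrigger] BatchRequest batch, ILogger logger)
    {
        logger.LogInformation($"Fetching up to {batch.Count} items from offset {batch.Offset}");

        // Each call returns at most batch.Count items, so every individual
        // activity payload stays well under the serialization limits.
        return batch.Source.GetVinData(batch.Offset, batch.Count).Distinct().ToList();
    }
}
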
stephyness

The Durable Functions extension will automatically take care of storing large messages in blobs when they don't fit in queues and tables. However, this support assumes that enough memory is available to serialize the payloads so that they can be uploaded to blob storage. Unfortunately, the design of the Durable Task Framework requires serializing the payload into a single string before uploading it, which creates a lot of memory pressure.

There are a few things you can try to mitigate this problem:

  1. Make sure your function app is running in 64-bit mode. By default, Function apps are created in 32-bit mode, which has lower memory limits. We've seen several cases where simply switching to 64-bit resolves out-of-memory issues.

  2. Try increasing the memory limit for your particular plan. If you're running in the Azure Functions Consumption plan, maximum memory is fixed. However, if you're running in Elastic Premium or App Service Plans, you have the option of using larger VMs with more memory.

  3. As @stephyness suggested, consider limiting the amount of data you return from your function. This could be a subset of the full list, or the full list with smaller payloads (for example, source.GetVinData().Distinct().Select(x => x.VinNumber)). You might even get better results by simply removing .ToList(), which may be creating an unnecessary copy of your data. Essentially, return only the data that the orchestrator absolutely needs to make progress; returning data the orchestrator doesn't need is unnecessary overhead. A minimal sketch follows this list.
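
A minimal sketch of suggestion 3, assuming DataDetail exposes the VinNumber property used in the example above (the function name is made up for illustration):

using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.WebJobs;

public static class TrimmedActivity
{
    [FunctionName("GetVinNumbersOnly")]
    public static IEnumerable<string> GetVinNumbersOnly([ActivityTrigger] ISource source)
    {
        // No .ToList(): the serializer enumerates lazily, avoiding an extra
        // in-memory copy of the ~962K parsed objects.
        return source.GetVinData()
                     .Distinct()
                     .Select(x => x.VinNumber);
    }
}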

Also be aware that there's a non-trivial performance impact when large message support is used. If you can avoid relying on it, your orchestrations will run much faster.

Other tips for controlling memory usage can be found in the Performance and Scale documentation.

Chris Gillum