
I want to run a simple .NET 6 C# consumption-plan Azure Function app (not durable) every minute, but I need to remember the state from the previous run.

The state consists of arrays of JSON-serializable objects and a few access token strings.

So I create a durable entity function like this:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Newtonsoft.Json;

[JsonObject(MemberSerialization.OptIn)]
public class DurableStorageEntity
{
    [JsonProperty("simpledata")]
    public string SimpleData { get; set; }

    [JsonProperty("complicateddata")]
    public List<ComplicatedObject> ComplicatedData { get; set; }

    public void SetSimpleData(string data) => this.SimpleData = data;

    public void SetComplicatedObject(List<ComplicatedObject> data) => this.ComplicatedData = data;

    [FunctionName(nameof(DurableStorageEntity))]
    public static Task Run([EntityTrigger] IDurableEntityContext ctx)
        => ctx.DispatchAsync<DurableStorageEntity>();
}
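For reference, ComplicatedObject here stands for a plain JSON-serializable class made up of strings, ints and dates. A minimal illustrative sketch (the property names are made up, the real class has more fields):

public class ComplicatedObject
{
    // Illustrative shape only: strings, ints and dates.
    public string Name { get; set; }
    public int Count { get; set; }
    public DateTime Timestamp { get; set; }
}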

This works great, but I am unsure how efficient it is and how much state data I can store this way. After testing, I am also unsure whether it is reliable when called from a regular (i.e. non-durable) function app.

I can't find any clear information on this. The MS documentation on entity functions simply states:

Entity functions define operations for reading and updating small pieces of state, known as durable entities.

And

Entities provide a means for scaling out applications by distributing the work across many entities, each with a modestly sized state.

So this leaves me unsure how much data I can store. I will generally be storing a couple of arrays, each with a few hundred items, of objects made up of strings, ints and dates. If there is a better way to store state between runs, alternatives are welcome, but I do like using these entity functions, so I am hoping they are a viable and reliable option.

Update 2 December:

I have been testing by storing lists of objects, e.g. a 100-item array of objects (strings, ints, datetimes).

I am running this locally on my machine. I might have done something wrong, but occasionally it returns old data stored previously instead of the latest data stored in the durable entity. I am not sure why this is happening. I can reproduce it by storing large amounts of data, but it also happens randomly at other times.

Generated 100 objects: size 6 365 853 bytes, read time 250 ms, write time 215 ms

Generated 1000 objects: size 63 658 851 bytes, read time 2 s 659 ms, write time 1 s 811 ms

Generated 10000 objects: write seemed to succeed, but on read got the previous 1000 objects

Generated 10 objects: read back 10 objects as expected

Generated 2000 objects: error, still gets 10 objects

Generated 1000 objects: error, still gets 10 objects

Generated 100 objects: error, still gets 10 objects

Generated 1 object: finally got 1 object back as expected

Generated 10 objects: error, still gets 1 object for three runs, but then on the fourth run of the function app it suddenly gets 10 objects as it should.

This is on my local machine: Windows 10, i7, 32 GB RAM, running without the debugger in VS2022.

Tarostar

2 Answers


Yes, this may be the expected result. That is not to say that entities are unreliable: they work very well and are reliable for storing data for functions, but updates can be delayed, so the system behaves like any distributed system.

Durable entities are a queue-based system built on Azure Storage queues and tables. When you call SignalEntityAsync for SetComplicatedObject, a message is placed on a queue. At some point, either immediately or after a delay, the entity executes SetComplicatedObject. Before execution the entity state is read from an Azure table; after execution it is saved back to the table (blobs are used if the entity is too big). As the entity state grows, the read and save take longer, so it can take longer before a signal is fully applied.

In an Azure function, ReadEntityStateAsync reads the data from the table, i.e. the committed data. Therefore the data you read may or may not include the latest update.

From your explanation I assume the trigger looks something like this. I changed the complicated object to an int list and added a second read to illustrate the delayed update.

public static int Ptr = 0;
public static int[] Sizes = new int[] { 0, 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000 };

[FunctionName("Function1")]
public async Task Run([TimerTrigger("0 */1 * * * *")] TimerInfo myTimer, ILogger log, [DurableClient] IDurableEntityClient client)
{
    var entityId = new EntityId(nameof(DurableStorageEntity), "test");

    // Signal the entity, then read its committed state twice with a delay in between.
    await client.SignalEntityAsync(entityId, "SetComplicatedObject", new List<int>(new int[Sizes[Ptr]]));
    await Task.Delay(500);
    var item1 = await client.ReadEntityStateAsync<DurableStorageEntity>(entityId);
    await Task.Delay(1000);
    var item2 = await client.ReadEntityStateAsync<DurableStorageEntity>(entityId);

    log.LogInformation($"End - Expected size: {Sizes[Ptr]} - Item1 size: {item1.EntityState.ComplicatedData.Count} - Item2 size: {item2.EntityState.ComplicatedData.Count}");
    Ptr++; // the next timer run uses the next size
}

This produced the following output (it could differ between runs):

End - Expected size: 0 - Item1 size: 0 - Item2 size: 0
End - Expected size: 1 - Item1 size: 1 - Item2 size: 1
End - Expected size: 10 - Item1 size: 10 - Item2 size: 10
End - Expected size: 100 - Item1 size: 100 - Item2 size: 100
End - Expected size: 1000 - Item1 size: 1000 - Item2 size: 1000
End - Expected size: 10000 - Item1 size: 10000 - Item2 size: 10000
End - Expected size: 100000 - Item1 size: 100000 - Item2 size: 100000
End - Expected size: 1000000 - Item1 size: 1000000 - Item2 size: 1000000
End - Expected size: 10000000 - Item1 size: 1000000 - Item2 size: 10000000

Notice the 'error' on the last run: Item1 was not correct, but a second later Item2 had the correct value, because the entity state in the table was updated between the two reads.
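If this eventual consistency is acceptable but you want to reduce the chance of reading stale data, you could poll ReadEntityStateAsync until the state reflects the change. A minimal sketch (the helper name, the expected-count check and the timeout are assumptions for illustration):

private static async Task<DurableStorageEntity> ReadWithRetryAsync(
    IDurableEntityClient client, EntityId entityId, int expectedCount)
{
    // Poll the committed state until it matches what we expect, or give up after ~10 seconds.
    for (var attempt = 0; attempt < 20; attempt++)
    {
        var response = await client.ReadEntityStateAsync<DurableStorageEntity>(entityId);
        if (response.EntityExists && response.EntityState.ComplicatedData?.Count == expectedCount)
        {
            return response.EntityState;
        }
        await Task.Delay(500);
    }
    return null; // still stale after the timeout
}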

Durable entities cannot guarantee that all updates have been applied at the moment you read. If you need that guarantee, you may have to switch to some other data storage solution (SQL, NoSQL, etc.). You could also use Azure Table Storage without durable entities and read and update the table directly and immediately.
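For example, reading and writing the state directly with the Azure.Data.Tables SDK could look roughly like this (a sketch; the table name, keys, payload property and connection setting are assumptions):

using System;
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;

public class StateRow : ITableEntity
{
    public string PartitionKey { get; set; } = "state";
    public string RowKey { get; set; } = "singleton";
    public DateTimeOffset? Timestamp { get; set; }
    public ETag ETag { get; set; }

    // Serialized JSON payload of the state (note the 64 KB per-property limit in table storage).
    public string Payload { get; set; }
}

public static class StateTable
{
    private static readonly TableClient Table = new TableClient(
        Environment.GetEnvironmentVariable("AzureWebJobsStorage"), "FunctionState");

    public static async Task<StateRow> LoadAsync()
    {
        await Table.CreateIfNotExistsAsync();
        try
        {
            return (await Table.GetEntityAsync<StateRow>("state", "singleton")).Value;
        }
        catch (RequestFailedException ex) when (ex.Status == 404)
        {
            return new StateRow(); // first run: no state stored yet
        }
    }

    // Writes are committed immediately, so the next timer run reads exactly what was saved.
    public static Task SaveAsync(StateRow row) => Table.UpsertEntityAsync(row);
}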

Bjorne

Efficiency: Durable entities use a combination of queues, table storage and blob storage. In terms of efficiency, they are slower than a simple dedicated store for state and are not recommended for performance-critical applications.

Data volume: Durable entities use table storage as the primary store for the value. Table storage can hold 1 MB of data in a single row. However, durable entities and functions can handle larger chunks of data by putting the bigger chunks in blob storage and keeping a reference in table storage.

So theoretically you can store GBs of data, but you will be limited by how much you can fit in the memory of an Azure function.

Blobs are also slower, so if you store data beyond a certain size your operations will get slower.

Reliability: Once a durable entity operation is invoked, it is reliable. If your timer function fails before or while invoking it, you may lose some reliability. In your case, since the timer gets its state from the entity, your operation would continue from where it stopped on the next timer schedule.

Staleness: Durable entities are like standalone actors. Under the hood they use queueing mechanisms to call operations and commit the saved state to Azure table storage.

Reading the latest state from a regular function might not be possible, as you can only read the currently committed value.

If you want to use a durable entity together with its previous state, you may want to move your entire processing inside the entity and have the timer call a 'PerformOperation' method on the entity, as shown below.

Dependency injection is also possible in entity functions - https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-dotnet-entities#dependency-injection-in-entity-classes

Since a durable entity runs its operations sequentially, it guarantees there are no concurrency issues, and you won't face the staleness issue if your code runs inside the entity:

[JsonObject(MemberSerialization.OptIn)]
public class DurableStorageEntity
{
    [JsonProperty("simpledata")]
    public string SimpleData { get; set; }

    [JsonProperty("complicateddata")]
    public List<ComplicatedObject> ComplicatedData { get; set; }

    public void SetSimpleData(string data) => this.SimpleData = data;

    public async Task PerformOperation()
    {
        // Add the logic here instead of in the timer function.
        Console.WriteLine(this.SimpleData);

        // State can be updated inline, e.g.:
        this.SimpleData = "updated by PerformOperation";
    }

    public void SetComplicatedObject(List<ComplicatedObject> data) => this.ComplicatedData = data;

    [FunctionName(nameof(DurableStorageEntity))]
    public static Task Run([EntityTrigger] IDurableEntityContext ctx)
        => ctx.DispatchAsync<DurableStorageEntity>();
}
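The timer trigger then only needs to signal that operation; a sketch (the function name and the entity key "state" are arbitrary choices):

[FunctionName("TimerStarter")]
public static Task TimerStarter(
    [TimerTrigger("0 */1 * * * *")] TimerInfo timer,
    [DurableClient] IDurableEntityClient client)
{
    var entityId = new EntityId(nameof(DurableStorageEntity), "state");

    // Fire-and-forget: the entity dequeues operations one at a time,
    // so PerformOperation always sees the latest committed state.
    return client.SignalEntityAsync(entityId, nameof(DurableStorageEntity.PerformOperation));
}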

Regarding whether it is a good approach or not: durable entities are good for asynchronous processing. The queueing mechanism helps handle high-volume spikes occurring in a short timespan.

But if your use case does not involve a high volume of processing requests, any simple storage would do. Azure table storage can be a good, simple and cost-effective solution for storing state (the entity uses it under the hood anyway).

Abbas Cyclewala