
We are currently architecting a system that should be capable of processing large amounts of sensor events.

Since the requirement is to handle millions of different sensor instances, I thought the Service Fabric Actor Model would be a perfect fit. So the idea was to have one Actor that is responsible for processing the events of one sensor (SensorId=ActorId).

The mapping is easy, and since we only need to query the data by a specific SensorId, we have it all in one place, which enables really fast lookups.

The problem now is that a few sensors are sending data at rates a single actor can no longer handle.

This is where we are stuck: we can't hint the system and tell it to distribute the load across more Actors for specific sensors like Sensor123 and Sensor567.

Is there any possibility to solve this with the virtual Actor System provided by Service Fabric?

Update 1:

I don't think the problem is scaling a single actor. We get around 5k messages/s for one unique actor, but some sensors need a target throughput of 50-100k/s. So by design (single-threaded execution) a single actor won't be able to accomplish this.

So to clarify the initial question: we are looking, more or less, for a way to automatically partition "some" actors.

(Of course we could create 10 actors for each sensor to partition the load. But that would make the lookups inefficient, and we would need 10x more RAM. That doesn't seem justifiable when only 0.5-1% of the sensors need more throughput.)

coalmee

2 Answers


I recommend investigating the following options:

  1. Scale the cluster up / out. Having more CPU power increases throughput. Having fewer Actors per machine will help too.
  2. Use an ingress queue, like Event Hub, or create a queue inside Service Fabric. For instance, use an Actor to enqueue events in its StateManager, and a Reminder to process them in the background. This way the processing of events is decoupled from receiving them. (You will move to a model of 'eventual consistency', though.)
  3. Make your Actors smaller, by dividing responsibilities into different Actor Types. This way you better distribute load across the cluster, at the cost of some latency.
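
As a rough illustration of option 2, here is a minimal sketch in plain Python. It is not Service Fabric code: `BufferedSensorActor`, `receive`, and `on_reminder` are hypothetical names, the in-memory deque stands in for a queue kept in the actor's state, and `on_reminder` stands in for whatever a Reminder callback would do.

```python
from collections import deque


class BufferedSensorActor:
    """Hypothetical stand-in for an actor that buffers events and processes them later."""

    def __init__(self, sensor_id, batch_size=1000):
        self.sensor_id = sensor_id
        self.batch_size = batch_size
        self.pending = deque()      # stands in for a queue held in actor state
        self.processed_count = 0

    def receive(self, event):
        # Hot path: only append, so the caller is released quickly.
        self.pending.append(event)

    def on_reminder(self):
        # Background path: drain at most one batch per tick.
        n = min(self.batch_size, len(self.pending))
        batch = [self.pending.popleft() for _ in range(n)]
        self.process_batch(batch)
        return len(batch)

    def process_batch(self, batch):
        # Placeholder for the real per-event work.
        self.processed_count += len(batch)
```

Anything still sitting in `pending` is not yet visible to readers, which is the eventual consistency mentioned above. Note that this only smooths bursts; it does not raise the sustained rate one actor can process.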
LoekD
  • Thank you very much for the input! But in our case these options won't help a lot ;(. I updated my initial question to clarify our needs. – coalmee Apr 24 '17 at 09:44

I don't think it will give the gains you are asking for, but have you tried testing a new Actor Type for this 'special case' sensor that uses a less durable persistence method, such as StatePersistence.Volatile or StatePersistence.None? I have seen this significantly improve actor throughput, especially StatePersistence.None.

Obviously this may not suit your desired durability requirements, but it might be a quick win until you get a longer term solution.

Have to agree with @LoekD, option 3 would be your best bet. Try to subdivide responsibilities into different actors, which can then aggregate (on a recurring schedule?) and report back to a god-actor for that sensor that can handle the reporting load - once again this leads to some eventual consistency which may or may not be acceptable for your use case.
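
To make the subdividing idea a bit more concrete, here is a minimal sketch in plain Python of routing only the hot sensors to several shard actors. Everything in it is an assumption for illustration: the `HOT_SENSOR_RATES` table, the `#` separator in the actor IDs, and the function names are all hypothetical, and the 5k messages/s figure is simply taken from the question.

```python
import math
import zlib

PER_ACTOR_RATE = 5_000  # messages/s per actor, as measured in the question

# Hypothetical table of target rates for the few hot sensors.
HOT_SENSOR_RATES = {"Sensor123": 100_000, "Sensor567": 50_000}


def shard_count(sensor_id):
    """Number of actors needed for a sensor; 1 for every normal sensor."""
    rate = HOT_SENSOR_RATES.get(sensor_id)
    if rate is None:
        return 1
    return math.ceil(rate / PER_ACTOR_RATE)


def actor_id_for(sensor_id, event_key):
    """Pick the actor that should receive one event."""
    n = shard_count(sensor_id)
    if n == 1:
        return sensor_id  # normal sensors keep SensorId == ActorId
    shard = zlib.crc32(event_key.encode("utf-8")) % n
    return f"{sensor_id}#{shard}"


def all_actor_ids(sensor_id):
    """Actors an aggregator (or a lookup) would have to consult."""
    n = shard_count(sensor_id)
    if n == 1:
        return [sensor_id]
    return [f"{sensor_id}#{i}" for i in range(n)]


def actor_overhead(hot_fraction, shards_per_hot_sensor):
    """Total actor count relative to one actor per sensor."""
    return 1 + hot_fraction * (shards_per_hot_sensor - 1)
```

With these assumptions, a 100k/s sensor needs 20 shard actors, and if 1% of the sensors were sharded 20 ways the total number of actors would grow by about 19% rather than 10x. Lookups for a hot sensor would either fan out over `all_actor_ids` or read from the aggregating actor, with the eventual consistency that implies.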

If all else fails, you could try running your cluster on bare-metal instead of VMs for a considerable perf gain.

Last resort, evaluate Erlang on bare-metal... said no .NET developer ever

Oliver Tomlinson