
I am planning to migrate my existing monolithic RESTful Web API cloud service to Service Fabric in three steps. The in-process memory cache (MemoryCache) is heavily used in my cloud service.
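
For reference, the current in-process cache usage looks roughly like the sketch below (the `Customer` type, key scheme, and TTL are placeholders, not the actual code):

```csharp
using System;
using System.Runtime.Caching;

public class CustomerReader
{
    // Process-local cache: fast, but lost on restart and not shared across instances.
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public Customer GetCustomer(string id)
    {
        var key = "customer:" + id;               // placeholder key scheme
        if (Cache.Get(key) is Customer cached)
            return cached;

        var customer = LoadFromDatabase(id);      // assumed data-access call
        Cache.Set(key, customer, new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(5)
        });
        return customer;
    }

    private Customer LoadFromDatabase(string id) => throw new NotImplementedException();
}

public class Customer { public string Id { get; set; } }
```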

Step 1) Migrate the cloud service to an SF stateful service with 1 replica and a single partition. The cache code stays as is; no use of Reliable Collections.

Step 2) Horizontally scale the SF monolithic stateful service to 5 replicas and a single partition. The cache code is modified to use Reliable Collections (a sketch follows the three steps).

Step 3) Break down the SF monolithic service into microservices (stateless / stateful).
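
Roughly what I have in mind for the Step 2 cache change: the cache moves into a ReliableDictionary owned by the stateful service. This is only a sketch; the key/value types and the collection name are placeholders, while the calls used (`GetOrAddAsync`, `AddOrUpdateAsync`, `TryGetValueAsync`) are the standard IReliableDictionary APIs.

```csharp
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Data.Collections;

public class ReliableCache
{
    private readonly IReliableStateManager _stateManager;

    public ReliableCache(IReliableStateManager stateManager)
    {
        _stateManager = stateManager;
    }

    public async Task SetAsync(string key, string value)
    {
        var cache = await _stateManager.GetOrAddAsync<IReliableDictionary<string, string>>("cache");
        using (var tx = _stateManager.CreateTransaction())
        {
            // Writes are replicated and quorum-committed across the replica set.
            await cache.AddOrUpdateAsync(tx, key, value, (k, old) => value);
            await tx.CommitAsync();
        }
    }

    public async Task<string> GetAsync(string key)
    {
        var cache = await _stateManager.GetOrAddAsync<IReliableDictionary<string, string>>("cache");
        using (var tx = _stateManager.CreateTransaction())
        {
            // Reads are served by the primary replica of the (single) partition.
            var result = await cache.TryGetValueAsync(tx, key);
            return result.HasValue ? result.Value : null;
        }
    }
}
```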

Is the above approach clean? Any recommendations? Any drawbacks?

More on Step 2) Horizontal scaling of SF stateful service

  • I am not planning to use an SF partitioning strategy, as I could not come up with a uniform data distribution for my application.
  • By adding more replicas and no partitioning to the SF stateful service, I am just making my service more reliable (availability). Is my understanding correct?
  • I will modify the cache code to use a Reliable Collection (dictionary). The same state data will be available in all replicas.
  • I understand that a GET can be executed on any replica, but an update / write needs to be executed on the primary replica?
  • How can I scale my SF stateful service without partitioning?
  • Can all of the replicas, including secondaries, listen to my client requests and respond the same way? A GET should be able to execute, but how do PUT & POST calls work?

  • Should I prefer an external cache store (Redis) over Reliable Collections at this step, and use a stateless service? (A sketch follows this list.)
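
If the external-store route is taken instead, this is a minimal sketch of cache access from a stateless service using StackExchange.Redis (the connection string and expiry are placeholders):

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public class RedisCache
{
    // Placeholder connection string; in practice this would come from configuration.
    private static readonly ConnectionMultiplexer Connection =
        ConnectionMultiplexer.Connect("mycache.redis.cache.windows.net:6380,ssl=true,password=...");

    public Task SetAsync(string key, string value) =>
        Connection.GetDatabase().StringSetAsync(key, value, TimeSpan.FromMinutes(5));

    public async Task<string> GetAsync(string key) =>
        await Connection.GetDatabase().StringGetAsync(key);
}
```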

Ashish
  • My recommendation after using SF for a year is... unless you have sufficient resources and a very capable team, just don't. It's way too immature as a platform and massively overkill for a lot of applications. Take things slow. Spin up two instances of your app and put them behind a load balancer. See how that fares, then look at moving some of your hot read data to Redis. FWIW, Reliable Collections are more CA than P. Using SF just to make use of a distributed dictionary as a cache is a bit of a poor choice IMO – Mardoxx Mar 21 '18 at 09:01
  • Another question: for what reason is your REST api stateful? – Mardoxx Mar 21 '18 at 09:07
  • Thanks Mardoxx. The REST API is stateless, but we have used an in-process cache (MemoryCache) for hot data to reduce data latency. You are right, we are planning to use a distributed dictionary as a cache to store the in-process cache data. – Ashish Mar 21 '18 at 10:04

1 Answer


This document has a good overview of options for scaling a particular workload in Service Fabric and some examples of when you'd want to use each.

Option 2 (creating more service instances, dynamically or upfront) sounds like it would map to your workload pretty well; there is a sketch of this after the list below. Whether you decide to use a custom stateful service as your cache or use an external store depends on a few things:

  • Whether you have the space in your main compute machines to store the cached data
  • Whether your service can get away with a simple cache or whether it needs more advanced features provided by other caching services
  • Whether your service needs the performance improvement of a cache in the same set of nodes as the web tier, or whether it can afford (in terms of latency) to call out to a remote service
  • Whether you can afford to pay for a caching service, or whether you want to make do with the memory, compute, and local storage you're already paying for with the VMs
  • Whether you really want to take on building and running your own cache
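
As a sketch of option 2 (more named service instances rather than partitions), you can create additional singleton-partition instances of the same service type and spread calls across them however you like. The names and replica counts below are placeholders; the calls are the standard FabricClient / StatefulServiceDescription APIs.

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

public static class CacheScaling
{
    // Creates another singleton-partition instance of the same stateful service type.
    // Names are hypothetical; the caller distributes requests across the instances.
    public static Task AddCacheInstanceAsync(FabricClient client, int index)
    {
        var description = new StatefulServiceDescription
        {
            ApplicationName = new Uri("fabric:/MyApp"),
            ServiceName = new Uri($"fabric:/MyApp/Cache{index}"),
            ServiceTypeName = "CacheServiceType",
            HasPersistedState = true,
            MinReplicaSetSize = 3,
            TargetReplicaSetSize = 5,
            PartitionSchemeDescription = new SingletonPartitionSchemeDescription()
        };

        return client.ServiceManager.CreateServiceAsync(description);
    }
}
```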

To answer some of your other questions:

  • Yes, adding more replicas increases availability/reliability, not scale. In fact it can have a negative impact on performance (for writes) since changes have to be written to more replicas.
  • The state data isn't guaranteed to be the same in all replicas, just a majority of them. Some secondaries can even be ahead, which is why reading from secondaries is discouraged.
  • So to your next question, the recommendation is for all reads and writes to always be performed against the primary so that you're seeing consistent quorum committed data.
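
On which replicas listen: by default a stateful service only opens its communication listener on the primary, and listening on secondaries is an explicit opt-in. A minimal sketch (the actual listener factory is assumed/elided):

```csharp
using System.Collections.Generic;
using System.Fabric;
using Microsoft.ServiceFabric.Services.Communication.Runtime;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class CacheService : StatefulService
{
    public CacheService(StatefulServiceContext context) : base(context) { }

    protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners()
    {
        return new[]
        {
            // listenOnSecondary defaults to false: only the primary replica accepts
            // requests, so reads and writes see quorum-committed data.
            new ServiceReplicaListener(
                context => CreateCommunicationListener(context), // assumed factory
                name: "primaryOnly",
                listenOnSecondary: false)
        };
    }

    // Placeholder for whatever listener the service actually uses (e.g. an HTTP listener).
    private ICommunicationListener CreateCommunicationListener(StatefulServiceContext context)
        => throw new System.NotImplementedException();
}
```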
masnider
  • "Recommendation to read and write to be always performed against primary". This will be limitation for read at-least as we need to perform inter-process call whenever we access cache object. – Ashish Mar 22 '18 at 06:17
  • If you partition your data, then you will have one primary per partition, this will increase the throughput and you can avoid secondary reads – Diego Mendes Mar 22 '18 at 08:42
  • @Diego Right, or since he doesn't want to use the partitioning (since it enforces uniform key distribution), just create more services and distribute the calls however you had in mind. There's a little more work up front, but the result is not much different. – masnider Mar 22 '18 at 16:54
  • @Ashish - regarding IPC sure you have to do that. There's technically an option to host the stateful service within the same process as your stateless service, but this is tricky to set up. I would set it up with two services and see if the latency makes a huge difference. Since you have a leader for the data, you will always have to find that leader for that segment of data and it might be elsewhere in the cluster. You -can- also read against the secondaries, but you have to be careful since you can get non-quorum-committed data from the future. – masnider Mar 22 '18 at 16:59
  • @masnider is there any configuration to define the minimum quorum for replication? For example: I have 10 replicas and want to make it commit once it completes at least 5 of them? – Diego Mendes Mar 22 '18 at 20:41
  • No. It is majority only today. – masnider Mar 22 '18 at 21:27
  • @masnider Ideally, SF should also provide eventual consistency along with strong consistency (N/2 + 1). I know there is a cost in latency, but at the same time you get scalability with guaranteed consistency. – Ashish Mar 23 '18 at 05:39
  • Common feature ask, just doesn't exist today. Folks have also done pretty well partitioning things out one way or another and doing things in parallel. There's been lots of weird bugs folks have encountered due to eventual consistency, which this helps people avoid. If you really want to go for eventual consistency today, you can batch things up "outside" the collections and then commit them (or whatever other tracking data you want), just not 1:1 with customer calls. – masnider Mar 23 '18 at 16:20