What are the various alternatives for data processing in SOA? What I have done so far in the PoC is:
- Scaling the services across multiple machines.
- One universal service handles service registry and discovery.
- Multiple requests for one service can be forwarded to any instance of that service running on the machines in the cluster.
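To make the setup above concrete, here is a minimal sketch of a registry that hands out instances in round-robin order. It is only an illustration: `ServiceRegistry`, `register`, `resolve`, and the addresses are hypothetical names, not the actual PoC code, and a real registry would also need health checks and deregistration.

```python
class ServiceRegistry:
    """Toy in-process registry: maps a service name to its known instances."""

    def __init__(self):
        self._instances = {}   # service name -> list of instance addresses
        self._next = {}        # service name -> index of the next instance to use

    def register(self, service, address):
        """Record that an instance of `service` is reachable at `address`."""
        self._instances.setdefault(service, []).append(address)
        self._next.setdefault(service, 0)

    def resolve(self, service):
        """Return the next instance of `service` in round-robin order."""
        instances = self._instances.get(service)
        if not instances:
            raise LookupError(f"no instance registered for {service!r}")
        index = self._next[service] % len(instances)
        self._next[service] = index + 1
        return instances[index]


registry = ServiceRegistry()
registry.register("orders", "10.0.0.1:8080")   # hypothetical addresses
registry.register("orders", "10.0.0.2:8080")
```

Successive calls to `registry.resolve("orders")` alternate between the two registered addresses, which is the simplest way to spread requests over the instances.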
Next, we are planning to introduce a distributed caching layer. Any service can get data from the distributed caching layer. The entire flow of the system will be:
1. The client requests data from a service.
2. The service checks the cache for the requested data. If the cached data is valid, it is returned to the client right away. Otherwise the permanent data store is queried for the requested data, the cache is updated, and the data is returned to the client.
3. Now suppose the client requests processing of the data and a service can do it. Should the data be processed by a single instance of the service or by multiple instances, i.e. 3a or 3b?
3a. We pass only the important data filters from the client to the service and distribute the processing command among multiple instances of the service. Each instance performs the operation on a small set of data and updates the data in the cache and the permanent store. Here, instead of passing the data, we pass the processing command around the cluster.
3b. We process the whole data set in one instance of the service and update the cache and the permanent data store.
4. Finally, we return the processed data to the client.
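A minimal sketch of the cache lookup and of option 3a may help frame the question. Everything here is an assumption for illustration: plain dictionaries stand in for the distributed cache and the permanent store, threads stand in for service instances, and "processing" is just doubling a number. The keys are split into disjoint partitions so that no two workers touch the same record.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the real components (hypothetical data).
store = {f"order:{i}": i for i in range(10)}   # permanent data store
cache = {}                                      # distributed cache


def read(key):
    """Cache lookup: serve from the cache, else load from the store and cache it."""
    if key in cache:
        return cache[key]
    value = store[key]
    cache[key] = value
    return value


def process_partition(keys):
    """Option 3a: one instance processes only its own slice of keys."""
    for key in keys:
        new_value = read(key) * 2     # placeholder for the real processing
        store[key] = new_value        # update the permanent store
        cache[key] = new_value        # update the cache
    return len(keys)


def process_distributed(keys, instances=3):
    """Send the command (a list of keys), not the data, to each instance."""
    partitions = [keys[i::instances] for i in range(instances)]
    with ThreadPoolExecutor(max_workers=instances) as pool:
        return sum(pool.map(process_partition, partitions))
```

Calling `process_distributed(sorted(store))` processes every record once. Option 3b corresponds to `instances=1`. The sketch only stays consistent because the partitions are disjoint; overlapping partitions would bring back the stale-read concern.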
For a transactional system, should we depend on the distributed cache? It might result in consistency problems while data is being processed by multiple instances of the service. One instance can read stale data and process that stale copy. How robust is it to depend on a distributed cache?
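One common way to detect the stale-read case is optimistic concurrency: every value carries a version, and a write succeeds only if the version is still the one that was read. The sketch below assumes a single in-process store guarded by a lock; `VersionedStore`, `compare_and_set`, and `update_with_retry` are hypothetical names, and a real distributed cache would have to offer an equivalent atomic operation for this to work.

```python
import threading


class VersionedStore:
    """Toy key-value store where each value carries a version number."""

    def __init__(self):
        self._data = {}                 # key -> (value, version)
        self._lock = threading.Lock()

    def get(self, key):
        """Return (value, version); a missing key reads as (None, 0)."""
        with self._lock:
            return self._data.get(key, (None, 0))

    def compare_and_set(self, key, expected_version, value):
        """Write only if nobody else has written since `expected_version`."""
        with self._lock:
            _, current_version = self._data.get(key, (None, 0))
            if current_version != expected_version:
                return False            # the caller worked on a stale copy
            self._data[key] = (value, current_version + 1)
            return True


def update_with_retry(store, key, fn, max_attempts=5):
    """Re-read and re-apply `fn` until the write lands or attempts run out."""
    for _ in range(max_attempts):
        value, version = store.get(key)
        if store.compare_and_set(key, version, fn(value)):
            return True
    return False
```

With this scheme an instance that processed a stale copy gets `False` back instead of silently overwriting newer data, and it can re-read and retry.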
How large a set of transaction data should be processed in a distributed system (SOA)? I have been reading this line on MuleSoft's site:
"Share workload between applications while maintaining transient state information with in-memory data-grid to provide bulletproof reliability together with scalability"
Any pointers on how to achieve such a distributed system, one that offers both scalability and reliability?