
I'm seeking an answer to a design question that I didn't find addressed in any literature on the matter. Allow me to explain the use case and my solution to it, and ask for your opinion as a subject matter expert.

Use Case: We have several microservices that all return some form of content from different business domains. We're using Spring Cloud Netflix, so a gateway service routes traffic to the content services. Some, if not all, of these services require data that is derived from the request and is immutable. A trivial example is locale, although there is other proprietary information too.

Solution: I'm currently deriving the shared data in the gateway service and persisting it as JSON in a NoSQL database with a unique key. Then I'm adding the key as a request header before routing the request. I have a shared library that the content services include at build time, which provides a Spring bean that reads the key from the request header, fetches the stored data using the key, and initializes itself. This makes it possible for the content services to access the shared data (by simply injecting the aforementioned bean) without knowing the underlying mechanism. If a content service invokes another one, it's responsible for adding the unique key as a request header.
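To make the mechanism concrete, here is a minimal sketch of the flow. All names (the header name, the store, the bean) are hypothetical; a `Map` stands in for the NoSQL database, and plain constructors stand in for Spring's request-scoped injection:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the NoSQL store; in production this would be e.g. Redis or Mongo.
class SharedDataStore {
    private static final Map<String, Map<String, String>> DB = new ConcurrentHashMap<>();

    static String put(Map<String, String> derivedData) {
        String key = UUID.randomUUID().toString();
        DB.put(key, derivedData);
        return key;
    }

    static Map<String, String> get(String key) {
        return DB.get(key);
    }
}

// Gateway side: derive the immutable request data, persist it, forward only the key.
class Gateway {
    static final String HEADER = "X-Shared-Data-Key"; // hypothetical header name

    static String deriveAndStore(String acceptLanguage) {
        return SharedDataStore.put(Map.of("locale", acceptLanguage));
    }
}

// The bean the shared library exposes; content services just inject and read it.
class SharedRequestData {
    private final Map<String, String> data;

    SharedRequestData(String keyFromHeader) {
        // Initializes itself by fetching the stored data with the key.
        this.data = SharedDataStore.get(keyFromHeader);
    }

    String locale() {
        return data.get("locale");
    }
}

public class Demo {
    public static void main(String[] args) {
        String key = Gateway.deriveAndStore("en-US");        // gateway derives and stores
        SharedRequestData bean = new SharedRequestData(key); // content service reads back
        assert "en-US".equals(bean.locale());
    }
}
```

A content service never touches `SharedDataStore` directly; it only sees the bean, which is the point of the abstraction.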

Debate: The debate I'm having with my colleagues is whether using a shared datastore for this purpose is appropriate. I contend that while it is bad for a service to leak its domain-specific data to others, the data in question isn't domain-specific, so there's nothing wrong with having a shared database and passing the key around. The alternative would be to pass all the shared data around, which I consider redundant.

What are your thoughts?

TylerH
Abhijit Sarkar
  • I would say that one service should own and manage a data store. The data store itself should not be shared. – duffymo Jul 10 '16 at 23:29
  • @duffymo What about data that's the outcome of authentication? Do you think every service should be parsing a JWT to extract the same information, or should it be done once and stored for later use? I have no problem creating an abstraction for the shared data, which is what the Spring bean in my question is doing. – Abhijit Sarkar Jul 11 '16 at 01:21
  • I believe the VTC was related to this being posted on StackOverflow. Design arguments [are a better fit](http://meta.stackexchange.com/questions/124867/where-should-i-ask-software-architecture-design-questions) for [Programmers SE](http://programmers.stackexchange.com/). – Mitch Jul 11 '16 at 02:06
  • @Mitch Is there a way to just move the question to Programmers SE, or do I have to close this one and repost there? – Abhijit Sarkar Jul 11 '16 at 02:10

2 Answers


After the heated debate on the first answer, let me lend some perspective:

One use case that often comes up is how to handle, for example, authentication information after the request hits the first service, which then in turn calls other services. The question usually is: do I hand over the authentication information (like usernames, groups, etc.), or do I just hand over the token that the client sent and let the next service query the authentication information again?

As far as I can tell, the microservice community has not yet agreed upon an "idiomatic" way of solving this problem. I think there is a good reason for that, and it lies in the different requirements that various applications have in this area. Sometimes authentication is only necessary at the first service that gets hit by an external request; in that case, don't bother putting too much work into it. Still, most systems I know have higher demands and thus require another level of sophistication on the subject of authentication.

Let me give you my view of how this problem could be solved. The easiest way is to hand the access token the client sent around between the back-end services. Yes, this approach requires every service to re-inquire about the user information every time it gets hit with a request. If (and I hope this does not happen to this extent in your system) there are 25 cross-service calls per request, this most likely means 25 hits on some kind of authentication service. Most people will now start screaming in terror at this horrible duplication, but let's think about it the other way: if the same system were a well-structured monolith, you'd still make these calls (probably hitting a DB every single time) at different places in your process. The big deal about these calls in a microservice architecture is the network overhead, and it's true: it will kill you if done wrong! I will give you the solution we took, which worked well for us under heavy load:

We developed a token service (which we'll be open-sourcing quite soon). This service does nothing except store a combination of a token, its expiration date, and some schema-less JSON content. It has a very simple REST interface that lets you create, invalidate, extend, and read tokens and their content. The service has multiple back-ends that can be configured according to the environment it runs in. For development purposes it has a simple in-memory storage that is not synchronized, persisted, or replicated in any way. For production environments we wrote a back-end that synchronizes these tokens between multiple instances (including all the stuff like quorums, asynchronous persistence, etc.). This back-end enables us to scale the service very well, which is a premise for the solution I'm proposing: if every service node has to get the information associated with a token every time it receives a request, the service that provides it has to be really fast! Our implementation returns tokens and their information in far less than 5 milliseconds, and we're confident we can push this metric down even further.
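The core of such a token service, roughly the in-memory development back-end described above, could be sketched like this (all names are invented for illustration; the real service additionally has a REST layer and replicated back-ends):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// In-memory token store: token -> (expiration, schema-less JSON content).
class TokenService {
    private record Entry(long expiresAtMillis, String json) {}

    private final Map<String, Entry> tokens = new ConcurrentHashMap<>();

    // Create a token with a time-to-live and arbitrary JSON content.
    String create(String json, long ttlMillis) {
        String token = UUID.randomUUID().toString();
        tokens.put(token, new Entry(System.currentTimeMillis() + ttlMillis, json));
        return token;
    }

    // Cheap existence/validity check, the common case for cross-service calls.
    boolean isValid(String token) {
        Entry e = tokens.get(token);
        return e != null && e.expiresAtMillis() > System.currentTimeMillis();
    }

    // More expensive read of the associated content.
    String read(String token) {
        Entry e = tokens.get(token);
        return (e != null && e.expiresAtMillis() > System.currentTimeMillis())
                ? e.json() : null;
    }

    void extend(String token, long extraMillis) {
        tokens.computeIfPresent(token, (t, e) ->
                new Entry(e.expiresAtMillis() + extraMillis, e.json()));
    }

    void invalidate(String token) {
        tokens.remove(token);
    }
}

public class TokenDemo {
    public static void main(String[] args) {
        TokenService svc = new TokenService();
        String token = svc.create("{\"user\":\"alice\",\"groups\":[\"admin\"]}", 60_000);
        assert svc.isValid(token);
        assert svc.read(token).contains("alice");
        svc.invalidate(token);
        assert !svc.isValid(token);
    }
}
```

Note the split between `isValid` and `read`: the cheap check is what most cross-service calls need, which is what makes the per-request re-inquiry affordable.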

The other strategy we have is to orchestrate services that make heavier queries to the token service (retrieving the content is expensive compared to just checking a token's validity/existence) so that they're located on the same physical nodes, or close by, to keep network latency to a minimum.

The more general message: do not be afraid of cross-service calls as long as the number of these calls stays uncoupled from the amount of content that is handled (bad example here). The services that are called more frequently need to be engineered much more carefully, and their performance needs to be optimized to shave off every last possible millisecond. DB hits in this kind of system-critical service, for example, are an absolute no-go, but there are design patterns and architectures that can help you avoid them!

You may have noticed already that I did not directly answer the question you raised for debate. Why? I'm vehemently against having shared databases between services. Even if these databases are schema-less, you will couple two services together without the dependency being visible. Once you decide to restructure the data in your token service and there is another service even just reading from that database, you have screwed up two services, and you might only realize it when it's too late, because the dependency is not transparent. State/data in services should only be accessed through well-defined interfaces so that services can be well abstracted, developed, and tested independently. In my opinion, changing the persistence technology or structure in one service should never break, or even require changes in, another service. Exclusively accessing a service through its API gives you the possibility to refactor, rebuild, or even completely rewrite services without necessarily breaking other services relying on it. It's called decoupling!

Let me know whether this is helpful or not!

enzian
  • You make some good points, which I mostly agree with. The trick is how to bell the cat. You can have a service with a well-defined interface, but in order to invoke it, you need input parameters. In my question, the well-defined interface is the Spring bean (nothing says all interfaces have to be HTTP) that fetches the data from the shared DB. It needs the key to do so, and the key is passed from service to service using a request header. – Abhijit Sarkar Jul 12 '16 at 19:42

In most cases, the cost of duplication is offset by the convenience, but you can always consider the shared data as owned by another service.

If there is only one writer to the "shared" data and you access it in a way that allows independent versioning of clients, then you can view the shared data as an unconventionally exposed service.

Example:

  • Service A owns entity A1, stored in an SQL Server as table A1
  • Service B owns entity B1, which requires data from A1 entities

In a classical layout, Service B would access A1 through calls to Service A.

Service B --HTTP--> Service A --SQL--> A1

Alternatively, Service A may create a view that allows Service B to access A1 directly.

Service B --SQL--> vwA1_version1 --SQL--> A1

When Service A changes the field layout of A1, it updates vwA1_version1 to remain backwards compatible with old clients and defines vwA1_version2 for new clients.
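The versioning idea above can be modeled outside the database as well, with each "view" as an explicit projection that Service A maintains (names and field split are hypothetical; in SQL Server this would be actual `CREATE VIEW` statements):

```java
import java.util.Map;

// Service A's current internal layout of A1 (suppose it split "name" into two fields).
record A1(String firstName, String lastName) {}

// vwA1_version1: preserves the old single-field layout for old clients of Service B.
class A1ViewV1 {
    static Map<String, String> project(A1 row) {
        return Map.of("name", row.firstName() + " " + row.lastName());
    }
}

// vwA1_version2: exposes the new layout for new clients.
class A1ViewV2 {
    static Map<String, String> project(A1 row) {
        return Map.of("firstName", row.firstName(), "lastName", row.lastName());
    }
}

public class ViewDemo {
    public static void main(String[] args) {
        A1 row = new A1("Ada", "Lovelace");
        // Old clients keep reading the v1 shape unchanged...
        assert "Ada Lovelace".equals(A1ViewV1.project(row).get("name"));
        // ...while new clients consume the v2 shape.
        assert "Ada".equals(A1ViewV2.project(row).get("firstName"));
    }
}
```

The key property is that only Service A owns the projections, so it can restructure `A1` freely as long as it keeps each published version's shape stable.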

Mitch
  • Please rephrase your answer specific to my question. A vaguely general discussion of shared datastore is not helpful. – Abhijit Sarkar Jul 11 '16 at 01:15
  • @AbhijitSarkar You did not ask a specific question; you asked a general one. To the question "can you have shared datastores in SOA?", my response stands. If you would like a specific answer to "should I pass a JWT or a user ID?", I would be in vehement support of a token. AAA is common enough that most services will have to access the data regardless. Validating a signature is much cheaper than a DB call. – Mitch Jul 11 '16 at 02:01