26

I don't understand SOA (Service-oriented Architecture) and databases. While I'm attracted by the SOA concept (encapsulating reusable business logic into services) I can't figure out how it's supposed to work if data tables encapsulated in a service are required by other services/systems---or is SOA suitable at all in this scenario?

To be more concrete, suppose I have two services:

  • CustomerService: contains my Customers database table and associated business logic.
  • OrderService: contains my Orders table and logic.

Now what if I need to JOIN the Customers and Orders tables with an SQL statement? If the tables contain millions of entries, unacceptable performance would result if I have to send the data over the network using SOAP/XML. And how to perform the JOIN?

Doing a little research, I have found some proposed solutions:

  • Use replication to make a local copy of the required data where needed. But then there's no encapsulation and then what's the point of using SOA? This is discussed on StackOverflow but there's no clear consensus.
  • Set up a Master Data Service which encapsulates all database data. I guess it would get monster sized (with essentially one API call for each stored procedure) and require updates all the time. To me this seems related to the enterprise data bus concept.

If you have any input on this, please let me know.

Gruber
  • 9
    Your edit makes absolutely zero sense. RESTful services is a matter of API design, and having individual services actually pushes things toward SOA. So your comment about moving from SOA to REST is akin to saying moving from eating bananas to using alarm clocks. – Max Apr 29 '14 at 22:07
  • Thanks for the input, but not sure I agree with you. Since the dawn of the service-oriented architecture (SOA) evolution, SOA has been compared and contrasted with the model of the RESTful interface. [SearchSOA](http://searchsoa.techtarget.com/tip/SOA-and-RESTful-interfaces-When-why-they-should-be-combined) – Gruber Apr 30 '14 at 08:25
  • 1
    Also check this: http://martinfowler.com/articles/enterpriseREST.html#bounded-contexts – Masood Khaari Aug 04 '14 at 10:18
  • Some [good info](http://rest.elkstein.org/2008/02/roa-vs-soa-rest-vs-soap.html) about ROA (Restful Oriented Architecture) in contrast to SOA. – Gruber Sep 18 '15 at 13:36

3 Answers

15

One of the defining principles of a "service" in this context is that it owns, absolutely, the data in the area it is responsible for, as well as the operations on that data.

Copying data, through replication or any other mechanism, ditches that responsibility. Either you replicate the business rules, too, or you will eventually end up in a situation where you need the other service updated in order to change your internal rules.

Using a single data service is just "don't do SOA"; if you have one single place that manages all data, you don't have independent services, you just have one service.

I would suggest, instead, the third option: use composition to put that data together, avoiding the database level JOIN operation entirely.

Instead of thinking about needing to join those two values together in the database, think about how to compose them together at the edges:

When you render an HTML page for a customer, you can supply HTML from multiple services and compose them next to each other visually: the customer details come from the customer service, and the order details from the order service.

Likewise an invoice email: compose data supplied from multiple services visually, without needing the in-database join.

This has two advantages: one, you do away with the need to join in the database, and even the need to have the data stored in the same type of database. Now each service can use whatever data store is most appropriate for its needs.

Two, you can more easily change the outside of your application. If you have small, composable parts you can easily rearrange the parts in new ways.
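
To make the composition idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the in-memory `CustomerService` and `OrderService` classes, their `render_details` and `render_orders` methods, and the sample data are hypothetical stand-ins for real remote services. The point is only that each service renders its own fragment and the page puts the fragments side by side, so no cross-service JOIN takes place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int
    total: float


class CustomerService:
    """Hypothetical service that owns customer data and renders its own fragment."""

    def __init__(self):
        self._customers = {1: Customer(1, "Alice"), 2: Customer(2, "Bob")}

    def render_details(self, customer_id: int) -> str:
        customer = self._customers[customer_id]
        return f"Customer: {customer.name}"


class OrderService:
    """Hypothetical service that owns order data and renders its own fragment."""

    def __init__(self):
        self._orders = [
            Order(10, 1, 120.0),
            Order(11, 1, 35.5),
            Order(12, 2, 99.0),
        ]

    def render_orders(self, customer_id: int) -> str:
        lines = [
            f"Order {order.order_id}: {order.total:.2f}"
            for order in self._orders
            if order.customer_id == customer_id
        ]
        return "\n".join(lines)


def render_customer_page(customer_id: int,
                         customers: CustomerService,
                         orders: OrderService) -> str:
    # Composition at the edge: the page only places the two fragments
    # next to each other; it never sees either service's tables.
    return (customers.render_details(customer_id)
            + "\n"
            + orders.render_orders(customer_id))


print(render_customer_page(1, CustomerService(), OrderService()))
```

The only thing the two services share here is the customer identifier, which is why either one could switch to a different data store without the other noticing.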

Daniel Pittman
  • When looking for clues I didn't come across your idea---it's an interesting approach. If I could do away with the need to `JOIN` the original problem wouldn't arise. However, in my case the `JOIN` seems hard to sidestep as the result of it is not intended for presentational purposes. But I'll give it some thought. Something crucial you point out is that **a service should own its own data** and associated operations. There's been disagreement over that in [other threads](http://stackoverflow.com/questions/4019902/soa-joining-data-across-multiple-services). Thanks for responding! – Gruber Nov 18 '11 at 07:39
  • 1
    Udi Dahan has a blog here: http://www.udidahan.com/ - his work has been invaluable in shaping my views on the subject, and you will find a lot more valuable information there. – Daniel Pittman Nov 19 '11 at 02:03
  • 1
    Are you suggesting that the Customer service can only access the Customer data, and the Order service can only access the Order data? I have not seen any mention of this in Thomas Erl's books. Please could you supply a reference? – Paul T Davies Dec 01 '11 at 13:31
  • Forgot to add: @Daniel Please see above – Paul T Davies Dec 01 '11 at 18:10
  • 3
    @PaulTDavies I have nothing concrete book-wise to recommend right now, but http://msdn.microsoft.com/en-us/library/ms954587 is a solid summary of where my views come from. Specifically, see the "Data on the outside" vs "Data on the inside" discussion. – Daniel Pittman Dec 02 '11 at 03:09
  • 7
    @DanielPittman One issue with this approach: what if I want to search a joined set of results, filtering on fields in both tables? E.g. find all orders over £100 from customers who live in Cheshire. Using the composition approach, I would have to find all orders over £100 from the Order service, and all customers from Cheshire from the Customer service, and then use the composed service to find the intersection. The set of orders and customers would be much bigger than if a database JOIN was used, and performance would be vastly reduced. – Paul T Davies Dec 06 '11 at 13:37
  • @DanielPittman Could you please answer http://stackoverflow.com/questions/9483286/understanding-data-outside-of-service-soa ? – LCJ Feb 28 '12 at 14:31
2

The guiding principle is that it is OK to cache immutable data. This means that simple immutable data from the customer entity can exist in the order service, and there's no need to go to the customer service every time you need the info. Breaking everything into isolated services and then always making remote procedure calls between them ignores the fallacies of distributed computing. If you have extensive reporting needs, you need to create an additional service. I call that an Aggregated Reporting service, which, again, gets read-only data for reporting purposes. You can see an article I wrote about that for InfoQ a few years ago.
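
As a rough illustration of such a reporting service, here is a minimal Python sketch. It assumes, purely for the example, that the other services publish events as plain dictionaries and that the reporting service keeps its own read-only copy in SQLite; the class name `ReportingService`, the event fields, and the handler names are all hypothetical and not taken from the article. Because the copy is local to the reporting service, the JOIN runs in one database and no bulk data crosses the network at query time.

```python
import sqlite3


class ReportingService:
    """Hypothetical read-only reporting store fed by events from other services."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, county TEXT)"
        )
        self._db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
        )

    def on_customer_changed(self, event: dict) -> None:
        # Assumed event published by the customer service; the latest one wins.
        self._db.execute(
            "INSERT OR REPLACE INTO customers (id, name, county) VALUES (?, ?, ?)",
            (event["id"], event["name"], event["county"]),
        )

    def on_order_placed(self, event: dict) -> None:
        # Assumed event published by the order service.
        self._db.execute(
            "INSERT OR REPLACE INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
            (event["id"], event["customer_id"], event["total"]),
        )

    def orders_over(self, minimum: float, county: str) -> list:
        # The JOIN happens inside the reporting service's own store.
        rows = self._db.execute(
            "SELECT o.id, c.name, o.total "
            "FROM orders o JOIN customers c ON c.id = o.customer_id "
            "WHERE o.total > ? AND c.county = ? "
            "ORDER BY o.id",
            (minimum, county),
        )
        return rows.fetchall()


reporting = ReportingService()
reporting.on_customer_changed({"id": 1, "name": "Alice", "county": "Cheshire"})
reporting.on_customer_changed({"id": 2, "name": "Bob", "county": "Kent"})
reporting.on_order_placed({"id": 10, "customer_id": 1, "total": 120.0})
reporting.on_order_placed({"id": 11, "customer_id": 2, "total": 250.0})
reporting.on_order_placed({"id": 12, "customer_id": 1, "total": 40.0})

result = reporting.orders_over(100, "Cheshire")
print(result)
```

The trade-off in this sketch is staleness: the reporting copy is only as fresh as the last event it received, which is usually acceptable for reports but not for enforcing business rules.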

Arnon Rotem-Gal-Oz
  • I read your article. Great read. I guess caching other services' data combined with the event-driven mechanism you describe would work. For services whose data doesn't change often I think I could get away with letting services update their caches once every night, that would take away the complexity of event handling. – Gruber Nov 18 '11 at 12:09
  • 1
    You might want to check out complementary patterns like CQRS http://martinfowler.com/bliki/CQRS.html – Arnon Rotem-Gal-Oz Nov 18 '11 at 17:53
1

In the SO question you quoted, various people state that it is OK for a service to access another service's data, so the Order service could have a GetAllWithCustomer function, which would return all the orders along with the customer details for each order.
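
Here is one possible reading of such a function as a minimal Python sketch. It is an assumption of this example, not something stated above, that the Order service reaches the customer data through a batch lookup on the Customer service's interface; the classes, the `get_customers` and `get_all_with_customer` methods, and the dictionary shapes are all hypothetical.

```python
class CustomerService:
    """Hypothetical customer service exposing a batch lookup."""

    def __init__(self, customers: dict):
        self._customers = customers

    def get_customers(self, customer_ids: set) -> dict:
        # One call for many ids, instead of one remote call per order.
        return {
            customer_id: self._customers[customer_id]
            for customer_id in customer_ids
            if customer_id in self._customers
        }


class OrderService:
    """Hypothetical order service that enriches its orders with customer details."""

    def __init__(self, orders: list, customer_service: CustomerService):
        self._orders = orders
        self._customer_service = customer_service

    def get_all_with_customer(self) -> list:
        # Collect the distinct customer ids, fetch them in one batch,
        # then attach the details to each order (None if unknown).
        customer_ids = {order["customer_id"] for order in self._orders}
        customers = self._customer_service.get_customers(customer_ids)
        return [
            {**order, "customer": customers.get(order["customer_id"])}
            for order in self._orders
        ]
```

Batching keeps the number of calls between the services constant regardless of how many orders are returned, although for millions of rows the payload size remains a concern.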

Also, this question of mine may be helpful:

https://softwareengineering.stackexchange.com/questions/115958/is-it-bad-practice-for-services-to-share-a-database-in-soa

Paul T Davies
  • Thanks. Seems like people have different views on this. If I, for example, replicate `CustomerService`'s data table `Customers` (for performance reasons) to `OrderService`, I end up with a "spaghetti" problem if the design of `Customers` needs to change---I then have to update `OrderService`'s code. And with SOA that's what I want to move away from. – Gruber Dec 02 '11 at 08:19
  • 1
    @user1035411 I don't think you should replicate the data. I just think the Order service should be allowed access to the Customer service's data. Another thing to consider is that you may want a customer table for the Order service that would contain customer data that is relevant only to the Order service. Customer Name would be used by lots of services, so that would stay in Customer. But credit limit perhaps might be a candidate for the Order service? This would prevent the design of Customers from changing too often. – Paul T Davies Dec 02 '11 at 13:18
  • I want to honor the SOA principle that services only talk through their interfaces, so I never have to update other services if one is changed internally. I agree with you that services may need to store other services' data. The problem is performance and how to keep cached data up to date. Fetching large amounts of data through SOAP/XML produces heavy network load, while replication usually is quite smooth. Since replication breaks the SOA principle, I think I am going to try the other way and, as you suggest, only store a minimum amount of data. – Gruber Dec 05 '11 at 09:17
  • 1
    @user1035411 I'm not saying that a service should be allowed to STORE another service's data, but that it can have direct access to the other service's database. Some may disagree, but opinion is divided on this, and in your case I don't see another valid way of doing it. – Paul T Davies Dec 05 '11 at 13:17