What are some approaches to correlating/mapping entities between disparate systems using a service bus architecture?

Question

I am investigating employing a service bus architecture in our enterprise for coordinating data and business processes between systems in our environment.

Our situation is typical: customer-facing web applications that communicate messages to internal systems for billing, inventory, etc. We have several business entities that are common among most or all of these systems; each of these systems maintains its own versions of these entities in its own database.

Applied to the service bus concept, we can publish a message to the bus from our customer web portal that represents an order. This message can be consumed by each interested system (billing, inventory, etc.) to create the corresponding customer records in their own databases. However, when one of these internal systems needs to publish a message about order status, it would carry the order ID from its own database. If our web portal needs to consume that message, it wouldn't know how to correlate the order ID sent to it from our internal system to the order ID that was saved to the web portal database.

Since no other system inherently is aware of how equivalent entities are correlated across systems, it seems that some type of mapping mechanism needs to be in place to allow systems to translate IDs contained in messages to the ID relevant to it. For example, a database table could be created that maps IDs from one system to another. This table could be queried to retrieve an appropriate ID for the target system. Our business currently does not utilize entity aggregation or some other '360-view' type repository to serve as a single authoritative source for common entity information from which a universal ID can be passed between and used by all systems.
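To make the mapping-table idea concrete, here is a minimal sketch using an in-memory SQLite database. The table name, column names, system names, and ID values are all hypothetical, chosen only for illustration; a real design would also need to decide who writes the mappings and when.

```python
import sqlite3


def create_mapping_store(conn):
    # One row per (entity type, source system/ID, target system) link.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS entity_id_map (
            entity_type   TEXT NOT NULL,
            source_system TEXT NOT NULL,
            source_id     TEXT NOT NULL,
            target_system TEXT NOT NULL,
            target_id     TEXT NOT NULL,
            PRIMARY KEY (entity_type, source_system, source_id, target_system)
        )
        """
    )


def record_mapping(conn, entity_type, system_a, id_a, system_b, id_b):
    # Store the link in both directions so either system can translate.
    rows = [
        (entity_type, system_a, id_a, system_b, id_b),
        (entity_type, system_b, id_b, system_a, id_a),
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO entity_id_map VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()


def translate_id(conn, entity_type, source_system, source_id, target_system):
    # Returns the target system's ID, or None when no mapping is known.
    row = conn.execute(
        """
        SELECT target_id FROM entity_id_map
        WHERE entity_type = ? AND source_system = ?
          AND source_id = ? AND target_system = ?
        """,
        (entity_type, source_system, source_id, target_system),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_mapping_store(conn)
# Hypothetical IDs: the portal saved the order as W-1001, billing as B-77.
record_mapping(conn, "order", "web_portal", "W-1001", "billing", "B-77")
portal_id = translate_id(conn, "order", "billing", "B-77", "web_portal")
```

With this in place, the web portal could look up its own order ID when a status message arrives carrying the billing system's ID.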

Is pairing a service bus implementation with such an entity mapping mechanism a valid approach? If so, are there any established guidelines for its design? If not, I am interested in hearing about alternative approaches to linking entities across systems to facilitate integration through a service bus.

PS: If it helps, I am currently evaluating the MassTransit framework for building our bus, so if there's info to offer specific to it, that is also very welcome.

score 1 · Answer 1 · answered Jan 24 '13 at 12:33

Without more details, I would consider an approach along these lines:

1. The website publishes a SubmitNewOrder message.
2. A MiddleManager service takes that SubmitNewOrder message and converts it into tasks for each system; the ID translations can happen during that conversion.
3. Each system has an endpoint that consumes the right message/command and acts accordingly.

The key point is that instead of sending your entities around, you are sending commands. You could still send subsets of your entities (which is a great time to extract an interface from your entities and compose message types). From a high level, this seems like a totally valid approach to the problem.
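A rough sketch of that flow in plain Python (not MassTransit code). Apart from SubmitNewOrder, every name here is hypothetical: the two command types, the field names, and the lookup table standing in for whatever ID translation the MiddleManager would use. Carrying the portal's order ID on every command as a correlation ID is an assumption added for illustration, so that later status messages can be matched back to the original order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmitNewOrder:
    portal_order_id: str
    portal_customer_id: str
    sku: str
    quantity: int


@dataclass(frozen=True)
class CreateInvoice:  # hypothetical command for the billing system
    portal_order_id: str  # correlation ID (assumption)
    billing_customer_id: str
    sku: str
    quantity: int


@dataclass(frozen=True)
class ReserveStock:  # hypothetical command for the inventory system
    portal_order_id: str  # correlation ID (assumption)
    sku: str
    quantity: int


# Hypothetical translation data: (source system, source ID, target system).
CUSTOMER_IDS = {("web_portal", "C-1", "billing"): "BILL-900"}


def translate_customer(source_system, source_id, target_system):
    return CUSTOMER_IDS[(source_system, source_id, target_system)]


def handle_submit_new_order(msg):
    # Convert one incoming message into one small command per target system.
    billing_customer = translate_customer(
        "web_portal", msg.portal_customer_id, "billing"
    )
    return [
        CreateInvoice(msg.portal_order_id, billing_customer, msg.sku, msg.quantity),
        ReserveStock(msg.portal_order_id, msg.sku, msg.quantity),
    ]


commands = handle_submit_new_order(SubmitNewOrder("W-1001", "C-1", "SKU-5", 2))
```

Each command carries only the subset of the order that its target system needs, which keeps the messages small and the ID translation in one place.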

If you want to dig into this in some more depth, Enterprise Service Bus: Theory in Practice is an awesome book to get started with. It doesn't have anything to do with MassTransit, but many of the theories apply and set a great groundwork.

answered Jan 24 '13 at 12:33

Travis

To make sure I understand: The SubmitNewOrder message would carry all information captured in the order from the website, which is subscribed to by only the central broker, with the intent of forking this message into a group of commands/messages/actions, one for each target system that should be interested. The broker would perform work to transform data and translate any IDs contained in the message from the source system to IDs understood by the target systems. Once this is complete, it would communicate this to each target system via a new message on the bus, or perhaps a service call. – Gary DeReese Jan 24 '13 at 13:54
Also, would it be appropriate to use a MassTransit saga to orchestrate this process, including the data mapping and transformation, or would that be an abuse of the feature? Details [here](http://docs.masstransit-project.com/en/latest/overview/saga.html). – Gary DeReese Jan 24 '13 at 15:42
First comment: Even if a service call were the endpoint for another system, I would put that behind another message. The Saga should always publish messages. If one of the message consumers happens to make a service call, that's just fine and dandy. But the system doesn't care, and that can be changed as needed. Secondly: That wouldn't really be an abuse. Now if there's a lot of logic, then you need to break it up into multiple steps. Each one is executed with a message; that way the Saga doesn't own much logic, just transitions into new states. – Travis Jan 28 '13 at 12:44
Interesting idea, thanks for that. I'm still trying to settle on the best MassTransit-specific way to tackle this, so I'll pose the question to the masstransit-discuss board to see if anyone over there has any thoughts. – Gary DeReese Feb 07 '13 at 14:57
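The saga idea from the comments above, as a minimal plain-Python sketch. This is not the MassTransit saga API; it only illustrates the shape being described: the saga holds a state, reacts to an incoming message by moving to a new state, and publishes another message, while the real work (translation, dispatch) happens in separate consumers. All state and message names other than SubmitNewOrder are hypothetical.

```python
from dataclasses import dataclass, field

# (current state, incoming message) -> (next state, message to publish)
TRANSITIONS = {
    ("Initial", "SubmitNewOrder"): ("AwaitingTranslation", "TranslateOrderIds"),
    ("AwaitingTranslation", "OrderIdsTranslated"): ("AwaitingSystems", "DispatchOrderCommands"),
    ("AwaitingSystems", "OrderCommandsDispatched"): ("Completed", "OrderAccepted"),
}


@dataclass
class OrderSaga:
    correlation_id: str
    state: str = "Initial"
    published: list = field(default_factory=list)  # stand-in for the bus

    def handle(self, message_type):
        key = (self.state, message_type)
        if key not in TRANSITIONS:
            raise ValueError(f"{message_type} is not valid in state {self.state}")
        # The saga owns no business logic: it only transitions and publishes.
        self.state, outgoing = TRANSITIONS[key]
        self.published.append((outgoing, self.correlation_id))


saga = OrderSaga(correlation_id="W-1001")
for incoming in ("SubmitNewOrder", "OrderIdsTranslated", "OrderCommandsDispatched"):
    saga.handle(incoming)
```

Because each step is triggered by a message, a consumer that makes a direct service call today can be swapped for something else later without touching the saga.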

score 0 · Answer 2 · answered Jan 24 '13 at 05:38

I've done quite a bit of similar work at my current company. I come from a C# background and was new to the concept of service buses. The project I've been working on is mostly in Java, so we ended up going with Mule ESB as our solution. I would highly recommend it, but I realize it might not play out of the box as well with .NET code.

Question: How do you manage entities across systems?

We currently have two types of entities: those that exist in one authoritative system and are copied to other systems, and those that are shared across systems.

When an entity comes from one system, I've found it best to create copies of that entity across systems. The service bus will extract the data, transform it, and then load it into the appropriate system. The source and destination have a fixed entity data format and the service bus does the translation. This has worked really well for us.

For shared entities it's a bit trickier. We allow for partial changes (one system modifies 5 properties/fields, another system manages another 3 properties/fields) which has worked ok so far. We've tried to minimize shared entities because of the risk of having data get inconsistent. (For example, "Why does this entity look odd? Oh yeah, some system modified the status field and I forgot it could do that.")
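One way to guard against that kind of surprise is to write down which system owns which fields and reject anything else. This is a sketch under assumptions, not a description of what we actually run: the system names and field names are hypothetical, with five fields for one system and three for the other to mirror the split described above.

```python
# Hypothetical ownership table: each system may change only its own fields.
FIELD_OWNERS = {
    "cms": {"title", "body", "author", "tags", "summary"},
    "workflow": {"status", "reviewer", "published_at"},
}


def apply_partial_update(entity, system, changes):
    """Return a copy of entity with changes applied, if system owns them all."""
    allowed = FIELD_OWNERS.get(system, set())
    rejected = sorted(set(changes) - allowed)
    if rejected:
        raise PermissionError(f"{system} may not modify: {', '.join(rejected)}")
    updated = dict(entity)  # leave the caller's entity untouched
    updated.update(changes)
    return updated


article = {"title": "Old title", "status": "draft"}
published = apply_partial_update(article, "workflow", {"status": "published"})
```

An update from `cms` that tried to set `status` would raise `PermissionError` instead of silently changing a field another system manages.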

Using Queues

We use ActiveMQ and the concept of a topic to send data around from system to system. A topic is similar to a queue, but many subscribers can listen on a topic. For example, we have an internal CMS that produces article entities. When a user publishes an article, it goes out to the ActiveMQ topic.

Then the service bus can have many listeners watching the topic. Each listener gets its own copy of the entity that comes out. Each listener can then do its own transformation and send a copy to the appropriate system.
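The topic-and-listeners pattern can be sketched with an in-memory stand-in. This is not the ActiveMQ or Mule API; the `Topic` class, the topic name, the listener functions, and the two destination lists are all hypothetical and exist only to show each listener receiving its own copy and applying its own transformation.

```python
import copy


class Topic:
    """In-memory stand-in for a topic: every subscriber sees every message."""

    def __init__(self, name):
        self.name = name
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def publish(self, entity):
        for listener in self._listeners:
            # Deep copy so one listener's changes never leak into another's.
            listener(copy.deepcopy(entity))


search_index = []  # stand-in for a search system
archive_rows = []  # stand-in for a relational archive


def search_listener(article):
    article["title"] = article["title"].lower()  # mutates only this copy
    search_index.append(
        {"doc_id": article["id"], "text": f'{article["title"]} {article["body"]}'}
    )


def archive_listener(article):
    archive_rows.append((article["id"], article["title"], len(article["body"])))


articles = Topic("cms.articles.published")
articles.subscribe(search_listener)
articles.subscribe(archive_listener)
articles.publish({"id": "A-1", "title": "Service Buses", "body": "Topics fan out."})
```

Adding a new destination system then means adding one more listener, without changing the publisher or the existing listeners.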

It has proven to be very easy to understand/code/test/maintain.

Having a mapping database

To my shame I have recently done this by hard coding values into a script. Actually I wrote a tool to generate the code for me, but it's still in a script file. The reason is that we do a lot of batch processing where we send a ton of messages onto the service bus. I don't want to pay the price to do a database lookup every time a message has to go through.

So my tool generates a script that manages the mapping/coordination of entities and it's very fast. If a change is made, I just re-run the tool and a new script is ready to go. The mapping information for me changes very, very slowly, so it's ok.

But if you can make a data store (XML, database, flat file, whatever) that has mappings of entities, I think that's fine.
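If the per-message database lookup is the concern, a middle ground is to keep the mappings in a data store but load them into memory once. A minimal sketch, assuming a CSV flat file; the column names, system names, and IDs are hypothetical, and the file contents are inlined as a string so the example is self-contained.

```python
import csv
import io

# Hypothetical flat-file contents; in practice this would be read from disk.
MAPPING_CSV = """entity_type,source_system,source_id,target_system,target_id
article,cms,A-1,search,S-501
article,cms,A-2,search,S-502
"""


def load_mappings(text_stream):
    # Read the whole file once and index it by the lookup key.
    reader = csv.DictReader(text_stream)
    return {
        (
            row["entity_type"],
            row["source_system"],
            row["source_id"],
            row["target_system"],
        ): row["target_id"]
        for row in reader
    }


MAPPINGS = load_mappings(io.StringIO(MAPPING_CSV))


def translate(entity_type, source_system, source_id, target_system):
    # Plain dictionary lookup: no I/O per message.
    return MAPPINGS.get((entity_type, source_system, source_id, target_system))


search_id = translate("article", "cms", "A-2", "search")
```

This only suits mappings that change slowly, since a change means reloading the dictionary, much like re-running the generator tool.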

General Advice

Some tips from doing a lot of service bus work:

- A service bus is not a silver bullet. In my experience, it works best when you have a wide variety of technologies. For example, we have MySQL, SQL Server, ActiveMQ, Mongo, CouchDB, Elasticsearch, etc.
- I don't know how MassTransit works, but Mule has the concept of flows. A flow is one conceptual flow of data from point A to point B. You can have many flows. I do my best to make these very simple, which makes testing quite a bit easier. Integration work is not easy to debug, so I prefer to take any complexity out.
- Try to get the entities as close to their destination format as possible at the source. In other words, if I'm taking data from a .NET web service and writing it to SQL, I want the object to go onto the service bus 90% of the way towards full transformation. Again, this keeps the service bus logic simple.

This is a pretty substantial write up. Awesome. Some MassTransit specific stuff: Instead of Flows, MassTransit has Sagas, which are state machines. In terms of "sending entities" across the pipe, what I focus on doing is having a command, such as an `UpdateShipping` address command, which might have an ID or two needed for the record as well as the fields that need to be updated. That's really all it will have. That's all the message should be responsible for. You can keep on sending entities around, but any change has a much larger surface area - something distributed systems try to avoid. – Travis, Jan 24 '13 at 12:23
Ah. Seems like MassTransit doesn't map to the stuff I'm doing. Good luck! — ryan1234, Jan 24 '13 at 17:53
This is indeed good insight, thanks for taking the time. I am now looking for some more MassTransit-specific tips on how to best accomplish this, so I will try to keep this going on masstransit-discuss. — Gary DeReese, Feb 07 '13 at 14:55
