
Looking for advice on AWS architecture. I did some research on my own, but I'm far from an expert and would really love to hear other opinions. This seems to be a pretty common problem in microservice architectures, but AWS looks like a different universe to me, with its own rules (and tools), so there are probably best practices I'm not aware of yet.

What we have:

  • SOA: one Lambda per entity (usually Node.js + DynamoDB)
  • Some Lambda functions use RDS (MySQL) as their DB (this data was supposed to be used by QuickSight)
  • GraphQL (AppSync)

The first problem occurred when we realized we had to display data stored in DynamoDB in QuickSight. We solved it with a Data Pipeline job that exports the data from DynamoDB to S3, from where QuickSight reads it via Athena. In this case it's acceptable that the data for analysis is not updated in real time.

But now we need to build a table in the main application that combines data from different data sources: DynamoDB and MySQL. For example, we have a payment entity with attributes like amount and currency, stored in MySQL, and a contract entity stored in DynamoDB. Each payment links to a contract (a one-to-many relation). We need to show a table with a list of contracts that the user can filter by payment attributes, e.g. only contracts that have payments in EUR, or contracts whose payments total more than 500 USD. This table must show real-time data and support the usual data grid features: filtering, sorting, pagination.
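
To make that concrete, this is roughly the row shape the grid has to render (field names are illustrative, not our actual schema); the contract attributes live in DynamoDB, while the payment-derived fields currently live in MySQL:

```typescript
// Illustrative only: the combined row the data grid needs.
interface ContractRow {
  contractId: string;          // contract partition key in DynamoDB
  name: string;                // ...other contract attributes from DynamoDB
  // Derived from the payments linked to this contract (stored in MySQL):
  paymentCurrencies: string[]; // e.g. ['EUR', 'USD'], drives the currency filter
  totalAmountUsd: number;      // drives the "total amount > 500 USD" filter
}
```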

Options that I see at the moment:

  • use SQS to push payment attributes from the payment service to DynamoDB and store them as a String Set on the contract item (e.g. an attribute currencies: ['EUR', 'USD']); see the first sketch after this list.
  • use streams (DynamoDB Streams, Kinesis?) to transfer data from DynamoDB to S3 and then query it with Athena (see the second sketch after this list). I'm not sure this will work for us: I ran into really bad performance with Athena, with queries stuck in the queue for a couple of minutes. Did I do something wrong?
  • remodel the architecture and merge the entities into one DB. This will probably take far too long to be approved by the project managers.
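
For option 1, here is a minimal sketch of the consumer side, assuming AWS SDK v3 and a contracts table keyed by `contractId` (table, queue and field names are made up for illustration): a Lambda triggered by the SQS queue adds the payment's currency to a string set on the contract item.

```typescript
// Sketch for option 1 (all names illustrative): a Lambda consuming the
// payment queue and denormalizing the payment currency onto the contract item.
import { SQSHandler } from 'aws-lambda';
import { DynamoDBClient, UpdateItemCommand } from '@aws-sdk/client-dynamodb';

const ddb = new DynamoDBClient({});

export const handler: SQSHandler = async (event) => {
  for (const record of event.Records) {
    // Assumed message shape published by the payment service.
    const { contractId, currency } = JSON.parse(record.body) as {
      contractId: string;
      currency: string;
    };

    // ADD on a string set creates the set if missing and ignores duplicates,
    // so re-delivered SQS messages are harmless.
    await ddb.send(
      new UpdateItemCommand({
        TableName: process.env.CONTRACTS_TABLE, // e.g. 'contracts'
        Key: { contractId: { S: contractId } },
        UpdateExpression: 'ADD currencies :c',
        ExpressionAttributeValues: { ':c': { SS: [currency] } },
      })
    );
  }
};
```

With this in place, the contract list query can filter directly on the contracts table with a `contains(currencies, :cur)` condition, which keeps the grid real-time at the cost of the duplication mentioned below.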
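
For option 2, a minimal sketch of the stream side, assuming a Kinesis Data Firehose delivery stream that buffers into the S3 bucket Athena reads from (again, all names are illustrative):

```typescript
// Sketch for option 2 (all names illustrative): a Lambda on the DynamoDB
// stream that forwards changed items to a Firehose delivery stream,
// which buffers them into S3 for Athena.
import { DynamoDBStreamHandler } from 'aws-lambda';
import { FirehoseClient, PutRecordCommand } from '@aws-sdk/client-firehose';
import { unmarshall } from '@aws-sdk/util-dynamodb';

const firehose = new FirehoseClient({});

export const handler: DynamoDBStreamHandler = async (event) => {
  for (const record of event.Records) {
    if (!record.dynamodb?.NewImage) continue; // skip deletes for brevity

    // Convert the DynamoDB-JSON image to a plain object, one JSON line per item.
    const item = unmarshall(record.dynamodb.NewImage as any);
    await firehose.send(
      new PutRecordCommand({
        DeliveryStreamName: process.env.DELIVERY_STREAM, // e.g. 'contracts-to-s3'
        Record: { Data: Buffer.from(JSON.stringify(item) + '\n') },
      })
    );
  }
};
```

Note that this only keeps the S3 copy fresh; it does nothing about the per-query Athena latency mentioned above.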

Data duplication (and the consistency issues that come with it) has always been a pain point for me, but it seems to be unavoidable here.

Any thoughts or links to articles that might help are highly appreciated.

P.S. The architecture was designed by a previous development team.

  • Might be useful: [Query any data source with Amazon Athena’s new federated query | AWS Big Data Blog](https://aws.amazon.com/blogs/big-data/query-any-data-source-with-amazon-athenas-new-federated-query/) – John Rotenstein May 11 '20 at 06:32
  • Thanks, but it turned out that Athena is too slow to be considered as an option. [Here](https://stackoverflow.com/a/61456332) is a good explanation. – sashko May 12 '20 at 11:47
  • If you have different data models, you can either deal with it at the application layer and query both DBs, which is not efficient, or stream the data to build a materialized view, i.e. read-only, pre-computed views: https://learn.microsoft.com/en-us/azure/architecture/patterns/materialized-view . Another option is data duplication, and in microservices that's not uncommon. – Imran Arshad May 17 '20 at 23:51
  • Yep, for now we ended up with data duplication across the services using SQS. It seems to be the cheapest option. – sashko May 19 '20 at 19:26
