I'm brainstorming an idea and trying to figure out which pieces are missing, or whether there is a better way to solve this problem.
Assume I have a product that customers can embed on their websites. My end goal is to build a dashboard on my website showing relevant analytics (such as page loads, clicks, and custom events) to my customers.
I separated this feature into 2 parts:
collection of data
We can collect data from 2 sources:
Embed of https://my-bucket/$customerId/product.json
- CloudFront Logs -> S3 -> Kinesis Data Streams -> Kinesis Data Firehose -> S3
HTTP request POST /collect to collect an event
- API Gateway endpoint -> Lambda -> Kinesis Data Firehose -> S3
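To make the POST /collect path concrete, here is a minimal sketch of the Lambda behind the API Gateway endpoint. It assumes a proxy integration (the request body arrives as a string in `event["body"]`), a hypothetical delivery stream name `analytics-events`, and a hypothetical payload shape with `customerId`, `event`, and optional `properties`. None of these names come from my actual setup; the only real AWS call is boto3's Firehose `put_record`.

```python
import json
import time
import uuid
from typing import Optional

# Hypothetical delivery stream name; replace with the real one.
STREAM_NAME = "analytics-events"


def build_record(body, now: Optional[float] = None) -> bytes:
    """Validate a /collect payload and serialize it as one JSON line."""
    if not isinstance(body, dict):
        raise ValueError("payload must be a JSON object")
    for field in ("customerId", "event"):
        if not body.get(field):
            raise ValueError(f"missing field: {field}")
    record = {
        "id": str(uuid.uuid4()),
        "customerId": body["customerId"],
        "event": body["event"],
        "properties": body.get("properties", {}),
        "timestamp": int(now if now is not None else time.time()),
    }
    # Newline-delimited JSON keeps the objects Firehose writes to S3 easy to parse.
    return (json.dumps(record) + "\n").encode("utf-8")


def handler(event, context, firehose=None):
    """Lambda entry point for POST /collect (API Gateway proxy integration)."""
    if firehose is None:
        import boto3  # imported lazily so the module also loads without AWS deps

        firehose = boto3.client("firehose")
    try:
        # json.JSONDecodeError is a subclass of ValueError.
        data = build_record(json.loads(event.get("body") or "{}"))
    except ValueError as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    firehose.put_record(DeliveryStreamName=STREAM_NAME, Record={"Data": data})
    return {"statusCode": 202, "body": ""}
```

The `firehose` argument exists only so the handler can be exercised with a stub client; Lambda itself calls `handler(event, context)`.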
access of data
My dashboard will be calling GET /analytics?event=click&from=...&to=...&productId=...
The first part is straightforward:
- API Gateway route -> Lambda
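A rough sketch of that route's Lambda, assuming an API Gateway proxy integration (query parameters arrive in `event["queryStringParameters"]`) and ISO 8601 values for `from` and `to`. The set of allowed event names and the `fetch` callback are hypothetical; `fetch` is a placeholder for whichever storage backend ends up answering the query, which is exactly the open question.

```python
import json
from datetime import datetime

# Assumed event names; the real set depends on what /collect accepts.
ALLOWED_EVENTS = {"page_load", "click", "custom"}


def parse_query(params):
    """Validate the query string of GET /analytics and normalize its values."""
    event = params.get("event")
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"unsupported event: {event!r}")
    product_id = params.get("productId")
    if not product_id:
        raise ValueError("missing productId")
    if "from" not in params or "to" not in params:
        raise ValueError("both from and to are required")
    # fromisoformat raises ValueError on malformed input.
    start = datetime.fromisoformat(params["from"])
    end = datetime.fromisoformat(params["to"])
    try:
        if start > end:
            raise ValueError("from must not be later than to")
    except TypeError:
        # Comparing naive and offset-aware datetimes raises TypeError.
        raise ValueError("from and to must both include or both omit a UTC offset")
    return {"event": event, "product_id": product_id, "start": start, "end": end}


def handler(event, context, fetch=lambda query: []):
    """Lambda entry point for GET /analytics; `fetch` stands in for the data store."""
    params = event.get("queryStringParameters") or {}
    try:
        query = parse_query(params)
    except ValueError as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    rows = fetch(query)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"rows": rows}),
    }
```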
The part I'm struggling with: how can my Lambda access the data that is currently stored on S3?
So far, I have evaluated these options:
- S3 -> Glue -> Athena: Athena is not a low-latency service. To my understanding, some queries can take minutes to execute. I need something fast and snappy.
- Kinesis Data Firehose -> DynamoDB: It is difficult to filter and sort on DynamoDB. I'm afraid that the high volume of analytics will slow it down and make it impractical.
- QuickSight: It doesn't expose a SQL interface for querying the data
- Kinesis Analytics: It doesn't expose a SQL interface for querying the data
- Amazon OpenSearch Service: Feels like overkill (?)
- Redshift: Looking into it next
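To illustrate the latency concern with the Athena option, here is a sketch of how the Lambda would have to call it through boto3: start the query, poll until it finishes, then read the result. The function name, database, output location, and polling defaults are assumptions for illustration; the three client calls (`start_query_execution`, `get_query_execution`, `get_query_results`) are the real boto3 Athena methods. The polling loop is the part that makes a snappy dashboard hard to guarantee.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=0.5, timeout_seconds=30.0):
    """Run one Athena query and return the first page of rows as dicts.

    `athena` is a boto3 Athena client, e.g. boto3.client("athena").
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena is asynchronous: the only way to get a result is to poll.
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        info = status["QueryExecution"]["Status"]
        if info["State"] == "SUCCEEDED":
            break
        if info["State"] in ("FAILED", "CANCELLED"):
            reason = info.get("StateChangeReason", "")
            raise RuntimeError(f"query {info['State'].lower()}: {reason}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still running after timeout")
        time.sleep(poll_seconds)

    # Only the first page is read here; larger results need NextToken paging.
    result = athena.get_query_results(QueryExecutionId=query_id)
    rows = result["ResultSet"]["Rows"]
    if not rows:
        return []
    # For SELECT queries the first row holds the column names.
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]
```

All values come back as strings, so counts and timestamps would still need converting before being returned to the dashboard.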
I'm most likely misnaming what I'm trying to do, as I can't find any relevant help for a problem that I would think is quite common.