
I'm brainstorming an idea and trying to figure out which pieces are missing, or whether there is a better way to solve this problem.

Assume I have a product that customers can embed on their websites. My end goal is to build a dashboard on my website that shows each customer relevant analytics (such as page loads, clicks, and custom events).

I separated this feature into 2 parts:

collection of data

We can collect data from 2 sources:

  1. Embed of https://my-bucket/$customerId/product.json

    • CloudFront Logs -> S3 -> Kinesis Data Streams -> Kinesis Data Firehose -> S3
  2. HTTP request POST /collect to collect an event

    • API Gateway endpoint -> Lambda -> Kinesis Data Firehose -> S3
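To make the second source concrete, here is a minimal sketch of what the Lambda behind POST /collect could look like. The stream name, the allowed event types, and the payload fields (event, productId, customerId, timestamp) are all assumptions for illustration, not something I have settled on. The Firehose client is passed in so the function can be exercised without AWS.

```python
import json
import time

# Hypothetical names: adjust to the real delivery stream and event types.
STREAM_NAME = "analytics-events"
ALLOWED_EVENTS = {"page_load", "click", "custom"}


def build_record(body: str) -> bytes:
    """Validate the request body and return one newline-delimited JSON record."""
    payload = json.loads(body)
    event = payload["event"]
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"unsupported event type: {event}")
    record = {
        "event": event,
        "productId": payload["productId"],
        "customerId": payload["customerId"],
        # Fall back to the server time when the client sends no timestamp.
        "timestamp": payload.get("timestamp", int(time.time())),
    }
    # The trailing newline keeps records separable once Firehose batches them.
    return (json.dumps(record) + "\n").encode("utf-8")


def handler(event, context, firehose=None):
    """Lambda entry point for an API Gateway proxy integration."""
    if firehose is None:
        import boto3  # imported lazily so the module loads without AWS libraries

        firehose = boto3.client("firehose")
    try:
        data = build_record(event.get("body") or "")
    except (ValueError, KeyError, TypeError) as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    firehose.put_record(DeliveryStreamName=STREAM_NAME, Record={"Data": data})
    return {"statusCode": 202, "body": ""}
```

Returning 202 rather than 200 reflects that the event has only been handed to Firehose and is not yet queryable.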

access of data

My dashboard will be calling GET /analytics?event=click&from=...&to=...&productId=...

The first part is straightforward:

  • API Gateway route -> Lambda
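As a rough sketch of that route, the Lambda could parse and validate the query string like this. I am assuming an API Gateway proxy integration (which passes the query string as queryStringParameters) and ISO 8601 values for from and to; neither is decided yet. Fetching the actual data is deliberately left out, since that is the open question.

```python
import json
from datetime import datetime

REQUIRED_PARAMS = ("event", "from", "to", "productId")


def parse_query(event: dict) -> dict:
    """Extract and validate the dashboard filters from a proxy event."""
    params = event.get("queryStringParameters") or {}
    missing = [name for name in REQUIRED_PARAMS if not params.get(name)]
    if missing:
        raise ValueError("missing query parameters: " + ", ".join(missing))
    # Assumption: from/to are ISO 8601 dates or datetimes.
    start = datetime.fromisoformat(params["from"])
    end = datetime.fromisoformat(params["to"])
    if start >= end:
        raise ValueError("'from' must be earlier than 'to'")
    return {
        "event": params["event"],
        "productId": params["productId"],
        "start": start,
        "end": end,
    }


def handler(event, context):
    """Lambda entry point for GET /analytics."""
    try:
        query = parse_query(event)
    except (ValueError, TypeError) as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    # Placeholder: the data lookup would go here once a store is chosen.
    body = {
        "event": query["event"],
        "productId": query["productId"],
        "from": query["start"].isoformat(),
        "to": query["end"].isoformat(),
        "results": [],
    }
    return {"statusCode": 200, "body": json.dumps(body)}
```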

The part I'm struggling with: how can my Lambda query the data that is currently stored on S3?

So far, I have evaluated these options:

  • S3 -> Glue -> Athena: Athena does not guarantee low latency. To my understanding, some requests could take minutes to execute. I need something that is fast and snappy.
  • Kinesis Data Firehose -> DynamoDB: It is difficult to filter and sort on DynamoDB. I'm afraid that the high volume of analytics will slow it down and make it impractical.
  • QuickSight: It doesn't expose a SQL interface to get the data
  • Kinesis Analytics: It doesn't expose a SQL interface to get the data
  • Amazon OpenSearch Service: Feels like overkill (?)
  • Redshift: Looking into it next
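To illustrate my concern with the Athena option: Athena queries are asynchronous, so the Lambda has to start a query, poll until it leaves the QUEUED/RUNNING states, and only then fetch the rows. Below is a minimal sketch of that flow, assuming a boto3 Athena client is passed in; the function name, database name, and output location are placeholders.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=0.5, timeout_seconds=30.0):
    """Start an Athena query, poll until it finishes, and return the first page of rows."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    deadline = time.monotonic() + timeout_seconds

    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "unknown reason")
            raise RuntimeError(f"Athena query {state}: {reason}")
        # The query may sit in QUEUED for an unbounded time, hence the deadline.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Athena query {query_id} still {state}")
        time.sleep(poll_seconds)

    # Only the first page is returned here; larger results need pagination.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

The polling loop is the reason this does not fit a dashboard request: the time spent in QUEUED is outside my control, and API Gateway would time out long before a slow query completes.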

I'm most probably misnaming what I'm trying to do, because I can't find any relevant help for a problem that I would expect to be quite common.

Daniel Costa
  • How quick are you able to filter from the data you have in S3? Why not transform the data you have on S3 into a database (DynamoDB) ultimately and use that as the data-source for your dashboard? – Matthias Steinbauer Jan 23 '22 at 13:27
  • Somehow I missed this comment! Filtering takes quite some time, and Athena does not guarantee that a query runs right away (it could potentially wait for minutes; it generally doesn't, but it could). We went with another approach. For reads, an AWS API Gateway endpoint reads from an AWS RDS serverless database. For writes, we have a collect endpoint that writes to the database, and a consumer on our CloudFront logs that feeds the access logs to a Lambda, which then feeds the database. – Daniel Costa Apr 13 '22 at 08:25
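For illustration, the read side of that approach could look roughly like the sketch below. It uses sqlite3 from the standard library purely as a stand-in for the RDS database so that it runs anywhere; the table name, columns, and index are hypothetical and not the actual schema.

```python
import sqlite3

# Hypothetical schema: one row per collected event, with ISO 8601 timestamps.
SCHEMA = """
CREATE TABLE events (
    product_id  TEXT NOT NULL,
    event       TEXT NOT NULL,
    occurred_at TEXT NOT NULL
);
CREATE INDEX idx_events_lookup ON events (product_id, event, occurred_at);
"""


def count_events_per_day(conn, product_id, event, start, end):
    """Return (day, count) pairs for one product and event type in [start, end)."""
    cursor = conn.execute(
        """
        SELECT substr(occurred_at, 1, 10) AS day, COUNT(*) AS total
        FROM events
        WHERE product_id = ?
          AND event = ?
          AND occurred_at >= ?
          AND occurred_at < ?
        GROUP BY day
        ORDER BY day
        """,
        (product_id, event, start, end),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            ("p1", "click", "2022-04-01T10:00:00"),
            ("p1", "click", "2022-04-02T09:30:00"),
        ],
    )
    print(count_events_per_day(conn, "p1", "click", "2022-04-01", "2022-04-03"))
```

The composite index matches the filters of GET /analytics (product, event, time range), which is what keeps this kind of query fast compared to scanning files on S3.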
