I have a data analytics requirement on AWS. I have limited knowledge of Big Data processing, but based on my analysis I have identified some options.
The requirement is to collect data by calling a provider API every 30 minutes (data ingestion).
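A minimal sketch of the scheduled-Lambda ingestion option, assuming an EventBridge schedule rule with the expression `rate(30 minutes)` invokes the function. `PROVIDER_URL`, `RAW_BUCKET` and the `raw/provider/` prefix are hypothetical placeholders. Each pull is stored unchanged under a Hive-style `year=/month=/day=` key, which keeps the original data available for later ML work and lets query engines that read S3 treat those path segments as partitions. The fetch and upload steps are injectable so the handler can be tested without network or AWS access.

```python
import os
import urllib.request
from datetime import datetime, timezone

# Hypothetical configuration: placeholders read from the Lambda's environment.
PROVIDER_URL = os.environ.get("PROVIDER_URL", "https://api.example.com/data")
BUCKET = os.environ.get("RAW_BUCKET", "my-raw-data-bucket")


def raw_object_key(ts: datetime) -> str:
    """Build a Hive-style partitioned S3 key (year=/month=/day=) for one pull."""
    return (
        f"raw/provider/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{ts:%H%M%S}.json"
    )


def fetch_provider_data(url: str = PROVIDER_URL) -> bytes:
    """Call the provider API and return the raw response body unchanged."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def handler(event, context, fetch=fetch_provider_data, put_object=None):
    """Lambda entry point, assumed to be triggered every 30 minutes by EventBridge.

    `fetch` and `put_object` can be replaced in tests; by default the raw
    payload is written to S3 with boto3 (bundled in the Lambda Python runtime).
    """
    if put_object is None:
        import boto3

        s3 = boto3.client("s3")

        def put_object(key: str, body: bytes) -> None:
            s3.put_object(Bucket=BUCKET, Key=key, Body=body)

    body = fetch()
    key = raw_object_key(datetime.now(timezone.utc))
    put_object(key, body)
    return {"stored": key, "bytes": len(body)}
```

Writing the untouched response first and transforming it later is one design choice; it means a bug in downstream processing never loses source data.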
The data is mainly structured.
This data needs to be stored (in an S3 data lake or in Redshift; I'm not sure which), and various aggregations/dimensions of this data must be exposed through a REST API.
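Whichever store is chosen, the weekly and monthly aggregations are essentially group-by-and-sum operations over the structured records. The sketch below shows in plain Python what such an aggregation computes, assuming hypothetical record fields `timestamp`, `region` (a dimension) and `value` (a metric); in a real setup the same logic would more likely be a SQL `GROUP BY` run by the analytics engine than Python code.

```python
from collections import defaultdict
from datetime import datetime


def period_key(ts: datetime, period: str) -> str:
    """Map a timestamp to its weekly (ISO week) or monthly bucket label."""
    if period == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if period == "month":
        return f"{ts:%Y-%m}"
    raise ValueError(f"unsupported period: {period!r}")


def aggregate(records, period: str, dimension: str, metric: str) -> dict:
    """Sum `metric` per (period bucket, dimension value) pair."""
    totals = defaultdict(float)
    for rec in records:
        ts = datetime.fromisoformat(rec["timestamp"])
        totals[(period_key(ts, period), rec[dimension])] += rec[metric]
    return dict(totals)
```

For example, `aggregate(records, "week", "region", "value")` returns one total per ISO week and region.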
There is also a future requirement to run ML algorithms on the original (raw) data, so the storage choice needs to account for that. Based on this, can you suggest:
- How to ingest the data: a Lambda function that runs on a schedule, pulls the data and writes it to storage, or is there a better way to pull data in AWS?
- How to store it: S3 or Redshift?
- Data analytics (currently some weekly and monthly aggregations): what tools can be used, especially if I store the data in S3?
- How to expose the analytics results through an API (I hope I can use Lambda to query the analytics engine from the previous step).
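A minimal sketch of the Lambda-behind-an-API option, assuming API Gateway invokes the function through a Lambda proxy integration (so query parameters arrive in `event["queryStringParameters"]` and the response must carry `statusCode`, `headers` and a string `body`). The `/aggregates?period=month` route and `query_aggregates` are hypothetical; the latter is a placeholder for whatever call ends up querying the analytics engine.

```python
import json

ALLOWED_PERIODS = {"week", "month"}


def query_aggregates(period: str) -> list:
    """Hypothetical placeholder for the query against the analytics engine.

    Returns no rows; replace with a real query once the engine is chosen.
    """
    return []


def _response(status: int, payload) -> dict:
    """Shape the result the way an API Gateway Lambda proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context, run_query=query_aggregates):
    """Handle e.g. GET /aggregates?period=week; defaults to monthly results."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    period = params.get("period", "month")
    if period not in ALLOWED_PERIODS:
        return _response(400, {"error": f"period must be one of {sorted(ALLOWED_PERIODS)}"})
    return _response(200, {"period": period, "rows": run_query(period)})
```

Validating `period` against a fixed set before it reaches the query layer keeps arbitrary user input out of whatever query the engine runs.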