0

We would like to stream data directly from an EC2 web server to Redshift. Do I need to use Kinesis? What is the best practice? I do not plan to run any special analysis on this data before storing it. I would like a cost-effective solution (using DynamoDB as temporary storage before loading might be costly).

poiuytrez

3 Answers

1

If cost is your primary concern, then the exact number of records per second, combined with the record sizes, can be important.
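To make that concrete, here is a rough sizing sketch. It assumes a provisioned Kinesis stream and the per-shard write limits of about 1 MB/s and 1,000 records/s; the function name and the sample numbers are made up for illustration, and actual pricing is not modelled.

```python
import math

# Per-shard write limits for a provisioned Kinesis stream (assumed here;
# 1 MB is treated as 10**6 bytes, which errs on the conservative side).
SHARD_MAX_RECORDS_PER_SEC = 1_000
SHARD_MAX_BYTES_PER_SEC = 1_000_000


def shards_needed(records_per_sec: float, avg_record_bytes: int) -> int:
    """Estimate the shard count from whichever limit is hit first."""
    by_count = records_per_sec / SHARD_MAX_RECORDS_PER_SEC
    by_bytes = records_per_sec * avg_record_bytes / SHARD_MAX_BYTES_PER_SEC
    # A stream always has at least one shard.
    return max(1, math.ceil(max(by_count, by_bytes)))


if __name__ == "__main__":
    # Hypothetical workloads: (records per second, average record size in bytes)
    for rate, size in [(50, 500), (2500, 200), (500, 5000)]:
        print(rate, size, shards_needed(rate, size))
```

Many small records hit the record-count limit first, while fewer large records hit the byte limit first, which is why both numbers matter for the bill.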

If you are talking about a very low volume of messages, a custom app running on a t2.micro instance to aggregate the data is about as cheap as you can go, but it won't scale. The bigger downside is that you are responsible for monitoring, maintaining, and managing that EC2 instance.

The modern approach would be to use a combination of Kinesis + Lambda + S3 + Redshift to stream the data in, with no EC2 instances to manage!

The approach is described in this blog post: A Zero-Administration Amazon Redshift Database Loader

What that blog post doesn't mention is that, now that API Gateway exists, if you need any custom authentication or data transformation you can do it without an EC2 instance by using Lambda to broker the data into Kinesis.

This would look like:

API Gateway -> Lambda -> Kinesis -> Lambda -> S3 -> Redshift
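As a rough illustration of the first Lambda in that chain, the sketch below takes an API Gateway proxy event and forwards its JSON body to Kinesis with boto3's `put_record`. The stream name `web-events`, the `user_id` field used as partition key, and the optional `kinesis` argument (there only so the handler can be exercised without AWS) are all assumptions, not something from the blog post.

```python
import json
import uuid

STREAM_NAME = "web-events"  # hypothetical stream name


def build_put_record_args(payload: dict, stream_name: str = STREAM_NAME) -> dict:
    """Shape one event as keyword arguments for the Kinesis PutRecord call."""
    # Newline-delimited JSON, so records concatenated downstream stay parseable.
    data = (json.dumps(payload, separators=(",", ":")) + "\n").encode("utf-8")
    # "user_id" is an assumed field; fall back to a random key to spread load.
    partition_key = str(payload.get("user_id") or uuid.uuid4())
    return {"StreamName": stream_name, "Data": data, "PartitionKey": partition_key}


def handler(event, context, kinesis=None):
    """Lambda entry point for an API Gateway proxy integration."""
    if kinesis is None:
        import boto3  # bundled with the Lambda Python runtime
        kinesis = boto3.client("kinesis")
    payload = json.loads(event["body"])
    # Custom authentication or data transformation would go here.
    kinesis.put_record(**build_put_record_args(payload))
    return {"statusCode": 202, "body": json.dumps({"accepted": True})}
```

Keeping the record shaping in its own function makes it easy to test locally with a stand-in client before wiring it to a real stream.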

JaredHatfield
0

Redshift is best suited for batch loading using the COPY command. A typical pattern is to load data to either DynamoDB, S3, or Kinesis, then aggregate the events before using COPY to Redshift.
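For reference, a minimal sketch of what the final COPY step might look like when the staged files are gzipped JSON in S3. The table name, bucket, and IAM role ARN are placeholders, and the helper only builds the SQL text; you would run it through whatever Redshift client you already use.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for gzipped JSON files under an S3 prefix.

    The arguments are assumed to come from trusted configuration, not from
    user input, since they are interpolated directly into the SQL text.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS JSON 'auto'\n"
        "GZIP;"
    )


if __name__ == "__main__":
    # All three values below are hypothetical placeholders.
    print(build_copy_statement(
        "web_events",
        "s3://example-bucket/events/2015/10/08/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    ))
```

One COPY over a prefix holding many files lets Redshift load them in parallel, which is the reason for batching instead of inserting row by row.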

See also this useful SO Q&A.

Ben Whaley
0

I implemented such a system last year at my company using Kinesis and the Kinesis connector. The Kinesis connector is a standalone app released by AWS that we run on a group of Elastic Beanstalk servers as Kinesis consumers. The connector aggregates messages to S3 after a configured time interval or number of messages, and then periodically triggers the Redshift COPY command to load the data into Redshift. Since it runs on Elastic Beanstalk, you can tune the auto-scaling conditions so the cluster grows and shrinks with the volume of data from the Kinesis stream.
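The "flush after N messages or T seconds" behaviour can be sketched roughly as below. This is not the Kinesis connector's actual code, just an illustration of the buffering idea; the class name, the defaults, and the `flush` callback (which in practice would write a file to S3) are all made up.

```python
import time
from typing import Callable, List


class BatchBuffer:
    """Collect records and hand them off in batches by count or age."""

    def __init__(
        self,
        flush: Callable[[List[bytes]], None],
        max_records: int = 1000,
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock
        self._records: List[bytes] = []
        self._oldest = 0.0

    def add(self, record: bytes) -> None:
        if not self._records:
            # Remember when the current batch started.
            self._oldest = self._clock()
        self._records.append(record)
        too_many = len(self._records) >= self._max_records
        too_old = self._clock() - self._oldest >= self._max_age
        # Age is only checked when a record arrives; a real consumer would
        # also flush from a timer so a quiet stream does not hold data back.
        if too_many or too_old:
            self.flush_now()

    def flush_now(self) -> None:
        if self._records:
            batch, self._records = self._records, []
            self._flush(batch)
```

Tuning the two thresholds trades load latency against the number of S3 objects and COPY runs.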

BTW, AWS just announced Kinesis Firehose yesterday. I haven't played with it, but it definitely looks like a managed version of the Kinesis connector.

piggybox