
I want to do a multipart upload from Kinesis into my Amazon S3 bucket. Since Amazon S3 is an object store, every record delivered to it creates a separate file (object) under the given bucket.

My Amazon S3 bucket feeds my AWS Glue jobs: a Lambda function is triggered as soon as a new file lands in that particular folder in Amazon S3. With streaming data there will be multiple files per second.

How can I control the file size on the Kinesis side so that Kinesis only pushes data to the Amazon S3 bucket after a certain threshold is reached? That way I could trigger my job only when that size is reached.
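For context, a minimal sketch of the current setup described above, assuming an S3 ObjectCreated notification wired to a Lambda function; the Glue job name and argument key are hypothetical placeholders:

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Fires once per new S3 object; with streaming data this can be
    # invoked many times per second, once for every small file.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # "my-etl-job" and "--input_path" are hypothetical names
        glue.start_job_run(
            JobName="my-etl-job",
            Arguments={"--input_path": f"s3://{bucket}/{key}"},
        )
```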

addytheriot
  • For Kinesis Data Firehose you can configure the buffer size and buffer interval. With S3 as a destination you can set the buffer size up to 128 MB before the data is delivered to S3, for example (see the sketch after these comments). – Kevin Horgan Jun 02 '21 at 17:31
  • @KevinHorgan Ah, I see. So in that case Kinesis will hold the data in its own memory, just like Kafka holds it in its topic, and then flush it as a single object into the Amazon S3 bucket? – addytheriot Jun 02 '21 at 17:37
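A minimal sketch of that buffering configuration with boto3, assuming a Kinesis Data Firehose delivery stream with an S3 destination; the stream name, role ARN, bucket ARN, and prefix below are hypothetical placeholders:

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="my-stream-to-s3",  # hypothetical name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::my-target-bucket",
        "Prefix": "landing/",
        # Firehose buffers incoming records and writes one S3 object only
        # after a hint is reached: up to 128 MB of data or up to 900 seconds.
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 900},
    },
)
```

Whichever hint is hit first wins, so during quiet periods the interval hint can still flush objects smaller than the configured size.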

1 Answer


You can use AWS Glue triggers instead of a Lambda function.

For example, set a cron schedule for the Glue job.
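A minimal sketch of such a scheduled trigger with boto3, assuming the Glue job already exists; the trigger name, job name, and cron expression are hypothetical:

```python
import boto3

glue = boto3.client("glue")

# Start the (hypothetical) job once an hour instead of reacting to
# every individual S3 object-created event.
glue.create_trigger(
    Name="hourly-etl-trigger",
    Type="SCHEDULED",
    Schedule="cron(0 * * * ? *)",  # AWS cron syntax used by Glue
    Actions=[{"JobName": "my-etl-job"}],
    StartOnCreation=True,
)
```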

panda0
  • This was about the step before putting data into S3; I think Kinesis Firehose will be more suitable. I would love to use Glue jobs later, after I have my data in S3. Also, I still have to go over Glue; I will have a clearer picture once I know about it. – addytheriot Jun 06 '21 at 07:33