
I want to do a multipart upload from Kinesis into my Amazon S3 bucket. Since Amazon S3 is an object store, every record delivered to it creates a separate file (object) under the given bucket.

My Amazon S3 bucket feeds my AWS Glue jobs: a Lambda function is triggered as soon as a new file lands in that particular folder in Amazon S3. With streaming data there will be multiple files per second.

How can I control the file size on the Kinesis side so that Kinesis only pushes data to the Amazon S3 bucket after a certain threshold is reached? That way I could trigger my job only when that size is reached.
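For context, a minimal sketch of the current setup described above, assuming an S3 ObjectCreated notification wired to a Lambda function; the Glue job name and argument key are hypothetical placeholders:

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Fires once per new S3 object; with streaming data this can be
    # invoked many times per second, once for every small file.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # "my-etl-job" and "--input_path" are hypothetical names
        glue.start_job_run(
            JobName="my-etl-job",
            Arguments={"--input_path": f"s3://{bucket}/{key}"},
        )
```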

addytheriot
  • For Kinesis Data Firehose you can configure the buffer size and buffer interval. With S3 as a destination you can set the buffer size up to 128 MB before the data is delivered to S3, for example (see the sketch after these comments). – Kevin Horgan Jun 02 '21 at 17:31
  • @KevinHorgan Ah, I see. So in that case Kinesis will hold the data in its own memory, just like Kafka holds it in its topic, and then flush it as a single object into the Amazon S3 bucket? – addytheriot Jun 02 '21 at 17:37
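A minimal sketch of that buffering configuration with boto3, assuming a Kinesis Data Firehose delivery stream with an S3 destination; the stream name, role ARN, bucket ARN, and prefix below are hypothetical placeholders:

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="my-stream-to-s3",  # hypothetical name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::my-target-bucket",
        "Prefix": "landing/",
        # Firehose buffers incoming records and writes one S3 object only
        # after a hint is reached: up to 128 MB of data or up to 900 seconds.
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 900},
    },
)
```

Whichever hint is hit first wins, so during quiet periods the interval hint can still flush objects smaller than the configured size.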

1 Answer


You can use AWS Glue triggers instead of a Lambda function.

For example, set a cron schedule for the Glue job.
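A minimal sketch of such a scheduled trigger with boto3, assuming the Glue job already exists; the trigger name, job name, and cron expression are hypothetical:

```python
import boto3

glue = boto3.client("glue")

# Start the (hypothetical) job once an hour instead of reacting to
# every individual S3 object-created event.
glue.create_trigger(
    Name="hourly-etl-trigger",
    Type="SCHEDULED",
    Schedule="cron(0 * * * ? *)",  # AWS cron syntax used by Glue
    Actions=[{"JobName": "my-etl-job"}],
    StartOnCreation=True,
)
```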

panda0
  • This was about the step before putting data into S3; I think Kinesis Firehose will be more suitable. I would love to use Glue jobs later, after I have my data in S3. Also, I still have to go over Glue; I will have a clearer picture once I know about it. – addytheriot Jun 06 '21 at 07:33