Questions tagged [amazon-kinesis-firehose]

Amazon Kinesis Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3) and Amazon Redshift.

Firehose is part of the Amazon Kinesis streaming data family, along with Amazon Kinesis Streams. With Firehose, you do not need to write any applications or manage any resources.

You configure your data producers to send data to Firehose and it automatically delivers the data to the destination that you specified.

Links:

  1. Amazon Kinesis - Site

  2. Amazon Kinesis - Documentation

618 questions
7
votes
1 answer

How to define nested array to ingest data and convert?

I am using Firehose and Glue to ingest data and convert JSON to Parquet files in S3. I got this working with flat JSON (no nesting or arrays), but it fails for a nested JSON array. What I have done: the JSON structure { …
7
votes
2 answers

Backfill AWS Kinesis Firehose to Elasticsearch Service failed records

We have a firehose that sends records to an Elasticsearch Service cluster. Our cluster filled up and some records failed over to S3. The documentation at https://docs.aws.amazon.com/firehose/latest/dev/basic-deliver.html#retry indicates that failed…
ima747
  • 4,667
  • 3
  • 36
  • 46
7
votes
2 answers

How to change column names of autodetected partitions created by Glue Crawler?

I have a bucket that is used as the destination for a Kinesis Firehose stream. Firehose automatically creates date-based prefixes in that bucket using the yyyy/mm/dd/HH format. Then I created a crawler that will search for data in this bucket and…
Henrique Barcelos
  • 7,670
  • 1
  • 41
  • 66
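
To make the yyyy/mm/dd/HH layout mentioned above concrete, here is a minimal sketch of how a timestamp maps onto such a prefix. The function name `date_prefix`, the trailing slash, and the sample timestamp are illustrative assumptions, and the sketch assumes the prefix is derived from UTC time; it is not Firehose's own code.

```python
from datetime import datetime, timezone


def date_prefix(ts: datetime) -> str:
    """Return a yyyy/mm/dd/HH/ style key prefix for a timezone-aware timestamp.

    Assumption: the prefix is computed in UTC, so the timestamp is
    converted to UTC before formatting.
    """
    return ts.astimezone(timezone.utc).strftime("%Y/%m/%d/%H/")


# Hypothetical arrival time, used only for illustration.
example = date_prefix(datetime(2019, 3, 7, 14, 30, tzinfo=timezone.utc))
print(example)
```

Each of the four path segments is a bare value without a `name=` part, which is why a crawler has nothing to derive partition column names from.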
7
votes
1 answer

Pyspark: Reading JSON data file with no separator between objects

I have a Kinesis Firehose delivery stream that puts data to S3. However, in the data file the JSON objects have no separator between them. So it looks something like this: { "key1" : "value1", "key2" : "value2" }{ "key1" : "value1", "key2" :…
sjishan
  • 3,392
  • 9
  • 29
  • 53
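
One way to read back-to-back JSON objects like the ones described above is to decode them one at a time with the standard library's `json.JSONDecoder.raw_decode`. This is a plain-Python sketch rather than a PySpark solution; the helper name `iter_concatenated_json` and the sample values are hypothetical.

```python
import json


def iter_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not accept leading whitespace, so skip it first.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample in the same shape as the data described above.
sample = '{ "key1" : "value1", "key2" : "value2" }{ "key1" : "value3", "key2" : "value4" }'
records = list(iter_concatenated_json(sample))
print(records)
```

Because the decoder tracks its own position, this approach does not depend on any particular separator and also tolerates newlines or spaces between objects.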
7
votes
0 answers

Firehose to S3: What happens to data after unsuccessful tries for 24 hours

From AWS documentation: Data delivery to your S3 bucket might fail for reasons such as the bucket doesn’t exist anymore, the IAM role that Kinesis Firehose assumes doesn’t have access to the bucket, network failure, or similar events. Under…
Anshum17
  • 201
  • 3
  • 8
7
votes
1 answer

AWS Firehose buffering

When writing records to an AWS Firehose which is configured with S3 as the output destination, how long is this data buffered before it is written to S3? Or is there a minimum size threshold? For example, I'm doing the following to add records: aws…
RandomUser
  • 4,140
  • 17
  • 58
  • 94
7
votes
1 answer

How to process incremental S3 files in Spark

I made the following pipeline: Task manager -> SQS -> scraper worker (my app) -> AWS Firehose -> S3 files -> Spark ->(?) Redshift. Some things I am trying to solve/improve and I would be happy for guidance: The scraper could potentially get…
Himberjack
  • 5,682
  • 18
  • 71
  • 115
7
votes
1 answer

Auto wire kinesis stream to kinesis firehose?

I'm publishing data to a Kinesis stream that is processed by some consumers. I'd like the raw data published to the stream to also be stored in S3. Is it possible to auto wire a Kinesis stream to a Kinesis Firehose or do I need to directly publish…
devshorts
  • 8,572
  • 4
  • 50
  • 73
7
votes
5 answers

CLI to put data into AWS Firehose

AWS Firehose was released today. I'm playing around with it and trying to figure out how to put data into the stream using AWS CLI. I have a simple JSON payload and the corresponding Redshift table with columns that map to the JSON attributes. I've…
n00b
  • 5,843
  • 11
  • 52
  • 82
6
votes
4 answers

Invalid Schema error in AWS Glue created via Terraform

I have a Kinesis Firehose configuration in Terraform, which reads JSON data from a Kinesis stream, converts it to Parquet using Glue and writes it to S3. There is something wrong with the data format conversion and I am getting the below error (with some…
6
votes
1 answer

In near real time analytics, why is Lambda --> Firehose --> S3 preferred over Lambda --> S3?

Many AWS reference architectures for serverless real-time analytics suggest pushing processed data from Lambda to S3 through Kinesis…
6
votes
1 answer

Athena with partition projection returns no results

While doing a proof of concept for our new ETL pipeline, I ran into some problems using partition projection in AWS Athena. Created the following table in Glue: CREATE EXTERNAL TABLE `test_interactions`( `id` string, `created_at` timestamp,…
6
votes
1 answer

Firehose is unable to assume role

I'm trying to use the Firehose API (JS) and I keep getting the following error: "InvalidArgumentException: Firehose is unable to assume role arn:aws:iam::XXXXXXXXXX:role/NAME. Please check the role provided." I checked the role and I have set my…
6
votes
1 answer

Is cross account Kinesis Firehose possible?

Account A is the application account where I created a Kinesis stream, and I want to create a Firehose in Account B to read from the Kinesis stream in Account A. Is this possible? I tried to follow the steps from…
6
votes
0 answers

How to preserve order of CloudWatch log stream events when transmitting them into another system?

The story: I have ECS tasks that run Docker containers that produce stdout/stderr output. The tasks are configured to use the awslogs driver to send the output to CloudWatch. There is a subscription filter on the CW log group, the subscriber is a…