Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata; a minimal consumer sketch follows.
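For example, a minimal direct-stream consumer using the spark-streaming-kafka-0-10 API. Broker address, group id and topic name are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object DirectStreamExample {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("direct-stream-example"), Seconds(5))

        // Placeholder consumer settings.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "example-group",
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean))

        // One Spark partition per Kafka partition.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Offsets and metadata are available on each batch's RDD.
        stream.foreachRDD { rdd =>
          val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          ranges.foreach(r => println(s"${r.topic}-${r.partition}: ${r.fromOffset}..${r.untilOffset}"))
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }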
Questions tagged [spark-streaming-kafka]
250 questions
0
votes
1 answer
How can duplicate records appear in the Kafka queue?
I'm using Apache NiFi, Spark and Kafka to send messages between them. First, I ingest data with NiFi and send it to Spark to process it. Then I send the data from Spark back to NiFi to insert it into a DB.
My problem is that each time I run Spark,… (one common fix, committing offsets only after output, is sketched after this entry)

Krakenudo
- 182
- 1
- 17
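One common cause of replayed records is restarting a job whose offsets were never committed: with enable.auto.commit disabled, every restart reprocesses from the last committed position. A sketch, assuming a spark-streaming-kafka-0-10 direct stream like the one under the tag description above:

    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    // `stream` is the input DStream returned by KafkaUtils.createDirectStream.
    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... deliver the batch downstream (e.g. back to NiFi / the DB) here ...
      // Commit only after the output succeeds, so a restart does not
      // re-deliver records that already reached the database.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
    }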
0
votes
0 answers
Kafka stream is not listening on input when created through the SparkSession builder
I am trying to create a Kafka consumer that uses the MongoDB-Spark-Connector in the same program: something like Kafka input as an RDD, converted to a DataFrame, then stored in MongoDB for later use.
My producer is up and running and the "standard"…

Ranger
- 75
- 7
0
votes
2 answers
What are the drawbacks of Spark-Kafka integration on a local machine for real-time Twitter streaming analysis?
I am using Spark-Kafka integration in my project, which finds the top trending hashtags on Twitter. For this, I am using Kafka to push tweets through Tweepy streaming, and on the consumer side I am using Spark Streaming for DStream…

Dharmesh Singh
- 107
- 2
- 11
0
votes
1 answer
Spark Structured Streaming with Kafka source, change number of topic partitions while query is running
I've set up a Spark structured streaming query that reads from a Kafka topic.
If the number of partitions in the topic is changed while the Spark query is running, Spark does not seem to notice, and data on the new partitions is not consumed (a minimal source setup is sketched after this entry).
Is there a…

redsk
- 261
- 6
- 11
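For reference, a minimal Kafka-source query of the kind described. Broker, topic and checkpoint path are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-structured").getOrCreate()

    // Structured streaming source over a Kafka topic.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "latest")
      .load()

    val query = events
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/kafka-structured")
      .start()

    query.awaitTermination()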
0
votes
1 answer
Spark - Kudu predicate pushdown
I'm using Kudu and Spark Streaming for a real-time dashboard. My problem is that when I join the batch from Spark Streaming with the Kudu table, no predicate pushdown happens, so it takes 2-3 seconds to fetch the entire table into Spark and… (a filter-before-join workaround is sketched after this entry)

M. Alexandru
- 614
- 5
- 20
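A common workaround, sketched under assumed names (master address, table and join column are placeholders): Spark pushes literal filters, but not join keys, into a data-source scan, so collecting the small streaming batch's keys and filtering the Kudu DataFrame explicitly lets kudu-spark prune the scan before the join.

    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Hypothetical names: "id" join column, Kudu master and table.
    def joinWithKudu(spark: SparkSession, batch: DataFrame): DataFrame = {
      val kuduTable = spark.read
        .format("org.apache.kudu.spark.kudu")
        .option("kudu.master", "kudu-master:7051")
        .option("kudu.table", "impala::db.metrics")
        .load()

      // Collect the (small) batch keys and push them as a literal IN filter,
      // which the Kudu data source can translate into a pruned scan.
      val keys = batch.select("id").distinct().collect().map(_.getString(0))
      kuduTable.where(kuduTable("id").isin(keys: _*)).join(batch, "id")
    }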
0
votes
1 answer
Opinion: Querying databases from Spark streaming or Structured streaming tasks
We have a Spark streaming use case where we need to compute some metrics from ingested events (in Kafka), but the computations require additional metadata that is not present in the events (a connection-per-partition sketch follows this entry).
The obvious design pattern I can think of is to make…

AbhinavChoudhury
- 1,167
- 1
- 18
- 38
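One sketch of that pattern, with a hypothetical MetadataClient standing in for the real database client: open one connection per partition inside mapPartitions rather than one per record.

    import org.apache.spark.streaming.dstream.DStream

    final case class Event(id: String, key: String)

    // Hypothetical stand-in for a JDBC/REST metadata client.
    class MetadataClient {
      private val table = Map("k1" -> "meta-1", "k2" -> "meta-2")
      def lookup(key: String): Option[String] = table.get(key)
      def close(): Unit = ()
    }

    def enrich(events: DStream[Event]): DStream[(Event, Option[String])] =
      events.mapPartitions { iter =>
        val client = new MetadataClient()          // one client per partition
        // Materialize before closing: mapPartitions hands us a lazy iterator.
        val out = iter.map(e => (e, client.lookup(e.key))).toVector
        client.close()
        out.iterator
      }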
0
votes
0 answers
Spark structured streaming with kafka throwing error after running for a while
I am observing weird behaviour while running a Spark structured streaming program. I am using an S3 bucket for metadata checkpointing.
The Kafka topic has 310 partitions.
When I start the streaming job for the first time, after completion of every batch…

unknown_k
- 11
- 2
0
votes
1 answer
How to be sure all documents are written to Elasticsearch using the Elasticsearch-Hadoop connector in Spark Streaming
I am writing a DStream to Elasticsearch using the Elasticsearch-Hadoop connector, which is documented at
https://www.elastic.co/guide/en/elasticsearch/hadoop/5.6/spark.html
I need to process the window, write all the documents to ES… (a minimal saveToEs sketch follows this entry)

Yılmaz
- 185
- 2
- 14
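For reference, the connector documented at the link above adds saveToEs directly to DStreams; a minimal sketch, with the index name and document shape assumed:

    import org.apache.spark.streaming.dstream.DStream
    import org.elasticsearch.spark.streaming._   // adds saveToEs to DStreams

    // Assumes "es.nodes" is set on the SparkConf,
    // e.g. .set("es.nodes", "localhost:9200").
    def writeToEs(docs: DStream[Map[String, String]]): Unit =
      docs.saveToEs("tweets/doc")   // "index/type" addressing, per the 5.6 docs

If the next step must wait until a batch's documents are written, one option is to call the RDD variant of saveToEs (via org.elasticsearch.spark._) inside foreachRDD, since that submits a Spark job that blocks until the batch completes.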
0
votes
0 answers
Difficulties translating Scala Spark-Streaming code to Pyspark
I am trying to translate to PySpark the Spark implementation discussed in this blog:
https://dorianbg.wordpress.com/2017/11/11/building-the-speed-layer-of-lambda-architecture-using-structured-spark-streaming/
However, I am having a lot of…

Nelson Fleig
- 111
- 1
- 7
0
votes
0 answers
How do we plan an outage of the Kafka cluster when we are using Kafka for Spark streaming?
If for some reason we need to bring the Kafka cluster down while the source continues to produce data, there will be data loss. How do we plan the outage of the Kafka cluster?
Please advise how we can handle this scenario.
I have tried…

jin
- 11
- 4
0
votes
0 answers
Spark kafka streaming failing to determine position of partition
I am creating a Spark streaming application with Kafka. The consumer configuration begins as follows (a completed parameter map is sketched after this entry):
val kafkaParams = Map[String,Object](
ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> kafkaConfig.bootstrapServers,
ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG ->…

bitan
- 444
- 4
- 14
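For reference, a completed parameter map in the same style as the excerpt, with placeholder values standing in for the kafkaConfig fields:

    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer

    // Placeholder values; the question reads these from kafkaConfig.
    val kafkaParams = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "localhost:9092",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.GROUP_ID_CONFIG -> "example-group",
      ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "latest",
      ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> (false: java.lang.Boolean))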
0
votes
1 answer
Spark Structured streaming watermark corresponding to deviceid
Incoming data is a stream like the one below, consisting of 3 columns (a watermarked aggregation over this schema is sketched after this entry):
[
system -> deviceId,
time -> eventTime,
value -> some metric
]
+-------+-------------------+-----+
|system |time …

Rahul Shukla
- 505
- 7
- 20
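For reference, a watermarked aggregation over that (system, time, value) schema. Note that Spark tracks one global watermark per query rather than one per device; the thresholds below are placeholders.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{avg, col, window}

    // Watermark on the event-time column, then a windowed aggregate per device.
    def aggregate(events: DataFrame): DataFrame =
      events
        .withWatermark("time", "10 minutes")   // one global watermark per query
        .groupBy(col("system"), window(col("time"), "5 minutes"))
        .agg(avg(col("value")))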
0
votes
1 answer
How to set streaming app checkpointing to Azure storage?
I am trying to set checkpointing for a Spark streaming application to Azure storage. I was using S3 and the code was working fine.
Here is the latest code for how I set checkpointing to Azure (a fuller configuration sketch follows this entry):
sc.hadoopConfiguration
.set("fs.azure",…

MarkZ
- 29
- 9
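For reference, a fuller version of that configuration using the hadoop-azure (wasb) filesystem. Account name, container and key are placeholders, and hadoop-azure plus azure-storage must be on the classpath:

    // Placeholders: MYACCOUNT, mycontainer, ACCOUNT_KEY.
    sc.hadoopConfiguration.set(
      "fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
    sc.hadoopConfiguration.set(
      "fs.azure.account.key.MYACCOUNT.blob.core.windows.net", "ACCOUNT_KEY")

    ssc.checkpoint("wasb://mycontainer@MYACCOUNT.blob.core.windows.net/checkpoints")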
0
votes
1 answer
AbstractMethodError upon creation of new StreamingContext
I've been having problems trying to instantiate a new StreamingContext in Spark Streaming: when I create one, an AbstractMethodError is thrown.
I've been debugging the stack trace and found out that…

Luan Araldi
- 31
- 1
- 9
0
votes
1 answer
Unauthorized error setting batch-bigtable as data host from Spark streaming
I'm following the example here for writing to Cloud Bigtable from Spark Streaming: https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/tree/master/scala/spark-streaming
In my instance, I'm consuming from Kafka, doing some transformations,…

j.r.e.
- 11
- 4