Questions tagged [spark-streaming]

Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Since version 1.3.0, it has supported exactly-once processing semantics, even in the face of failures.

5565 questions
2
votes
0 answers

Spark Streaming with large messages java.lang.OutOfMemoryError: Java heap space

I am using Spark Streaming 1.6.1 with Kafka 0.9.0.1 (createStream API) on HDP 2.4.2. My use case sends large messages, ranging from 5 MB to 30 MB, to Kafka topics; in these cases Spark Streaming fails to complete its job and crashes with the exception below. I am…
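A common cause with messages this large is the consumer's default fetch size, which is well below 30 MB. A minimal sketch of a receiver-based setup that raises the fetch limit and spills received blocks to disk instead of holding them on-heap; the ZooKeeper host, group id, and topic name are placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LargeMessageStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("large-message-stream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Raise the consumer fetch size above the largest expected message
    // (30 MB here); the 0.8-style consumer otherwise cannot pull the record.
    val kafkaParams = Map[String, String](
      "zookeeper.connect" -> "zk-host:2181",              // assumption: your ZK quorum
      "group.id" -> "large-message-group",
      "fetch.message.max.bytes" -> (32 * 1024 * 1024).toString
    )

    val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Map("big-topic" -> 1),            // assumption: topic name
      StorageLevel.MEMORY_AND_DISK_SER                    // spill blocks to disk rather than OOM
    )

    stream.map(_._2).count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```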
2
votes
2 answers

DStream checkpointing has been enabled but the DStreams with their functions are not serializable

I want to send a DStream to Kafka, but it still doesn't work. searchWordCountsDStream.foreachRDD(rdd => rdd.foreachPartition( partitionOfRecords => { val props = new HashMap[String, Object]() …
Kof
  • 65
  • 2
  • 5
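The usual cause of this error is the Kafka producer (or another non-serializable object) being captured in a DStream closure that checkpointing then tries to serialize. A minimal sketch of the standard workaround, creating the producer inside foreachPartition so it is instantiated on the executor and never leaves it; the broker address and topic name are placeholders:

```scala
import java.util.{HashMap => JHashMap}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.spark.streaming.dstream.DStream

def writeToKafka(dstream: DStream[(String, Long)]): Unit = {
  dstream.foreachRDD { rdd =>
    rdd.foreachPartition { partitionOfRecords =>
      // Built here, on the executor, so nothing non-serializable is
      // captured in the checkpointed closure.
      val props = new JHashMap[String, Object]()
      props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092") // assumption
      props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer")
      props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        "org.apache.kafka.common.serialization.StringSerializer")

      val producer = new KafkaProducer[String, String](props)
      partitionOfRecords.foreach { case (word, count) =>
        producer.send(new ProducerRecord[String, String]("counts", word, count.toString))
      }
      producer.close()
    }
  }
}
```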
2
votes
2 answers

Zeppelin Twitter Streaming Example Not Working

I am trying to run the Twitter streaming example in Zeppelin. After searching around, I added "org.apache.bahir:spark-streaming-twitter_2.11:2.0.0" to the Spark interpreter. With that I can make the first part work, as in: Apache Zeppelin 0.6.1: Run Spark 2.0…
user1828513
  • 367
  • 2
  • 7
  • 16
2
votes
1 answer

RDD toDF(): Erroneous Behavior

I built a Spark Streaming app that fetches content from a Kafka queue and intends to put the data into a MySQL table after some pre-processing and structuring. I call the 'foreachRDD' method on the DStream. The issue that I'm facing is…
arshellium
  • 215
  • 1
  • 6
  • 17
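toDF() misbehaving inside foreachRDD is usually a SQLContext/implicits scoping issue. The Spark Streaming guide recommends a lazily instantiated singleton SQLContext; a minimal sketch of that pattern, with a hypothetical Record case class standing in for the real schema:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream

// Lazily instantiated singleton, so toDF() works reliably inside
// foreachRDD even after recovery from a checkpoint.
object SQLContextSingleton {
  @transient private var instance: SQLContext = _
  def getInstance(sc: SparkContext): SQLContext = {
    if (instance == null) instance = new SQLContext(sc)
    instance
  }
}

case class Record(word: String, count: Long)

def saveToMySql(dstream: DStream[(String, Long)]): Unit = {
  dstream.foreachRDD { rdd =>
    val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
    import sqlContext.implicits._                         // required for toDF()
    val df = rdd.map { case (w, c) => Record(w, c) }.toDF()
    // write df to MySQL here, e.g. via df.write.jdbc(...)
  }
}
```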
2
votes
0 answers

Share a variable across Spark streams

How can I share variables across Spark streams in PySpark? I'm trying to share a dataframe that holds various values for a combination of features, like platform, etc. The program works once, when the global variable is first initialized. It…
dvshekar
  • 93
  • 11
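The question is PySpark, but the pattern is the same in any Spark language: hold the shared lookup in a broadcast variable and re-broadcast when it changes, rather than relying on a global that is only shipped to executors once. A hedged Scala sketch of that pattern; FeatureLookup, loadLookup, and the map contents are hypothetical:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.dstream.DStream

// Executors always read the lookup through the broadcast; the driver can
// refresh it between batches instead of mutating a global.
object FeatureLookup {
  @volatile private var instance: Broadcast[Map[String, Double]] = _

  def get(sc: SparkContext): Broadcast[Map[String, Double]] = {
    if (instance == null) synchronized {
      if (instance == null) instance = sc.broadcast(loadLookup())
    }
    instance
  }

  def refresh(sc: SparkContext): Unit = synchronized {
    if (instance != null) instance.unpersist()
    instance = sc.broadcast(loadLookup())
  }

  // hypothetical loader for the platform/feature values
  private def loadLookup(): Map[String, Double] = Map("platform-a" -> 1.0)
}

def score(dstream: DStream[String]): DStream[(String, Double)] =
  dstream.transform { rdd =>
    val lookup = FeatureLookup.get(rdd.sparkContext)
    rdd.map(key => (key, lookup.value.getOrElse(key, 0.0)))
  }
```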
2
votes
1 answer

Flow time stamp through streaming functions

How, if at all, is it possible to generate a random number or obtain the system time each time a batch is run with Spark Streaming? I have two functions which process a batch of messages: 1 - the first processes the key, creates a file (CSV) and writes headers; 2 -…
Ken Alton
  • 686
  • 1
  • 9
  • 21
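foreachRDD has an overload that receives the batch Time, which gives both functions one consistent timestamp per batch. A small sketch; the output path is purely illustrative:

```scala
import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Each batch gets a single timestamp derived from its scheduled Time,
// here used to name a per-batch output directory of part files.
def writeBatches(dstream: DStream[String]): Unit = {
  val fmt = new SimpleDateFormat("yyyyMMdd-HHmmss")
  dstream.foreachRDD { (rdd, time: Time) =>
    val stamp = fmt.format(new Date(time.milliseconds))
    val path = s"/tmp/batches/batch-$stamp"   // hypothetical output location
    if (!rdd.isEmpty()) rdd.saveAsTextFile(path)
  }
}
```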
2
votes
1 answer

Spark UI's kill is not killing Driver

I am trying to kill my Spark-Kafka streaming job from the Spark UI. It is able to kill the application, but the driver is still running. Can anyone help me with this? I am fine with my other streaming jobs; only one of the streaming jobs is giving this…
AKC
  • 953
  • 4
  • 17
  • 46
2
votes
1 answer

How to join two (or more) streams (JavaDStream) in Apache Spark

We have a Spark Streaming application that consumes the Gnip compliance stream. In the old version of the API, the compliance stream was provided by one endpoint, but now it is provided by 8 different endpoints. We could run the same Spark application…
Fanooos
  • 2,718
  • 5
  • 31
  • 55
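For merging several streams of the same element type into one pipeline, StreamingContext.union is the usual answer. A sketch, using socketTextStream as a stand-in for whatever receiver wraps each compliance endpoint:

```scala
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

// One DStream per endpoint, all unioned into a single stream so the
// downstream processing logic stays unchanged.
def mergeEndpoints(ssc: StreamingContext, hosts: Seq[(String, Int)]): DStream[String] = {
  val streams: Seq[DStream[String]] =
    hosts.map { case (host, port) => ssc.socketTextStream(host, port) }
  ssc.union(streams)
}
```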
2
votes
1 answer

Spark Memory/worker issues & what is the correct spark configuration?

I have a total of 6 nodes in my Spark cluster. 5 nodes have 4 cores and 32 GB RAM each, and one node (node 4) has 8 cores and 32 GB RAM. So I have a total of 6 nodes: 28 cores and 192 GB RAM. (I want to use half of the memory, but all the cores.) Planning to…
AKC
  • 953
  • 4
  • 17
  • 46
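One way to reason about "half the memory, all the cores" on this cluster: size executors so every core is claimed while executor memory sums to roughly 96 GB. The numbers below are one possible split under those assumptions, not a definitive recommendation:

```scala
import org.apache.spark.SparkConf

// 4-core executors: the five small nodes host one each, node 4 hosts two,
// giving 7 executors x 4 cores = all 28 cores. 7 x 14 GB = 98 GB, roughly
// half of the 192 GB total; node 4's two executors use 28 GB of its 32 GB.
val conf = new SparkConf()
  .setAppName("sized-streaming-app")
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "14g")
  .set("spark.cores.max", "28")   // cap at the cluster's 28 cores (standalone mode)
```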
2
votes
1 answer

Reading files dynamically from HDFS from within spark transformation functions

How can a file from HDFS be read in a Spark function without using sparkContext within the function? Example: val filedata_rdd = rdd.map { x => ReadFromHDFS(x.getFilePath) } The question is how ReadFromHDFS can be implemented. Usually, to read from HDFS we…
darkknight444
  • 546
  • 8
  • 21
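Executors cannot use the SparkContext, but they can reach HDFS directly through Hadoop's FileSystem API. A sketch of one way ReadFromHDFS could be implemented, assuming the Hadoop configuration (and so the HDFS address) is on the executor classpath:

```scala
import java.io.{BufferedReader, InputStreamReader}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.rdd.RDD

// mapPartitions creates one FileSystem handle per partition instead of
// one per record; each element is then read straight from HDFS.
def readFiles(rdd: RDD[String]): RDD[(String, String)] =
  rdd.mapPartitions { paths =>
    val fs = FileSystem.get(new Configuration())  // picks up hdfs-site.xml on the classpath
    paths.map { p =>
      val reader = new BufferedReader(new InputStreamReader(fs.open(new Path(p))))
      val content = Iterator.continually(reader.readLine()).takeWhile(_ != null).mkString("\n")
      reader.close()
      (p, content)
    }
  }
```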
2
votes
0 answers

Reading from Kafka 0.8.1 and writing to Kafka 0.9.0

I have a requirement where I should read messages from Kafka v0.8.1 (in cluster A) and write them to Kafka v0.9.0 (in cluster B). I am using Spark Streaming to read from Kafka A and push messages into Kafka B using Spark's native Kafka classes. It is giving…
2
votes
0 answers

Use spark-streaming as a scheduler

I have a Spark job that reads from an Oracle table into a dataframe. The jdbc.read method seems to pull an entire table in at once, so I constructed a spark-submit job to work in batch. Whenever I have data I need manipulated, I…
tadamhicks
  • 905
  • 1
  • 14
  • 34
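If the concern is read.jdbc pulling the whole table through one connection, the partitioned overload splits the scan into parallel range queries over a numeric column. A sketch with hypothetical connection details and split column:

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SQLContext}

// Spark issues numPartitions concurrent queries, each covering a slice of
// the column's [lowerBound, upperBound] range, instead of one full scan.
def loadOracleTable(sqlContext: SQLContext): DataFrame = {
  val props = new Properties()
  props.setProperty("user", "scott")          // hypothetical credentials
  props.setProperty("password", "tiger")
  props.setProperty("driver", "oracle.jdbc.OracleDriver")

  sqlContext.read.jdbc(
    "jdbc:oracle:thin:@//db-host:1521/ORCL",  // url (assumption)
    "SOURCE_TABLE",                           // table
    "ID",                                     // numeric column to partition on
    1L,                                       // lowerBound
    1000000L,                                 // upperBound
    8,                                        // numPartitions (parallel range queries)
    props)
}
```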
2
votes
3 answers

Storing a DataFrame to a Hive partitioned table in Spark

I'm trying to store a stream of data coming in from a Kafka topic into a Hive partitioned table. I was able to convert the DStream to a dataframe and created a HiveContext. My code looks like this: val hiveContext = new…
Riyan Mohammed
  • 247
  • 2
  • 6
  • 20
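One way to land each micro-batch in a Hive-partitioned table is DataFrameWriter.partitionBy, which lays files out per partition column. A minimal sketch, assuming the DataFrame was created through a HiveContext; the table and partition column names are assumptions:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Appends the batch to a Hive-managed table, writing one directory per
// distinct value of the "dt" partition column.
def storeBatch(df: DataFrame): Unit = {
  df.write
    .mode(SaveMode.Append)
    .partitionBy("dt")          // the Hive partition column
    .saveAsTable("events")      // requires a HiveContext-backed DataFrame
}
```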
2
votes
1 answer

Spark Error: invalid log directory /app/spark/spark-1.6.1-bin-hadoop2.6/work/app-20161018015113-0000/3/

My Spark application is failing with the above error. Actually, my Spark program is writing the logs to that directory; both stderr and stdout are being written on all the workers. My program used to work fine earlier, but yesterday I changed the…
AKC
  • 953
  • 4
  • 17
  • 46
2
votes
1 answer

Why can Spark not recover from a checkpoint using getOrCreate?

Following the official doc, I'm trying to recover a StreamingContext: def get_or_create_ssc(): cfg = SparkConf().setAppName('MyApp').setMaster('local[10]') sc = SparkContext(conf=cfg) ssc = StreamingContext(sparkContext=sc, batchDuration=2) …
Zhang Tong
  • 4,569
  • 3
  • 19
  • 38
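getOrCreate only recovers cleanly when the factory function sets the same checkpoint directory itself and builds the entire DStream graph before returning. The question is PySpark, but the pattern is identical; a Scala sketch, with the checkpoint path and sample stream purely illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedApp {
  // Assumption: in production this should be a durable path, e.g. on HDFS.
  val checkpointDir = "/tmp/myapp-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("MyApp").setMaster("local[10]")
    val ssc = new StreamingContext(conf, Seconds(2))
    ssc.checkpoint(checkpointDir)   // must be set inside the factory

    // The whole DStream graph must be defined here, before returning ssc;
    // defining streams outside the factory is the usual reason recovery fails.
    ssc.socketTextStream("localhost", 9999).count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recovers from the checkpoint if one exists, else calls the factory.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```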