Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. As of version 1.3.0, it supports exactly-once processing semantics, even in the face of failures.
Questions tagged [spark-streaming]
5565 questions
2
votes
1 answer
How to load JSON (paths saved in CSV) with Spark?
I am new to Spark.
I can load a single .json file in Spark. What if there are thousands of .json files in a folder?
And I have a CSV file, which classifies the .json files with labels.
What should I…

Fengyu
- 35
- 2
- 6
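A minimal sketch of one way to approach the question above, assuming Spark 1.4+ and a hypothetical labels.csv whose lines are "path,label" pairs (the file names and schema are assumptions, not from the question):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import scala.Tuple2;
import java.util.List;

public class LoadLabeledJson {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LoadLabeledJson");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Hypothetical CSV layout: each line is "path,label".
        JavaPairRDD<String, String> pathAndLabel = sc.textFile("labels.csv")
                .mapToPair(line -> {
                    String[] parts = line.split(",");
                    return new Tuple2<>(parts[0], parts[1]);
                });

        // Collect the (path, label) pairs on the driver, then load each
        // JSON file into a DataFrame.
        List<Tuple2<String, String>> entries = pathAndLabel.collect();
        for (Tuple2<String, String> entry : entries) {
            DataFrame df = sqlContext.read().json(entry._1());
            // ... tag df with the label entry._2() and continue processing
        }

        sc.stop();
    }
}

If all the files share a schema, sqlContext.read().json("folder/*.json") reads the whole folder in one pass and is usually cheaper than a per-file loop.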
2
votes
1 answer
How to figure out if DStream is empty
I have 2 inputs, where the first input is a stream (say input1) and the second one is a batch (say input2).
I want to figure out whether the keys in the first input match a single row or more than one row in the second input.
The further transformations/logic…

Dazzler
- 807
- 9
- 11
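A sketch of one common approach to the question above, assuming both inputs can be keyed the same way (the paths, port, and comma-separated format are hypothetical): skip empty micro-batches with rdd.isEmpty(), then join against the batch RDD and count matches per key.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class MatchStreamAgainstBatch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("MatchStreamAgainstBatch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // input2: the static batch input, keyed (hypothetical path/format).
        JavaPairRDD<String, String> input2 = jssc.sparkContext()
                .textFile("hdfs:///input2")
                .mapToPair(line -> {
                    String[] parts = line.split(",");
                    return new Tuple2<>(parts[0], parts[1]);
                });

        // input1: the streaming input, keyed the same way (hypothetical source).
        JavaPairDStream<String, String> input1 = jssc
                .textFileStream("hdfs:///input1")
                .mapToPair(line -> {
                    String[] parts = line.split(",");
                    return new Tuple2<>(parts[0], parts[1]);
                });

        input1.foreachRDD(rdd -> {
            if (rdd.isEmpty()) {   // JavaRDD.isEmpty() exists since Spark 1.3
                return;            // nothing arrived in this micro-batch
            }
            // Count how many batch rows each streaming key matches.
            rdd.join(input2)
               .countByKey()
               .forEach((key, count) ->
                       System.out.println(key + " matched " + count + " row(s)"));
        });

        jssc.start();
        jssc.awaitTermination();
    }
}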
2
votes
1 answer
How to map key/value partitions in parallel in Spark Streaming
I have a Spark Streaming program running in local mode in which I receive JSON messages from a TCP socket connection, several per batch interval.
Each of these messages has an ID, which I use to create a key/value JavaPairDStream, such that in each…

manuel mourato
- 801
- 1
- 12
- 36
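A hedged sketch of the keying-plus-partitioning pattern the question above describes; extractId and process are hypothetical stand-ins for the JSON parsing and per-message work. Hash-partitioning by ID keeps messages with the same ID in one partition, and the partitions are processed in parallel across the executor cores.

import org.apache.spark.HashPartitioner;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;
import java.util.ArrayList;
import java.util.List;

public class KeyedPartitions {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("KeyedPartitions");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        JavaDStream<String> messages = jssc.socketTextStream("localhost", 9999);

        // Key each JSON message by its ID (extractId is a hypothetical parser).
        JavaPairDStream<String, String> byId = messages
                .mapToPair(json -> new Tuple2<>(extractId(json), json));

        JavaPairDStream<String, String> processed = byId
                .transformToPair(rdd -> rdd.partitionBy(new HashPartitioner(4)))
                .mapPartitionsToPair(it -> {
                    List<Tuple2<String, String>> out = new ArrayList<>();
                    while (it.hasNext()) {
                        Tuple2<String, String> kv = it.next();
                        out.add(new Tuple2<>(kv._1(), process(kv._2())));
                    }
                    return out;  // Spark 1.x signature; in 2.x return out.iterator()
                });

        processed.print();
        jssc.start();
        jssc.awaitTermination();
    }

    private static String extractId(String json) { /* hypothetical */ return json; }
    private static String process(String json)   { /* hypothetical */ return json; }
}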
2
votes
2 answers
Multiple consumers exactly-once processing with Apache Spark Streaming
I am looking to process elements on a queue (Kafka or Amazon Kinesis) and to have multiple operations performed on each element, for example:
Write it to an HDFS cluster
Invoke a REST API
Trigger a notification on Slack.
On each of these…

Edmondo
- 19,559
- 13
- 62
- 115
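A sketch of the usual structure for fanning one stream out to several sinks, with hypothetical postToApi and notifySlack helpers. One caveat relevant to the question above: Spark Streaming's exactly-once guarantee covers transformations, while output operations are at-least-once, so external sinks need idempotent or transactional writes.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class MultiSink {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("MultiSink");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Hypothetical source; a Kafka direct stream would slot in the same way.
        JavaDStream<String> elements = jssc.socketTextStream("localhost", 9999);

        elements.foreachRDD(rdd -> {
            rdd.cache();  // one read of the batch feeds all three operations

            // 1. HDFS: one directory per batch (for replay-idempotent paths,
            //    prefer the foreachRDD overload that also passes the batch Time).
            rdd.saveAsTextFile("hdfs:///out/batch-" + System.currentTimeMillis());

            // 2 + 3. REST and Slack, per partition so connections are opened
            //    on the executors. postToApi/notifySlack are hypothetical and
            //    should be idempotent: side effects run at-least-once.
            rdd.foreachPartition(iter -> iter.forEachRemaining(e -> {
                postToApi(e);
                notifySlack(e);
            }));

            rdd.unpersist();
        });

        jssc.start();
        jssc.awaitTermination();
    }

    private static void postToApi(String element)   { /* hypothetical */ }
    private static void notifySlack(String element) { /* hypothetical */ }
}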
2
votes
1 answer
Execute multiple actions parallel/async in Spark Streaming
Is there a way to execute multiple actions async/parallel in Spark Streaming?
Here is my code:
positions.foreachRDD(rdd -> {
JavaRDD pbv = rdd.map(p -> A.create(p));
javaFunctions(pbv).writerBuilder("poc",…

mananana
- 393
- 3
- 15
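One possible pattern for the question above (not the asker's code; the Cassandra writer is replaced here with plain saveAsTextFile/count actions): cache the RDD, enable the FAIR scheduler, and submit the actions from separate driver threads so Spark can run the resulting jobs concurrently instead of one after another.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelActions {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf()
                .setAppName("ParallelActions")
                // Let concurrently submitted jobs share cluster resources.
                .set("spark.scheduler.mode", "FAIR");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
        ExecutorService pool = Executors.newFixedThreadPool(2);

        JavaDStream<String> positions = jssc.socketTextStream("localhost", 9999);

        positions.foreachRDD(rdd -> {
            JavaRDD<String> pbv = rdd.cache();  // shared by both actions

            // Submit the two actions from separate driver threads so the
            // two Spark jobs run concurrently.
            Future<?> writeJob = pool.submit(() ->
                    pbv.saveAsTextFile("hdfs:///out/a-" + System.nanoTime()));
            Future<?> countJob = pool.submit(() ->
                    System.out.println("count = " + pbv.count()));

            writeJob.get();  // block until both jobs finish before the
            countJob.get();  // next batch's foreachRDD fires
            pbv.unpersist();
        });

        jssc.start();
        jssc.awaitTermination();
    }
}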
2
votes
0 answers
spark streaming application - deployment best practices
I am using spark-submit cluster-mode deployment for my application to run it in production.
But this requires the jars to be at the same path on all the nodes, and likewise the config file that is passed as an argument.
I…

Knight71
- 2,927
- 5
- 37
- 63
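One commonly used alternative, sketched under the assumption of a YARN cluster: let spark-submit ship the dependencies and the config itself via --jars and --files, so only the submitting node needs local copies (the paths and class name below are hypothetical).

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.StreamingApp \
  --jars /local/deps/dep1.jar,/local/deps/dep2.jar \
  --files /local/conf/app.conf \
  hdfs:///apps/streaming-app.jar \
  app.conf

Files passed with --files appear in each executor's working directory under their base name, so the application can open app.conf by name; the application jar itself can live on HDFS in cluster mode.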
2
votes
2 answers
How to add a jar using HiveContext in the Spark job
I am trying to add the JSONSerDe jar file in order to load JSON data into a Hive table from the Spark job. My code is shown below:
SparkConf sparkConf = new SparkConf().setAppName("KafkaStreamToHbase");
JavaSparkContext…

Bhaskar
- 271
- 7
- 20
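A minimal sketch of registering a SerDe through HiveContext for a case like the one above; the jar path, table definition, and SerDe class (the openx JSON SerDe) are assumptions for illustration, not the asker's setup.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class AddSerDeJar {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("KafkaStreamToHbase");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        HiveContext hiveContext = new HiveContext(sc.sc());

        // Register the SerDe with Hive's session classpath (the jar path is
        // hypothetical); alternatively ship it cluster-wide with
        // spark-submit --jars.
        hiveContext.sql("ADD JAR /path/to/json-serde.jar");

        hiveContext.sql("CREATE TABLE IF NOT EXISTS events (payload STRING) "
                + "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'");
    }
}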
2
votes
1 answer
DStream Runtime Creation/Destruction
Can DStreams with new names be created and older DStreams destroyed at runtime?
// Read the DStream
inputDstream = ssc.textFileStream("./myPath/")
Example:
I am reading a file called cvd_filter.txt in which every single line contains a string…

vkb
- 458
- 1
- 7
- 18
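DStreams cannot be created or destroyed once the StreamingContext has started, so a common workaround for situations like the one above is to keep a single DStream and vary its behavior per batch. A sketch, assuming cvd_filter.txt holds one wanted string per line: transform() runs its body on the driver for every batch, so the filter set can be reloaded there.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class DynamicFilter {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("DynamicFilter");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        JavaDStream<String> inputDstream = jssc.textFileStream("./myPath/");

        // transform() is evaluated on the driver once per batch, so the
        // current contents of cvd_filter.txt take effect on the next batch.
        JavaDStream<String> filtered = inputDstream.transform(rdd -> {
            Set<String> wanted = new HashSet<>(
                    Files.readAllLines(Paths.get("cvd_filter.txt")));
            return rdd.filter(wanted::contains);
        });

        filtered.print();
        jssc.start();
        jssc.awaitTermination();
    }
}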
2
votes
0 answers
Could not read until the end sequence number of the range
I have a Spark Streaming application which reads data from Kinesis, processes it, and sends the result to Elasticsearch. It was working fine, but it suddenly started throwing the following error while reading data from…

SMN
- 46
- 5
2
votes
0 answers
Spark Streaming variable in UpdateStateByKey not changing value after restarting application from checkpoint
I'm currently working in Python, building a moderately complex application that relies on stateful data from multiple sources. With PySpark I've run into an issue where a global variable used within an updateStateByKey function isn't being assigned…

JoeP
- 21
- 3
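For context on the behavior described above, a sketch of the restart pattern in the Java API (the question is PySpark; the checkpoint path here is hypothetical): everything captured by the updateStateByKey closure is deserialized from the checkpoint on restart, so the factory function runs only when no checkpoint exists, and values that must stay current after a restart have to flow through the stream or the state itself rather than through globals.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CheckpointRestart {
    private static final String CHECKPOINT_DIR = "hdfs:///checkpoints/app";

    public static void main(String[] args) throws Exception {
        // On restart, the whole context (including closures and anything
        // they captured) is restored from CHECKPOINT_DIR; createContext()
        // is only invoked when no checkpoint exists yet.
        JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(
                CHECKPOINT_DIR, CheckpointRestart::createContext);
        jssc.start();
        jssc.awaitTermination();
    }

    private static JavaStreamingContext createContext() {
        SparkConf conf = new SparkConf().setAppName("CheckpointRestart");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
        jssc.checkpoint(CHECKPOINT_DIR);
        // ... define the stream and the updateStateByKey pipeline here
        return jssc;
    }
}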
2
votes
1 answer
Spark Streaming -> DStream.checkpoint versus SparkStreaming.checkpoint
I have a Spark 1.4 Streaming application, which reads data from Kafka, uses a stateful transformation, and has a batch interval of 15 seconds.
In order to use stateful transformations, as well as recover from driver failures, I need to set checkpointing…

Srdjan Nikitovic
- 853
- 2
- 9
- 19
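A sketch of how the two calls from the question above combine, using the Guava Optional that the Spark 1.x Java API expects (the word-count state and checkpoint path are hypothetical): StreamingContext.checkpoint sets where checkpoint data goes and is mandatory for stateful transformations and driver recovery; DStream.checkpoint only tunes how often that particular stream's RDDs are written.

import com.google.common.base.Optional;  // the Spark 1.x Java API uses Guava's Optional
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;
import java.util.List;

public class CheckpointIntervals {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("CheckpointIntervals");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(15));

        // WHERE checkpoint data is written; required for stateful
        // transformations and for recovering from driver failures.
        jssc.checkpoint("hdfs:///checkpoints/app");

        JavaPairDStream<String, Integer> counts = jssc
                .socketTextStream("localhost", 9999)
                .mapToPair(word -> new Tuple2<>(word, 1))
                .updateStateByKey((List<Integer> values, Optional<Integer> state) -> {
                    int sum = state.or(0);
                    for (int v : values) sum += v;
                    return Optional.of(sum);
                });

        // HOW OFTEN this stream's RDDs are checkpointed; 5-10x the batch
        // interval is the commonly cited guideline, hence 75 s for a 15 s batch.
        counts.checkpoint(Durations.seconds(75));

        counts.print();
        jssc.start();
        jssc.awaitTermination();
    }
}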
2
votes
1 answer
Caching DStream in Spark Streaming
I have a Spark Streaming process which reads data from Kafka
into a DStream.
In my pipeline I do this twice (one after the other):
DStream.foreachRDD( transformations on RDD and inserting into destination).
(each time I do different processing and…

Srdjan Nikitovic
- 853
- 2
- 9
- 19
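A sketch of the caching pattern at issue above (a socket source stands in for Kafka): persist the DStream once so the second foreachRDD reuses the already-computed batch instead of recomputing it from the source. Spark Streaming unpersists old batch RDDs on its own, since spark.streaming.unpersist defaults to true.

import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CachedDStream {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("CachedDStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        JavaDStream<String> fromKafka = jssc.socketTextStream("localhost", 9999);

        // Persist the DStream once; without this, each of the two
        // foreachRDD actions below recomputes the batch.
        fromKafka.persist(StorageLevel.MEMORY_ONLY());

        fromKafka.foreachRDD(rdd -> {
            // first set of transformations + insert into destination A
        });
        fromKafka.foreachRDD(rdd -> {
            // second, different processing + insert into destination B
        });

        jssc.start();
        jssc.awaitTermination();
    }
}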
2
votes
0 answers
Spark akka throws a java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc
I have built Spark using Scala 2.11. I ran the following steps:
./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
After building Spark successfully, I tried to initialize Spark via the Akka model.
So,…

Raveesh Sharma
- 1,486
- 5
- 21
- 38
2
votes
0 answers
Spark streaming reliable receiver and BlockGenerator?
As I understand it, when implementing a reliable receiver for Spark Streaming, block generation needs to be taken care of in the custom receiver. Is this as easy as collecting some events into some kind of queue and then storing the iterator? Or…

Sunny
- 605
- 10
- 35
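A hedged sketch of a reliable receiver that does its own block building, as the question above suggests; pollSource and ackSource are hypothetical client calls. The key point is that store(Iterator) blocks until Spark has stored the whole block, after which the source can safely be acknowledged.

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: a reliable receiver buffers records into blocks itself
// (no automatic BlockGenerator) and acks the source only after store()
// returns, i.e. after Spark has durably stored the block.
public class ReliableQueueReceiver extends Receiver<String> {
    private static final int BLOCK_SIZE = 100;

    public ReliableQueueReceiver() {
        super(StorageLevel.MEMORY_AND_DISK_2());
    }

    @Override
    public void onStart() {
        new Thread(this::receive, "queue-receiver").start();
    }

    @Override
    public void onStop() { /* the receive loop checks isStopped() */ }

    private void receive() {
        List<String> block = new ArrayList<>(BLOCK_SIZE);
        while (!isStopped()) {
            String record = pollSource();   // hypothetical source client
            if (record != null) {
                block.add(record);
            }
            if (block.size() >= BLOCK_SIZE) {
                // store(Iterator) blocks until the block is stored, so it
                // is safe to acknowledge the source afterwards.
                store(block.iterator());
                ackSource(block);           // hypothetical acknowledgement
                block = new ArrayList<>(BLOCK_SIZE);
            }
        }
    }

    private String pollSource()            { /* hypothetical */ return null; }
    private void ackSource(List<String> b) { /* hypothetical */ }
}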
2
votes
0 answers
Spark Streaming Task Distribution
I have a Spark Streaming program that uses updateStateByKey.
When I run it on a cluster with 3 machines, all updateStateByKey tasks (which are heavy tasks) run on one machine. This results in a scheduling delay on inputs, while the other machines have…

Majid Hajibaba
- 3,105
- 6
- 23
- 55
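A sketch of the usual first mitigation for the skew described above, assuming a word-count-style pipeline: give updateStateByKey an explicit partition count (12 here, an assumed 3 machines x 4 cores) so the state RDD has enough partitions to spread across the cluster.

import com.google.common.base.Optional;  // the Spark 1.x Java API uses Guava's Optional
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;
import java.util.List;

public class SpreadState {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("SpreadState");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
        jssc.checkpoint("hdfs:///checkpoints/spread-state");

        JavaPairDStream<String, Integer> pairs = jssc
                .socketTextStream("localhost", 9999)
                .mapToPair(word -> new Tuple2<>(word, 1));

        // The numPartitions argument (12, an assumed sizing) controls how
        // many state partitions updateStateByKey shuffles into.
        JavaPairDStream<String, Integer> state = pairs.updateStateByKey(
                (List<Integer> values, Optional<Integer> prev) -> {
                    int sum = prev.or(0);
                    for (int v : values) sum += v;
                    return Optional.of(sum);
                }, 12);

        state.print();
        jssc.start();
        jssc.awaitTermination();
    }
}

If the tasks still land on one machine, cached state partitions plus data locality can pin them there; lowering spark.locality.wait is a commonly suggested knob to let the scheduler place tasks on other nodes sooner.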