Questions tagged [dstream]

Discretized Streams (D-Stream) is an approach that handles streaming computations as a series of deterministic batch computations on small time intervals. The input data received during each interval is stored reliably across the cluster to form an input dataset for that interval. Once the time interval completes, this dataset is processed via deterministic parallel operations, such as map, reduce, and groupBy, to produce new datasets representing program outputs or intermediate state.
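As a point of reference, a minimal sketch of that model in Spark Streaming's Scala API (the host, port and 5-second batch interval are arbitrary choices): each interval of socket text becomes an RDD that is processed with ordinary map/reduce operations.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("DStreamWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))      // 5-second micro-batches

    val lines = ssc.socketTextStream("localhost", 9999)   // one RDD per interval
    val counts = lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                                 // deterministic batch computation
    counts.print()

    ssc.start()
    ssc.awaitTermination()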

109 questions
1
vote
0 answers

Reg: Parallelizing RDD partitions in Spark executors

I am new to Spark and trying out a sample Spark Kafka integration. What I have done is post JSONs from a single partitioned…
sunnydev
  • 11
  • 1
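The excerpt is truncated, but for the general Spark–Kafka DStream setup it describes, here is a hedged sketch using the kafka-0-10 direct stream (the broker address, topic, group id and the surrounding StreamingContext ssc are assumptions). Each Kafka partition becomes one partition of the batch RDD, which is what lets the executors process partitions in parallel.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "sample-group")

    // Each Kafka partition maps to one RDD partition per batch, so the JSON
    // records are processed in parallel across the executors.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("json-topic"), kafkaParams))

    stream.map(_.value).foreachRDD { rdd =>
      rdd.foreachPartition(records => records.foreach(println))
    }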
1
vote
1 answer

Spark's socket text stream is empty

I am following Spark's streaming guide. Instead of using nc -lk 9999, I have created my own simple Python server as follows. As can be seen from the code below, it will randomly generate the letters a through z. import socketserver import time from…
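On the Spark side this is plain socketTextStream usage; a frequent cause of an empty stream is a server that closes the connection after one request or does not terminate records with a newline. A minimal sketch (an existing SparkContext sc is assumed, the port matches the question):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(5))
    // Spark connects as a TCP client; the server must keep the connection open
    // and send UTF-8 text terminated by "\n", otherwise every batch is empty.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()
    ssc.start()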
1
vote
2 answers

Constructing window based on message timestamps in Spark DStream

I'm receiving a DStream from Kafka and I want to group all messages in some sliding window by keys. The point is that this window needs to be based on the timestamps provided in each message (a separate field): Message…
Developer87
  • 2,448
  • 4
  • 23
  • 43
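DStream windows (window, reduceByKeyAndWindow and friends) are driven by processing time, not by a timestamp carried inside the message, so event-time grouping has to be built by hand. A hedged sketch, where the Message schema, the DStream named messages and the one-minute bucket size are all assumptions: bucket each record by its own timestamp and group on (key, bucket), with the processing-time window only bounding how long a bucket can still receive records.

    case class Message(key: String, ts: Long, payload: String)   // assumed schema, ts in epoch millis

    val bucketMs = 60000L                                         // one-minute event-time buckets
    val byEventTime = messages.map(m => ((m.key, m.ts / bucketMs), m))

    // The processing-time window only controls how long late records can still
    // land in an open bucket; the grouping key carries the event time.
    val grouped = byEventTime.groupByKeyAndWindow(Seconds(120), Seconds(30))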
1
vote
1 answer

Kafka - Spark Streaming Integration: DStreams and Task reuse

I am trying to understand the internals of Spark Streaming (not Structured Streaming), specifically the way tasks see the DStream. I am going over the source code of Spark in Scala, here. I understand the call stack: ExecutorCoarseGrainedBackend…
1
vote
1 answer

Spark QueueStream never exhausted

Puzzled on a piece of code I borrowed from the internet for research purposes. This is the code: import org.apache.spark.sql.SparkSession import org.apache.spark.rdd.RDD import org.apache.spark.streaming.{Seconds, StreamingContext} import…
thebluephantom
  • 16,458
  • 8
  • 40
  • 83
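For context, queueStream never signals end of input: once the queue is drained, the stream simply produces empty RDDs every batch interval, which is why such a job appears to run forever. A small sketch (sc and ssc are assumed to exist):

    import scala.collection.mutable
    import org.apache.spark.rdd.RDD

    val rddQueue = new mutable.Queue[RDD[Int]]()
    rddQueue += sc.parallelize(1 to 100)

    // oneAtATime = true dequeues one RDD per batch; once the queue is empty,
    // later batches just contain empty RDDs -- the stream is never "exhausted".
    val input = ssc.queueStream(rddQueue, oneAtATime = true)
    input.count().print()
    ssc.start()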
1
vote
1 answer

Join Dstream[Document] and Rdd by key Spark Scala

Here is my code: ssc = new StreamingContext(sparkContext, Seconds(time)) spark = SparkSession.builder.config(properties).getOrCreate() val Dstream1: ReceiverInputDStream[Document] = ssc.receiverStream(properties) // Dstream1 has Id1 and other…
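The usual way to join a DStream with a static RDD is transform, which exposes each batch as an RDD; a hedged sketch assuming both sides have already been keyed by the shared id (the names keyedStream, staticRdd and Meta are placeholders):

    // keyedStream: DStream[(String, Document)], staticRdd: RDD[(String, Meta)] -- assumed types
    val joined = keyedStream.transform { batchRdd =>
      batchRdd.join(staticRdd)    // per-batch RDD-to-RDD join by key
    }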
1
vote
1 answer

Using Map in PySpark to parse and assign column names

Here is what I am trying to do. The input data looks like this (tab-separated): 12/01/2018 user1 123.123.222.111 23.3s 12/01/2018 user2 123.123.222.116 21.1s The data is coming in through Kafka and is being parsed with the following…
steven
  • 644
  • 1
  • 11
  • 23
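The question is about PySpark, but the approach is the same in either API: split each tab-separated line, map it onto a named record, and turn the batch into a DataFrame. A Scala sketch with assumed column names and an existing SparkSession called spark:

    case class Access(date: String, user: String, ip: String, duration: String)  // assumed columns

    lines.foreachRDD { rdd =>
      import spark.implicits._
      val df = rdd.map(_.split("\t"))
        .collect { case Array(date, user, ip, duration) => Access(date, user, ip, duration) }
        .toDF()
      df.show()
    }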
1
vote
1 answer

How to merge multiple DStreams in spark using scala?

I have three incoming streams from Kafka. I parse the streams received as JSON and extract them to appropriate case classes and form DStreams of the following schema: case class Class1(incident_id: String, crt_object_id: String, …
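If the parsed streams end up with the same element type, merging them is just union; a sketch (the stream names are placeholders, and streams of different case classes have to be mapped to a common type first):

    // Element types must match; map Class1/Class2/Class3 to a shared type if they differ.
    val merged = stream1.union(stream2).union(stream3)

    // Equivalent form via the StreamingContext:
    val mergedAll = ssc.union(Seq(stream1, stream2, stream3))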
1
vote
1 answer

pyspark: train kmeans streaming with data retrieved from kafka

I want to train a streaming k-means model with data consumed from a Kafka topic. My problem is how to present the data to the streaming k-means model sc = SparkContext(appName="PythonStreamingKafka") ssc = StreamingContext(sc, 30) zkQuorum, topic =…
severine
  • 305
  • 1
  • 3
  • 11
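The question uses PySpark, but the shape of the answer is the same: parse each Kafka record into an mllib Vector and feed the resulting DStream to StreamingKMeans.trainOn. A Scala sketch where the record format (comma-separated numbers), k and the vector dimension are assumptions:

    import org.apache.spark.mllib.clustering.StreamingKMeans
    import org.apache.spark.mllib.linalg.Vectors

    // kafkaStream: DStream of text records such as "1.0,2.0" -- format is assumed
    val training = kafkaStream.map(line => Vectors.dense(line.split(',').map(_.toDouble)))

    val model = new StreamingKMeans()
      .setK(3)                       // assumed number of clusters
      .setDecayFactor(1.0)
      .setRandomCenters(2, 0.0)      // dimension must match the parsed vectors

    model.trainOn(training)          // cluster centers are updated on every batch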
1
vote
1 answer

Apache Spark streaming - Timeout long-running batch

I'm setting up an Apache Spark long-running streaming job to perform (non-parallelized) streaming using InputDStream. What I'm trying to achieve is that when a batch on the queue takes too long (based on a user-defined timeout), I want to be able to…
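Spark Streaming has no built-in per-batch timeout, so one possible workaround (sketched here; the timeout value and the process function are placeholders) is to tag each batch's jobs with a job group and cancel that group from a watchdog thread:

    stream.foreachRDD { rdd =>
      val sc = rdd.sparkContext
      val groupId = s"batch-${System.currentTimeMillis()}"
      sc.setJobGroup(groupId, "timed batch", interruptOnCancel = true)

      // Watchdog: cancel this batch's jobs if they run past the timeout.
      val watchdog = new Thread(() => {
        Thread.sleep(timeoutMs)               // user-defined timeout in ms (placeholder)
        sc.cancelJobGroup(groupId)
      })
      watchdog.setDaemon(true)
      watchdog.start()

      rdd.foreach(record => process(record))  // the potentially long-running work
    }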
1
vote
1 answer

Not able to persist the DStream for use in next batch

JavaRDD<…> history_ = sc.emptyRDD(); java.util.Queue<JavaRDD<…>> queue = new LinkedList<JavaRDD<…>>(); queue.add(history_); JavaDStream<…> history_dstream = ssc.queueStream(queue); JavaPairDStream<…>…
JSR29
  • 354
  • 1
  • 5
  • 17
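Rather than threading an RDD through a queue, the usual way to carry data from one batch into the next is keyed state, for example updateStateByKey (which requires a checkpoint directory). A hedged Scala sketch over an assumed DStream of (key, count) pairs:

    ssc.checkpoint("/tmp/streaming-checkpoint")   // required for stateful ops; path is a placeholder

    // pairs: DStream[(String, Long)] -- the running total survives from batch to batch
    val history = pairs.updateStateByKey[Long] { (newValues: Seq[Long], state: Option[Long]) =>
      Some(state.getOrElse(0L) + newValues.sum)
    }
    history.print()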
1
vote
1 answer

Scala Spark : trying to avoid type erasure when using overload

I'm relatively new to Scala/Spark. I'm trying to overload one function depending on the class type of the DStream: def persist(service1DStream: DStream[Service1]): Unit = {...} def persist(service2DStream: DStream[Service2]): Unit = {...} I'm getting…
Fares
  • 605
  • 4
  • 19
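Both overloads erase to persist(DStream), which is why they clash. One common workaround is to give the second overload an extra implicit DummyImplicit parameter so its erased signature differs; a sketch (the method bodies are placeholders):

    import org.apache.spark.streaming.dstream.DStream

    def persist(stream: DStream[Service1]): Unit = {
      // Service1-specific persistence goes here
    }

    // The extra implicit parameter changes the erased signature, so both overloads can coexist.
    def persist(stream: DStream[Service2])(implicit d: DummyImplicit): Unit = {
      // Service2-specific persistence goes here
    }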
1
vote
1 answer

Scala - Spark Dstream operation similar to Cbind in R

1) I am trying to use MLlib Random Forest. My final output should have 2 columns: id, predicted_value (e.g. 1, 0.5 and 2, 0.4). My feature sets are the training and scoring data (train, score), but when I train and score I drop the id field as it could…
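There is no cbind with guaranteed row alignment on RDDs, so the safer pattern is to keep the id next to the feature vector and predict per record, which keeps the id and the prediction paired. A hedged mllib sketch (the (id, features) layout is assumed):

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.tree.model.RandomForestModel
    import org.apache.spark.rdd.RDD

    // scoreData keeps the id alongside the features instead of dropping it.
    def scoreWithIds(model: RandomForestModel,
                     scoreData: RDD[(String, Vector)]): RDD[(String, Double)] =
      scoreData.map { case (id, features) => (id, model.predict(features)) }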
1
vote
1 answer

Spark streaming JavaPairDStream to text file

I am quite new to Spark Streaming, and I am stuck saving my output. My question is, how can I save the output of my JavaPairDStream in a text file, which is updated for each file only with the elements inside the DStream? For example, with the…
Luis_MG
  • 65
  • 7
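For reference, the DStream API writes one output directory per batch rather than appending to a single file. A Scala sketch of both the built-in saveAsTextFiles and a foreachRDD variant (the output paths are placeholders, and pairs is an assumed key-value DStream):

    // Built-in: creates a directory named <prefix>-<batchTime>.<suffix> for every batch.
    pairs.saveAsTextFiles("hdfs:///output/pairs", "txt")

    // Manual control: write only non-empty batches, one directory per batch time.
    pairs.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///output/pairs-${time.milliseconds}")
    }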
1
vote
1 answer

Spark streaming reduce by multiple key Java

I am quite new to Spark Streaming and I am getting stuck trying to figure out how to handle this problem, since I found a lot of examples for single (K,V) pairs but nothing further. I would appreciate some help in order to find the best approach…
Luis_MG
  • 65
  • 7
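The standard trick is to make the key itself a tuple, so reduceByKey (or reduceByKeyAndWindow) aggregates on the whole composite key. A sketch with an assumed events DStream and assumed field names:

    // Composite (country, device) key -- both fields participate in the grouping.
    val byCompositeKey = events.map(e => ((e.country, e.device), 1L))

    // Counts per (country, device) within each batch...
    val counts = byCompositeKey.reduceByKey(_ + _)

    // ...or across a sliding window.
    val windowed = byCompositeKey.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))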