Questions tagged [spark-streaming]

Spark Streaming is an extension of the core Apache Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. As of version 1.3.0, it supports exactly-once processing semantics, even in the face of failures.

5565 questions
2
votes
0 answers

Training and Prediction in Spark Streaming Machine Learning Model

I am having a hard time understanding how we can both update the machine learning model and use it to make predictions in one Spark Streaming job. This code is from Spark's StreamingLinearRegressionExample class: val trainingData =…
Drakan
  • 31
  • 4
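The pattern this question is after comes straight from Spark's StreamingLinearRegressionExample: trainOn keeps updating the model as training batches arrive, while predictOnValues scores a second stream with the latest weights. A minimal Scala sketch, assuming LabeledPoint-formatted text files land in two placeholder directories:

    import org.apache.spark.SparkConf
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object TrainAndPredict {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("TrainAndPredict"), Seconds(1))

        // both streams carry LabeledPoint text such as "(1.0,[2.0,3.0])"
        val trainingData = ssc.textFileStream("/data/train").map(LabeledPoint.parse).cache()
        val testData = ssc.textFileStream("/data/test").map(LabeledPoint.parse)

        val model = new StreamingLinearRegressionWithSGD()
          .setInitialWeights(Vectors.zeros(2)) // 2 = number of features

        model.trainOn(trainingData) // weights updated on every training batch
        model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }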
2
votes
1 answer

Spark Streaming HiveContext NullPointerException

I'm writing a Spark Streaming application using Spark 1.6.0 on a CDH 5.8.3 cluster. The application is very simple: it reads from Kafka, applies some transformations to the DStream/RDDs, and then outputs them to a Hive table. I have also tried to put…
mgaido
  • 2,987
  • 3
  • 17
  • 39
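A frequent cause of a NullPointerException in this setup is a HiveContext created on the driver and then captured by a closure running on an executor. The Spark 1.6 streaming guide's remedy is a lazily instantiated singleton built from the RDD's own SparkContext inside foreachRDD; a sketch, where stream is assumed to be a DStream[String] and the table name is made up:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.hive.HiveContext

    // one HiveContext per JVM, created lazily on first use
    object HiveContextSingleton {
      @transient private var instance: HiveContext = _
      def getInstance(sc: SparkContext): HiveContext = synchronized {
        if (instance == null) instance = new HiveContext(sc)
        instance
      }
    }

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val hiveContext = HiveContextSingleton.getInstance(rdd.sparkContext)
        import hiveContext.implicits._
        rdd.toDF("value").write.mode("append").saveAsTable("mydb.mytable")
      }
    }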
2
votes
0 answers

How to convert an RDD of strings (XML format) to a DataFrame in Spark Java?

A good solution is available at the link below if the XML data is in a file: https://github.com/databricks/spark-xml. The code below converts XML to a Dataset by loading a physical file: Dataset df = sqlContext.read().format("com.databricks.spark.xml") …
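For XML that is already in an RDD[String] rather than a file, spark-xml also ships an XmlReader entry point. A hedged Scala sketch (the Java call sequence is analogous); check it against the spark-xml release you use, since the builder methods have shifted between versions:

    import com.databricks.spark.xml.XmlReader

    // parse an RDD of XML strings directly, with no file on disk
    val xmlStrings = sc.parallelize(Seq("<book><title>Spark</title></book>"))
    val df = new XmlReader()
      .withRowTag("book")
      .xmlRdd(sqlContext, xmlStrings)
    df.printSchema()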
2
votes
1 answer

UTF-8 encoding error while connecting a Flume Twitter stream to Spark in Python

I am having trouble passing the Twitter data collected by the Flume agent to a Spark stream. I can download the tweets independently using only Flume, but I am getting the following error. I suspect it is an issue with the default…
smm
  • 838
  • 1
  • 9
  • 31
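The usual shape of the fix is to decode the Flume event body explicitly as UTF-8 instead of relying on the platform default charset. The question is about Python, but the idea reads most compactly in Scala; ssc and the host/port are placeholders:

    import java.nio.charset.StandardCharsets
    import org.apache.spark.streaming.flume.FlumeUtils

    val flumeStream = FlumeUtils.createStream(ssc, "localhost", 9999)

    // decode each event body explicitly as UTF-8 rather than with the
    // JVM's default charset
    val tweets = flumeStream.map { sparkFlumeEvent =>
      new String(sparkFlumeEvent.event.getBody.array(), StandardCharsets.UTF_8)
    }
    tweets.print()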
2
votes
2 answers

What can cause my Spark Streaming checkpoint to be incomplete?

I am playing around with the Spark Streaming API, specifically testing the checkpointing feature. However, I am finding that in certain circumstances the checkpoint being returned is incomplete. The following code is run in local[2] mode…
Joe C
  • 15,324
  • 8
  • 38
  • 50
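One thing worth ruling out with incomplete-looking checkpoints: the whole DStream graph has to be defined inside the factory function handed to StreamingContext.getOrCreate, otherwise the recovered context is missing operators. A sketch with a placeholder checkpoint directory:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "/tmp/checkpoint" // placeholder path

    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(new SparkConf().setAppName("cp"), Seconds(5))
      ssc.checkpoint(checkpointDir)
      // the full DStream graph must be built here, before returning,
      // so that it can be restored from the checkpoint after a failure
      ssc.socketTextStream("localhost", 9999)
        .countByWindow(Seconds(30), Seconds(5))
        .print()
      ssc
    }

    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()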
2
votes
0 answers

Spark Java CollectionAccumulator object adding issue

I am new to Spark. I am using a Spark CollectionAccumulator to collect a list of customer objects. For the same customer I can have more than one object, and all of them need to be added to the accumulator. What is happening is that if I have 3 objects with the same customer, all…
MKS
  • 129
  • 1
  • 12
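For reference, Spark 2.x's CollectionAccumulator appends every element it is given, duplicates included, so three objects for the same customer should all appear in the result. A sketch in Scala (the Java API is the same), with Customer as a made-up case class:

    import org.apache.spark.sql.SparkSession

    case class Customer(id: Int, item: String)

    val spark = SparkSession.builder().appName("acc-demo").getOrCreate()
    val acc = spark.sparkContext.collectionAccumulator[Customer]("customers")

    // three objects with the same customer id: all three are kept,
    // because CollectionAccumulator appends rather than deduplicates
    spark.sparkContext
      .parallelize(Seq(Customer(1, "a"), Customer(1, "b"), Customer(1, "c")))
      .foreach(c => acc.add(c))

    println(acc.value) // java.util.List with 3 elements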
2
votes
2 answers

Spark Streaming with YARN: executors not fully utilized

I am running Spark Streaming on YARN with: spark-submit --master yarn --deploy-mode cluster --num-executors 2 --executor-memory 8g --driver-memory 2g --executor-cores 8 .. I am consuming Kafka through the DirectStream approach (no receiver). I have…
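With the direct approach, each batch's RDD gets exactly one partition per Kafka partition, so with few Kafka partitions most of those 2 x 8 cores sit idle unless the stream is repartitioned. A sketch against the Spark 1.6 / Kafka 0.8 API, with placeholder broker and topic names:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    // spread records over all 16 cores (2 executors * 8 cores) even if
    // the topic has fewer Kafka partitions; note this adds a shuffle
    val widened = stream.map(_._2).repartition(16)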
2
votes
0 answers

Most efficient way to write Spark Streaming data into an RDBMS

I am writing a Spark Streaming job that consumes data from Kafka and writes it to an RDBMS. I am currently stuck because I do not know the most efficient way to store this streaming data in the RDBMS. While searching, I found a few methods:…
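The pattern the Spark Streaming guide recommends for exactly this: open one connection per partition inside foreachPartition and batch the inserts, rather than connecting per record or collecting to the driver. A sketch with placeholder JDBC details, assuming a DStream[String]:

    import java.sql.DriverManager

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // one connection and one prepared statement per partition
        val conn = DriverManager.getConnection("jdbc:postgresql://db:5432/app", "user", "pass")
        val stmt = conn.prepareStatement("INSERT INTO events (payload) VALUES (?)")
        records.foreach { r =>
          stmt.setString(1, r)
          stmt.addBatch() // accumulate, then send as one round trip
        }
        stmt.executeBatch()
        conn.close()
      }
    }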
2
votes
1 answer

How can I catch the log output of pyspark foreachPartition?

In pyspark, when I use print() in the foreachRDD method, it works! def echo(data): print data .... lines = MQTTUtils.createStream(ssc, brokerUrl, topics) topic_rdd = lines.map(lambda x: get_topic_rdd(x)).filter(lambda x: x[0]!=…
wu alex
  • 21
  • 2
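Why print appears to work in one place and not the other: code passed to foreachRDD runs on the driver, while the function given to foreachPartition runs on the executors, so its output lands in the executor (worker or YARN container) logs rather than on the driver console. The executor-side logging idea, sketched here in Scala:

    import org.apache.log4j.Logger

    rdd.foreachPartition { records =>
      // created on the executor: these messages show up in that
      // executor's stderr/stdout, not in the driver's output
      val log = Logger.getLogger("partition-logger")
      records.foreach(r => log.info(s"processed: $r"))
    }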
2
votes
1 answer

Inappropriate output while creating a DataFrame

I'm trying to stream the data from a Kafka topic using a Scala application. I'm able to get the data from the topic, but how do I create a DataFrame out of it? Here is the data (in (String, String) format): { "action": "AppEvent", "tenantid": 298, …
jack AKA karthik
  • 885
  • 3
  • 15
  • 30
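Since each Kafka value here is a JSON string, one common route is to hand the batch RDD to sqlContext.read.json inside foreachRDD and let Spark infer the schema. A sketch, assuming stream is the (String, String) DStream from Kafka:

    // _._2 is the JSON value of each Kafka record
    stream.map(_._2).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val df = sqlContext.read.json(rdd) // schema inferred per batch
        df.show()
      }
    }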
2
votes
1 answer

Spark Streaming Kafka direct consumer consumption speed drop

The Kafka direct consumer started to limit reads to 450 events (5 * 90 partitions) per batch (5 seconds); it had been running fine for 1 or 2 days before that (about 5,000 to 40,000 events per batch). I'm using a Spark standalone cluster (Spark and…
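450 events per 5-second batch across 90 partitions works out to exactly 1 record per second per partition, which looks like a rate limit kicking in rather than a consumption problem. Two settings worth checking, shown as a hedged sketch:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // backpressure adapts the ingestion rate to processing speed and
      // can clamp reads hard after a slow stretch
      .set("spark.streaming.backpressure.enabled", "true")
      // per-partition ceiling for the direct stream; a value of 1 would
      // reproduce exactly 5 * 90 = 450 events per 5-second batch
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")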
2
votes
1 answer

Spark Streaming joins with multiple history tables

Spark version: 1.5.2. We are trying to implement streaming for the first time and to do CDC on incoming streams, storing the results in HDFS. What is working: we started the POC with CDC on 1 table with input file streams. The base (history)…
K. Sam
  • 21
  • 2
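For Spark 1.5-era CDC, the usual shape is to turn each micro-batch into a DataFrame and join it against history tables read from HDFS. A hedged sketch with made-up paths and keys:

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val changes = sqlContext.read.json(rdd)
        // history table from HDFS; cache it if it is reused every batch
        val history = sqlContext.read.parquet("/warehouse/history/table1")
        changes.join(history, changes("key") === history("key"), "left_outer")
          .write.mode("append").parquet("/warehouse/cdc/output")
      }
    }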
2
votes
1 answer

Is there a bug when using RDD.cartesian with Spark Streaming?

My code: ks1 = KafkaUtils.createStream(ssc, zkQuorum='localhost:2181', groupId='G1', topics={'test': 2}) ks2 = KafkaUtils.createStream(ssc, zkQuorum='localhost:2181', groupId='G2', topics={'test': 2}) d1 = ks1.map(lambda x: x[1]).flatMap(lambda x:…
Zhang Tong
  • 4,569
  • 3
  • 19
  • 38
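cartesian is an RDD method, not a DStream one, so the supported route is transformWith, which exposes each batch's underlying RDDs from both streams. The question is pyspark; the same idea in Scala, with d1 and d2 assumed to be the two DStream[String]s:

    import org.apache.spark.rdd.RDD

    val pairs = d1.transformWith(d2, (rdd1: RDD[String], rdd2: RDD[String]) =>
      rdd1.cartesian(rdd2))
    pairs.print()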
2
votes
2 answers

How to read concurrently from each Kafka partition in Spark Streaming DirectAPI

If I am correct, by default Spark Streaming 1.6.1 uses a single thread to read data from each Kafka partition. Let's assume my Kafka topic has 50 partitions; does that mean messages in all 50 partitions will be read sequentially, or maybe in round robin…
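To be clear about the model: the direct stream maps Kafka partitions one-to-one onto RDD partitions, and partitions of a single RDD are processed as concurrent tasks, up to the number of available executor cores, not sequentially. A quick way to observe it (Spark 1.6 / Kafka 0.8 API, placeholder names):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("topic-with-50-partitions"))

    // prints 50: one RDD partition per Kafka partition, each read by its
    // own task, scheduled in parallel across the executor cores
    stream.foreachRDD(rdd => println(s"partitions in this batch: ${rdd.getNumPartitions}"))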
2
votes
1 answer

How to extract each JSONObject from a JSONArray and save to Cassandra in Spark Streaming

I'm trying to consume Kafka streaming data, which is a JSONArray, in Spark Streaming; each JSONArray contains several JSONObjects. I want to save each JSONObject into a DataFrame, and save it to a Cassandra table after mapping it with the other table. I've tried to…
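Spark's JSON reader turns a top-level JSON array into one row per element, which flattens exactly this shape; the result can then be appended to Cassandra through the spark-cassandra-connector DataFrame writer. A hedged sketch with a made-up keyspace and table, assuming stream is the (String, String) DStream from Kafka:

    stream.map(_._2).foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        // a top-level JSONArray yields one row per contained JSONObject
        val df = sqlContext.read.json(rdd)
        df.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "mykeyspace", "table" -> "events"))
          .mode("append")
          .save()
      }
    }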