Use for questions specific to Apache Spark 2.0. For general questions related to Apache Spark use the tag [apache-spark].
Questions tagged [apache-spark-2.0]
464 questions
0
votes
1 answer
Trouble Submitting Apache Spark Application to Containerized Cluster
I am having trouble running a Spark application using both spark-submit and the internal REST API. The deployment scenario I would like to demonstrate is Spark running as a cluster on my local laptop.
To that end, I've created two Docker containers…

Michael Reynolds
- 96
- 1
- 5
0
votes
3 answers
Spark standalone cluster tuning
We have a Spark 2.1.0 standalone cluster running on a single node with 8 cores and 50GB of memory (single worker).
We run Spark applications in cluster mode with the following memory settings:
--driver-memory = 7GB (default - 1core is…

veerat
- 105
- 9
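Sizing questions like the one above usually reduce to arithmetic over the node's resources. A minimal plain-Python sketch using the 50GB node and 7GB driver figures from the question; the executor count and the 10% (min 384MB) overhead heuristic, which mirrors Spark's default memoryOverhead behavior, are assumptions for illustration:

```python
# Sketch: estimate per-executor heap on a single 50GB node after
# reserving driver memory. The 10% (min 384MB) overhead mirrors
# Spark's default memoryOverhead heuristic; num_executors is assumed.

def executor_heap_gb(node_mem_gb, driver_mem_gb, num_executors):
    """Per-executor heap (GB) once the driver and overhead are reserved."""
    remaining = node_mem_gb - driver_mem_gb
    per_executor = remaining / num_executors
    overhead = max(per_executor * 0.10, 0.384)
    return per_executor - overhead

# With the question's 50GB node and 7GB driver, four executors would
# each get roughly 9.7GB of heap.
executor_heap_gb(50, 7, 4)
```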
0
votes
1 answer
Apache Spark connector to read from Azure Queue service?
This may be more of a configuration question, but I could not find a specific answer to the problem I am trying to solve.
I am looking for a connector to read from the Azure Storage Queue service through Spark, though there are connectors available for…

jdk2588
- 782
- 1
- 9
- 23
0
votes
1 answer
Spark2 mongodb connector polymorphic schema
I have a collection col that contains
{
  '_id': ObjectId(...),
  'type': "a",
  'f1': data1
}
In the same collection I have
{
  '_id': ObjectId(...),
  'f2': 222.234,
  'type': "b"
}
The Spark MongoDB connector is not working correctly. It reorders the…

Yehuda
- 457
- 2
- 6
- 16
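Schema inference over a polymorphic collection like the one above has to merge the field sets of every document it samples. A rough plain-Python sketch of that merge (dicts stand in for BSON documents; the field values are illustrative):

```python
# Sketch: merge the schemas of heterogeneous documents, the way a
# connector's schema inference conceptually works. Fields appear in
# the merged schema in order of first appearance across documents.

def merge_schema(docs):
    """Map each field name to the set of Python type names it takes."""
    schema = {}
    for doc in docs:
        for field, value in doc.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema

docs = [
    {"_id": 1, "type": "a", "f1": "data1"},   # type "a" document
    {"_id": 2, "f2": 222.234, "type": "b"},   # type "b" document
]
merged = merge_schema(docs)
# A field missing from some documents simply becomes nullable in the
# merged schema; each document's own field order cannot be preserved.
```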
0
votes
1 answer
Ignite: System.out.print commands in code not logging to the log file
I started two Ignite server nodes with the following console command: /root/apache-ignite-fabric-2.3.0-bin/bin/ignite.sh -v
From a remote client, I run the ClusterGroup example program. I see the below type of logs (printed from System.out.print) in both…

Mahesh Renduchintala
- 672
- 7
- 18
0
votes
1 answer
Apache Spark -- Data Grouping and Execution in worker nodes
We are getting live machine data as JSON from RabbitMQ. Below is a sample of the JSON:
{"DeviceId":"MAC-1001","DeviceType":"Sim-1","TimeStamp":"05-12-2017…

Ramesh Kumar R
- 65
- 7
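Per-device grouping of streamed JSON, as described above, can be prototyped without Spark at all. A minimal plain-Python sketch (the first record's fields come from the question's sample; the second record is an invented example):

```python
import json
from collections import defaultdict

# Sketch: bucket incoming JSON messages by DeviceId before any
# per-device processing, mirroring what a keyed groupBy on the
# stream would do across worker nodes.

messages = [
    '{"DeviceId": "MAC-1001", "DeviceType": "Sim-1", "TimeStamp": "05-12-2017"}',
    '{"DeviceId": "MAC-1002", "DeviceType": "Sim-1", "TimeStamp": "05-12-2017"}',
]

by_device = defaultdict(list)
for raw in messages:
    record = json.loads(raw)
    by_device[record["DeviceId"]].append(record)
```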
0
votes
3 answers
Spark version mismatch using maven dependencies
I want to run a simple wordcount example using Apache Spark. Using the local jar files in $SPARK_HOME/jars it runs correctly, but using Maven dependencies it errors:
java.lang.NoSuchMethodError:…

Soheil Pourbafrani
- 3,249
- 3
- 32
- 69
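A NoSuchMethodError at runtime usually means the Maven dependency version differs from the jars Spark actually loads from $SPARK_HOME/jars. A plain-Python sketch of the kind of version check that catches this early (the version strings are illustrative, and restricting compatibility to the same major.minor line is a conservative assumption):

```python
# Sketch: flag a mismatch between the Spark version a job was compiled
# against (the Maven dependency) and the version installed under
# $SPARK_HOME/jars. NoSuchMethodError typically surfaces when the two
# disagree on the major.minor line.

def same_minor_line(compiled_against, installed):
    """Compare only the first two version components (e.g. '2.1')."""
    return compiled_against.split(".")[:2] == installed.split(".")[:2]

same_minor_line("2.1.0", "2.1.1")  # patch difference: usually fine
same_minor_line("2.2.0", "2.1.0")  # different line: likely to break
```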
0
votes
1 answer
Why is Spark2 running on only one node?
I am running Spark2 from Zeppelin (0.7 in HDP 2.6) and I am doing an IDF transformation which crashes after many hours. It runs on a cluster with a master and 3 datanodes: s1, s2 and s3. All nodes have a Spark2 client and each has 8 cores and 16GB…

schoon
- 2,858
- 3
- 46
- 78
0
votes
1 answer
Set current project in sbt - spark build issue
I am getting the error Set current project to spark-parent (in build file:/C:/cygwin64/spark-current/spark-2.1.1/) while trying to build spark. Is there an option "-Dcurrent" or some sbt switch that I can set to facilitate this or do I need to…

uh_big_mike_boi
- 3,350
- 4
- 33
- 64
0
votes
0 answers
Spark Standalone cluster only two workers utilized
In a Spark standalone cluster, only 2 of the 6 worker instances get utilized; the rest are idle. I used two VMs, both having 4 cores. 2 workers were on the local VM (where the master was started) and 4 workers were on the other VM. Only the local two got…

Ashwin Daswani
- 1
- 1
0
votes
1 answer
Spark 2.1 register UDF to functionRegistry
I want to register a UDF object that is already created. I'm using Spark 2.1, and the sparkSession.udf.register() function does not accept a UDF parameter, only a regular Scala function. It's easy to miss something in the large Spark API, so…

uh_big_mike_boi
- 3,350
- 4
- 33
- 64
0
votes
1 answer
Can I set a general-purpose (not spark.*) parameter when submitting a spark application?
A normal way to set a parameter in spark-submit is using --conf:
spark2-shell --conf "spark.nonexisting=true" --conf "failOnDataLoss=false"
Unfortunately, this only works for spark.* parameters, and I need to set other parameters which are simply…

Viacheslav Rodionov
- 2,335
- 21
- 22
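spark-submit forwards only keys with the spark. prefix to SparkConf and warns about the rest ("Ignoring non-spark config property"), which is why failOnDataLoss above is dropped. A plain-Python sketch of that filtering, with a commonly suggested workaround noted in the comments (the spark.myapp.* key is an invented example):

```python
# Sketch: mimic how spark-submit treats --conf key=value pairs. Only
# keys prefixed with "spark." reach SparkConf; anything else is dropped
# with a warning. A common workaround is to namespace custom settings
# under "spark." yourself (e.g. "spark.myapp.failOnDataLoss") and read
# them back from the SparkConf inside the application.

def split_conf(pairs):
    accepted, ignored = {}, []
    for pair in pairs:
        key, _, value = pair.partition("=")
        if key.startswith("spark."):
            accepted[key] = value
        else:
            ignored.append(key)  # spark-submit warns and drops these
    return accepted, ignored

accepted, ignored = split_conf(
    ["spark.nonexisting=true", "failOnDataLoss=false"]
)
```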
0
votes
1 answer
Running Dependent Queries with SparkSQL using Spark Session
We have 3 queries which are currently running on Hive.
Using Spark 2.1.0
We are trying to run them using Spark SQL via the SparkSession (wrapping them in Scala code, building a jar, and then submitting with spark-submit).
Now, for example, let's say…

AJm
- 993
- 2
- 20
- 39
0
votes
2 answers
Spark Streaming design questions
I don't have a specific query, just a design question. I am new to Spark/Streaming, so forgive me if I am asking a dumb question. Please delete it if the question is inappropriate for this forum.
So basically we have a requirement where we have to…

Rishi Saraf
- 1,644
- 2
- 14
- 27
0
votes
1 answer
Spark history server does not start on Ambari cluster
We start the Spark history server as follows:
/usr/hdp/2.6.0.3-8/spark2/sbin/start-history-server.sh
From the log
spark-root-org.apache.spark.deploy.history.HistoryServer-1-master01
we get
WARN AbstractLifeCycle: FAILED…

King David
- 500
- 1
- 7
- 20