More information can be found in the official documentation.
Questions tagged [spark-shell]
135 questions
1
vote
0 answers
Issue running spark-shell with yarn client, ERROR client.TransportClient: Failed to send RPC
I am trying to set up Hadoop 3.1.2 with Spark on Windows. I have started the HDFS cluster and I am able to create and copy files in HDFS. When I try to start spark-shell with YARN I am facing
ERROR cluster.YarnClientSchedulerBackend: Diagnostics message:…

andrew
- 19
- 7
1
vote
0 answers
Using `spark-submit` to start a job in a single node standalone spark cluster
I have a single-node Spark cluster (4 CPU cores and 15 GB of memory) configured with a single worker. I can access the web UI and see the worker node. However, I am having trouble submitting jobs using spark-submit. I have a couple of questions.
I…

rasthiya
- 650
- 1
- 6
- 20
1
vote
1 answer
pivot dataframe into fixed no of columns spark sql
I have a dataframe
val df = spark.sqlContext.createDataFrame(Seq( ("100","3","sceince","A"), ("100","3","maths","B"), ("100","3","maths","F"), ("100","3","computesrs","null"), ("101","2","maths","E"), ("101","2","computesrs","C"),…

Rao
- 153
- 12
1
vote
1 answer
Is there a way to get SparkSession Id of the current Spark Session?
I have a Spark session created by spark-shell and another Spark session created by my code (imported via a jar passed to spark-shell).
Is there a way to compare the session id of the two Spark Sessions?
I know we can get applicationId via…

blueberret
- 21
- 1
- 5
1
vote
0 answers
Spark on Yarn error : Yarn application has already ended! It might have been killed or unable to launch application master
While starting spark-shell --master yarn --deploy-mode client I am getting this error:
Yarn application has already ended! It might have been killed or
unable to launch application master.
Here is the complete log from Yarn:
19/08/28 00:54:55 INFO…

Sabyasachi Mitra
- 365
- 1
- 4
- 12
1
vote
0 answers
Spark on yarn, Connection reset by peer
I searched a lot, but all in vain. This is a 3-node EC2 cluster in AWS. I checked the disk space, resources, and running services; all seem to be fine, but I get this error. Please help me resolve this.
10.0.1.5 & 10.0.1.6 are datanodes, I just ran the…

onlyvinish
- 435
- 1
- 5
- 20
1
vote
2 answers
Failed to execute Cassandra CQL statement, while reading from Ignite Cache
I am trying to integrate Ignite with Cassandra. I set up the configuration and started the Ignite node, but I cannot insert or read data from the Ignite cache or the Cassandra DB. I created a keyspace and a table in Cassandra and inserted some values. But when…

Ashok v
- 77
- 1
- 8
1
vote
1 answer
Why do Spark shells (PySpark or Scala) run in client mode instead of cluster mode?
I've always understood that the Spark shells, be it PySpark or Scala, run in client mode. And correct me if I'm wrong, there isn't an out-of-the-box configuration to use them in cluster mode.
Why is this the case? What makes cluster mode unsuitable…

flow2k
- 3,999
- 40
- 55
1
vote
1 answer
Difference between sparksession text and textfile methods?
I am working with the Spark Scala shell and trying to create DataFrames and Datasets from a text file.
To get Datasets from a text file, there are two options, the text and textFile methods, as follows:
scala> spark.read.
csv format jdbc json …

KayV
- 12,987
- 11
- 98
- 148
1
vote
4 answers
Scala/Spark determine the path of external table
I have an external table on a gs bucket and, to do some compaction logic, I want to determine the full path where the table is created.
val tableName="stock_ticks_cow_part"
val primaryKey="key"
val versionPartition="version"
val…

shiv
- 1,940
- 1
- 15
- 22
1
vote
1 answer
spark read contents of zip file in HDFS
I am trying to read data from a zip file.
I can read a whole text file as below:
val f = sc.wholeTextFiles("hdfs://")
but I don't know how to read the text data inside a zip file.
Is there any possible way to do it? If yes, please let me know.

sande
- 567
- 1
- 10
- 24
0
votes
0 answers
How can I connect to remote hive metastore which is kerberized using spark-shell from Google Cloud Platform ssh terminal?
I want to read data from a remote Hive server in spark-shell and write it to a BigQuery table. I am facing an issue establishing a connection to the Hive metastore. I have downloaded the krb5.conf and keytab files and am using them with spark-shell, but am still not able to make…
0
votes
1 answer
How to kill a spark shell via Spark's REST API?
I'm running Spark version 2.0.1 and want to kill a spark shell via the REST API (cannot use any other methods such as the yarn commands, for instance).
I managed to get the application id (with the spark-master:8080/json/ endpoint), but I could not…

omer
- 1,242
- 4
- 18
- 45
0
votes
0 answers
Error when using command Val rdd in Spark-shell
I am working on a lab assignment that follows an online tutorial (reference link: https://sparkbyexamples.com/).
When I replicate the steps in the attached screenshot below:
Tutorial Screenshot
I receive the following error message in Spark-shell, please…

Fabio
- 1
0
votes
0 answers
what is difference between running pyspark code directly with python and with spark-shell?
I have code that uses data from a Postgres database and saves it in a Delta Lake:
import pyspark
from delta import *
import time
start_time = time.time()
builder = (pyspark.sql.SparkSession.builder.appName("MyApp")
…

Tavakoli
- 1,303
- 3
- 18
- 36