Questions tagged [spark-shell]

More information can be found in the official documentation.

135 questions
1
vote
0 answers

Issue running spark-shell with yarn client, ERROR client.TransportClient: Failed to send RPC

I am trying to set up Hadoop 3.1.2 with Spark on Windows. I have started the HDFS cluster and I am able to create and copy files in HDFS. When I try to start spark-shell with YARN, I get ERROR cluster.YarnClientSchedulerBackend: Diagnostics message:…
andrew
  • 19
  • 7
1
vote
0 answers

Using `spark-submit` to start a job in a single node standalone spark cluster

I have a single-node Spark cluster (4 CPU cores and 15 GB of memory) configured with a single worker. I can access the web UI and see the worker node. However, I am having trouble submitting jobs using spark-submit. I have a couple of questions. I…
1
vote
1 answer

pivot dataframe into fixed no of columns spark sql

I have a dataframe val df = spark.sqlContext.createDataFrame(Seq( ("100","3","sceince","A"), ("100","3","maths","B"), ("100","3","maths","F"), ("100","3","computesrs","null"), ("101","2","maths","E"), ("101","2","computesrs","C"),…
Rao
  • 153
  • 12
1
vote
1 answer

Is there a way to get SparkSession Id of the current Spark Session?

I have a Spark session created by spark-shell and another Spark session created by my code (imported via a jar passed to spark-shell). Is there a way to compare the session IDs of the two Spark sessions? I know we can get applicationId via…
blueberret
  • 21
  • 1
  • 5
1
vote
0 answers

Spark on Yarn error : Yarn application has already ended! It might have been killed or unable to launch application master

While starting `spark-shell --master yarn --deploy-mode client` I am getting the error: Yarn application has already ended! It might have been killed or unable to launch application master. Here is the complete log from Yarn: 19/08/28 00:54:55 INFO…
Sabyasachi Mitra
  • 365
  • 1
  • 4
  • 12
1
vote
0 answers

Spark on yarn, Connection reset by peer

I have searched a lot, but in vain. This is a 3-node EC2 cluster in AWS. I checked the disk space, resources, and running services, and all seem to be fine, but I still get this error. Please help me resolve this. 10.0.1.5 & 10.0.1.6 are datanodes; I just ran the…
onlyvinish
  • 435
  • 1
  • 5
  • 20
1
vote
2 answers

Failed to execute Cassandra CQL statement, while reading from Ignite Cache

I am trying to integrate Ignite with Cassandra. I set up the configuration and started the Ignite node, but I cannot insert or read data from the Ignite cache or the Cassandra DB. I created a keyspace and a table in Cassandra and inserted some values. But when…
Ashok v
  • 77
  • 1
  • 8
1
vote
1 answer

Why do Spark shells (PySpark or Scala) run in client mode instead of cluster mode?

I've always understood that the Spark shells, be it PySpark or Scala, run in client mode. And correct me if I'm wrong, there isn't an out-of-the-box configuration to use them in cluster mode. Why is this the case? What makes cluster mode unsuitable…
flow2k
  • 3,999
  • 40
  • 55
1
vote
1 answer

Difference between sparksession text and textfile methods?

I am working with Spark scala shell and trying to create dataframe and datasets from a text file. For getting datasets from a text file, there are two options, text and textFile methods as follows: scala> spark.read. csv format jdbc json …
KayV
  • 12,987
  • 11
  • 98
  • 148
1
vote
4 answers

Scala/Spark determine the path of external table

I have an external table on a GS bucket and, to do some compaction logic, I want to determine the full path where the table was created. val tableName="stock_ticks_cow_part" val primaryKey="key" val versionPartition="version" val…
shiv
  • 1,940
  • 1
  • 15
  • 22
1
vote
1 answer

spark read contents of zip file in HDFS

I am trying to read data from a zip file. I can read a whole text file as below: val f = sc.wholeTextFiles("hdfs://") but I don't know how to read the text data inside a zip file. Is there any possible way to do it? If yes, please let me know.
sande
  • 567
  • 1
  • 10
  • 24
0
votes
0 answers

How can I connect to remote hive metastore which is kerberized using spark-shell from Google Cloud Platform ssh terminal?

I want to read data from a remote Hive server in spark-shell and write it to a BigQuery table. I am having trouble establishing a connection to the Hive metastore. I have downloaded the krb5.conf and keytab files and am using them, but spark-shell is still not able to make…
0
votes
1 answer

How to kill a spark shell via Spark's REST API?

I'm running Spark version 2.0.1 and want to kill a spark shell via the REST API (cannot use any other methods such as the yarn commands, for instance). I managed to get the application id (with the spark-master:8080/json/ endpoint), but I could not…
omer
  • 1,242
  • 4
  • 18
  • 45
0
votes
0 answers

Error when using command Val rdd in Spark-shell

I am working on a lab assignment that follows an online tutorial (reference link: https://sparkbyexamples.com/). When I replicate the steps shown in the attached tutorial screenshot, I receive the following error message in spark-shell, please…
Fabio
  • 1
0
votes
0 answers

what is difference between running pyspark code directly with python and with spark-shell?

I have code that uses data from a Postgres database and saves it in a Delta Lake: import pyspark from delta import * import time start_time = time.time() builder = (pyspark.sql.SparkSession.builder.appName("MyApp") …
Tavakoli
  • 1,303
  • 3
  • 18
  • 36