More information can be found in the official documentation.
Questions tagged [spark-shell]
135 questions
0
votes
0 answers
Spark shell error when setting num-executors: YARN application has exited unexpectedly with state FAILED
I have just installed spark-3.3.1 and am trying to run spark-shell with a set number of executors, but the job keeps failing.
I am doing this for the first time and cannot identify the cause of the job failure.
adminn@master:~$ spark-shell --master yarn…
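A minimal sketch of how such a launch is usually written, assuming a working YARN setup (the resource sizes below are illustrative, not taken from the question):

spark-shell --master yarn \
  --deploy-mode client \
  --num-executors 2 \
  --executor-memory 1g \
  --executor-cores 1

If the YARN application still exits with state FAILED, the application logs (yarn logs -applicationId <appId>) usually contain the underlying cause.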

Salva
- 81
- 1
- 9
0
votes
1 answer
spark-sql/spark-submit with Delta Lake results in a NullPointerException (at org.apache.spark.storage.BlockManagerMasterEndpoint)
I'm using Delta Lake with PySpark, submitting the command below:
spark-sql --packages io.delta:delta-core_2.12:0.8.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf…
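A hedged sketch of the full launch command; the truncated --conf in the question is assumed here to be the usual Delta catalog setting. Note that delta-core 0.8.0 targets Spark 3.0.x, and a Spark/Delta version mismatch commonly surfaces as opaque errors like this NullPointerException:

spark-sql --packages io.delta:delta-core_2.12:0.8.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"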

Vinod R
- 17
- 5
0
votes
0 answers
Ctrl-left and Ctrl-right not working in Spark Shell on Windows 10
I installed Spark on Windows 10. Everything works fine except the Ctrl-left and Ctrl-right keys, which move by a single character instead of by a word. How do I fix this, or find out the correct bindings to move by a word in Spark Shell?

Salil Surendran
- 2,245
- 4
- 27
- 42
0
votes
0 answers
Dynamically Pivot/Transpose Rows to Columns in Spark using 2 Columns
I have some data that I want to pivot, and I am able to pivot it in a single-column scenario like the one below:
Dim     FY2019  FY2020  FY2021
NA      20000   30000   25000
EUROPE  10000   15000   20000
ASIA    30000   10000   …
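A hedged Scala sketch of the two-column variant: combine the two pivot columns into one key, then pivot on that key. The input column names (dim, fy, metric, amount) are illustrative, since the question truncates the schema:

import org.apache.spark.sql.functions.{col, concat_ws, first}

// Combine the two pivot columns into a single key, e.g. FY2019_sales.
val pivoted = df
  .groupBy("dim")
  .pivot(concat_ws("_", col("fy"), col("metric")))
  .agg(first("amount"))
pivoted.show()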

Samrat Saha
- 65
- 8
0
votes
1 answer
How to read a new table with data in spark-shell SQL?
I am new to spark-shell and I am trying to add a new table and read it.
I have added this file:
workers.txt:
1201, satish, 25
1202, krishna, 28
1203, amith, 39
1204, javed, 23
1205, prudvi, 23
and run the commands:
spark-shell
val sqlContext = new…
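The excerpt suggests the old SQLContext API; a hedged sketch using the modern spark-shell entry point instead, assuming workers.txt sits in the current directory:

// Column names are assumed from the sample rows (id, name, age).
val workers = spark.read
  .option("inferSchema", "true")
  .option("ignoreLeadingWhiteSpace", "true")
  .csv("workers.txt")
  .toDF("id", "name", "age")
workers.createOrReplaceTempView("workers")
spark.sql("SELECT name, age FROM workers WHERE age > 24").show()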

Atheel Massalha
- 424
- 1
- 6
- 18
0
votes
0 answers
Execute a file in Google Cloud Platform (GCP) bucket using Scala
I'm looking to execute Scala code stored in a text file on GCP with Spark Shell.
Using GCP (Google Cloud Platform), I've done the following:
Created a DataProc instance and named it gcp-cluster-091122.
Created a Cloud Bucket and named it…
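A hedged sketch of one common pattern on a Dataproc cluster: copy the script out of the bucket, then preload it with spark-shell's -i flag (the bucket and file names here are hypothetical):

gsutil cp gs://my-bucket/script.scala /tmp/script.scala
spark-shell -i /tmp/script.scala

Inside an already-running shell, :load /tmp/script.scala does the same job.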

Rashad Nelson
- 27
- 1
- 5
0
votes
0 answers
How to deploy a PySpark/Spark image into k8s as running pods to access spark-shell?
I have created a PySpark image:
Spark 3.3.0
Hadoop 3.3.4
I want to deploy it into k8s so I can customize the number of executors and their memory/CPU, but I do not want to deploy a Spark job. I want to deploy these pods as a purely PySpark image so I can…
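spark-shell on Kubernetes only runs in client mode, so one hedged sketch is to start a driver pod from the image and launch the shell against the cluster API from inside it (the image name and resource settings are illustrative, and the driver pod must be reachable by the executors):

# Run from inside a pod in the cluster.
spark-shell \
  --master k8s://https://kubernetes.default.svc \
  --deploy-mode client \
  --conf spark.executor.instances=2 \
  --conf spark.executor.memory=1g \
  --conf spark.kubernetes.container.image=my-registry/pyspark:3.3.0 \
  --conf spark.driver.host=$(hostname -i)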

Dariusz Krynicki
- 2,544
- 1
- 22
- 47
0
votes
1 answer
Why does spark-shell throw a NoSuchMethodException when calling newInstance via reflection?
spark-shell throws a NoSuchMethodException if I define a class in the REPL and then call newInstance via reflection.
Spark context available as 'sc' (master = yarn, app id = application_1656488084960_0162).
Spark session available as 'spark'.
Welcome to
…
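A likely explanation, offered as a sketch: classes defined at the REPL prompt are compiled as inner classes of the REPL's wrapper object, so their apparent no-arg constructor actually takes the enclosing instance, and getDeclaredConstructor() with no arguments finds nothing:

// Typically fails in spark-shell with NoSuchMethodException:
class Foo
classOf[Foo].getDeclaredConstructor().newInstance()

// One workaround: compile the class outside the wrapper with :paste -raw,
// giving it a package so it becomes a plain top-level class:
//   :paste -raw
//   package demo
//   class Foo
// then Class.forName("demo.Foo").getDeclaredConstructor().newInstance() succeeds.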

核心力量
- 99
- 1
- 6
0
votes
0 answers
Caused by: java.io.IOException: Error accessing /app/platform/spark-3.2.1-bin-hadoop3.2/jars/._dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
hims-scheduler-2.3.9.jar:/app/platform/spark-3.2.1-bin-hadoop3.2/jars/hive-shims-scheduler-2.3.9.jar:/app/platform/hadoop-3.3.2/etc/hadoop]
Exception in thread "main" scala.reflect.internal.FatalError: Error accessing…
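Files whose names start with ._ are AppleDouble metadata, typically left behind when an archive unpacked on macOS is copied to Linux; they are not valid jars and break classpath scanning. A hedged cleanup, using the directory from the error message:

find /app/platform/spark-3.2.1-bin-hadoop3.2/jars -name '._*' -delete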
0
votes
1 answer
spark-shell not working after installing Apache Spark. Error: system cannot find the path specified
I installed Apache Spark and have Java and Python installed as well. I set up the environment variables as per this article: https://phoenixnap.com/kb/install-spark-on-windows-10
I have also installed winutils.exe.
Initially I was getting an error like…
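This error usually means SPARK_HOME, HADOOP_HOME, or the PATH entries point at the wrong folders. A hedged sketch of the variables that have to line up, with illustrative paths (current session only):

set SPARK_HOME=C:\spark\spark-3.3.1-bin-hadoop3
set HADOOP_HOME=C:\hadoop
set PATH=%PATH%;%SPARK_HOME%\bin;%HADOOP_HOME%\bin
rem winutils.exe must be located at %HADOOP_HOME%\bin\winutils.exe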

preet
- 13
- 2
0
votes
1 answer
Is there a way to rerun a pasted block of code in spark shell?
I regularly copy blocks of code into spark-shell and run the block using
:paste
Ctrl-D
Sometimes it errors because another line of code is required first, e.g. an import. Once I have added the missing requirements, I would like to rerun the whole…
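One hedged approach: keep the block in a file and reload it in one command, so reruns after adding a missing import are cheap (the file path here is hypothetical):

// Save the block to /tmp/block.scala, then inside spark-shell:
:load /tmp/block.scala
// :paste /tmp/block.scala also works and compiles the file as a single unit.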
0
votes
1 answer
Spark-shell Scala Dataset: display only a few columns in a query
I am trying to display just a few columns in Scala, such as name, address, and zip.
I have this so far...
scala> pe06DS.filter(pe06data => pe06data.state == "OH").show()
+--------------+--------+----------+-----+-----+
| address| city| …
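A hedged sketch, assuming pe06DS has the name, address, zip, and state fields shown in the question:

pe06DS.filter(_.state == "OH")
  .select("name", "address", "zip")
  .show()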

Lee Roger
- 11
- 1
0
votes
2 answers
Load text files and store them in a DataFrame using PySpark
I am migrating a Pig script to PySpark, and since I am new to PySpark I am stuck at data loading.
My Pig script looks like:
Bag1 = LOAD '/refined/em/em_results/202112/' USING PigStorage('\u1') AS
(PAYER_SHORT: chararray
,SUPER_PAYER_SHORT:…
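PigStorage('\u1') means the fields are separated by the control character U+0001, which Spark's CSV reader accepts as a custom separator. A hedged sketch in spark-shell Scala (the PySpark calls are the same, option for option); the path and the first two column names come from the question:

val df = spark.read
  .option("sep", "\u0001")
  .csv("/refined/em/em_results/202112/")
  .toDF("PAYER_SHORT", "SUPER_PAYER_SHORT")  // extend with the remaining columns from the Pig schema
df.show()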

Neel Sharma
- 53
- 1
- 4
0
votes
1 answer
Dynamically Pivot/Transpose Rows to Columns in Hive/Spark
I have quarterly data, and the data keeps growing dynamically as new quarters are added:
qtr        dimvalue  percentage
FY2019-Q1  XYZ       15
FY2019-Q1  ABC       80
FY2019-Q1  PPP       5
FY2019-Q2  XYZ       10
FY2019-Q2  ABC       …
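Hive SQL has no native dynamic pivot, so the usual answer is to do this step in Spark. A hedged sketch: pivot without an explicit value list discovers the quarters at runtime, which covers the grows-each-quarter requirement (column names as in the question):

import org.apache.spark.sql.functions.first

val pivoted = df
  .groupBy("dimvalue")
  .pivot("qtr")             // FY2019-Q1, FY2019-Q2, ... discovered from the data
  .agg(first("percentage"))
pivoted.show()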

Samrat Saha
- 65
- 8
0
votes
2 answers
Same Spark DataFrame created in 2 different ways gets different execution times for the same query
I created the same Spark DataFrame in 2 ways in order to run Spark SQL on it.
1. I read the data from a .csv file straight into a DataFrame in the Spark shell using the following command:
val…
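When the same logical data arrives via two construction paths, comparing the physical plans is the standard first diagnostic; a hedged sketch (the file name is illustrative):

val dfFromCsv = spark.read.option("header", "true").csv("data.csv")
dfFromCsv.explain()  // print the physical plan; extra scans or conversions here usually explain timing gaps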

Antonis Pervanas
- 11
- 4