Questions tagged [spark-shell]

More information can be found in the official documentation.

135 questions
0
votes
0 answers

Spark shell error when launching with --num-executors: YARN application has exited unexpectedly with state FAILED

I have just installed spark-3.3.1 and am trying to launch it with a set number of executors, but the job fails. I am doing this for the first time and cannot identify the cause of the failure. adminn@master:~$ spark-shell --master yarn…
Salva
  • 81
  • 1
  • 9
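Not from the question itself, but a minimal sketch of how such a session is typically launched and debugged; the resource numbers are examples only, and the real failure reason usually lives in the YARN container logs:

    spark-shell --master yarn --deploy-mode client \
      --num-executors 2 --executor-memory 1g --executor-cores 1
    # if the application still fails, pull the container logs for the actual cause
    # (the application id below is a placeholder):
    yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX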
0
votes
1 answer

spark-sql/spark-submit with delta lake is resulting null pointer exception (at org.apache.spark.storage.BlockManagerMasterEndpoint)

I'm using Delta Lake with PySpark by submitting the command below: spark-sql --packages io.delta:delta-core_2.12:0.8.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf…
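For context, the commonly documented invocation pairs the extension with the Delta catalog, and delta-core 0.8.0 targets Spark 3.0.x, so a Spark/Delta version mismatch is a frequent cause of startup crashes like this. The second --conf below is an assumption about what the truncated command contained:

    spark-sql --packages io.delta:delta-core_2.12:0.8.0 \
      --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
      --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"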
0
votes
0 answers

Ctrl-left and Ctrl-right not working in Spark Shell on Windows 10

I installed Spark on Windows 10. Everything works fine except for the Ctrl-left and Ctrl-right keys, which move by a single character instead of a word. How do I fix this, or find out the correct bindings to move by a word in the Spark Shell?
Salil Surendran
  • 2,245
  • 4
  • 27
  • 42
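One hedged possibility: spark-shell's line editor is JLine, which can read readline-style bindings from an .inputrc file (on Windows, %USERPROFILE%\.inputrc). Whether this works depends on the escape codes the terminal actually emits for Ctrl-arrow, so treat the following as a sketch:

    "\e[1;5D": backward-word
    "\e[1;5C": forward-word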
0
votes
0 answers

Dynamically Pivot/Transpose Rows to Columns in Spark using 2 Columns

I have some data that I want to pivot, and I am able to pivot it in a single-column scenario like the one below: Dim FY2019 FY2020 FY2021 NA 20000 30000 25000 EUROPE 10000 15000 20000 ASIA 30000 10000 …
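As a point of reference, the single-column case in the excerpt is reproducible with pivot, and a common trick for pivoting on two columns is to concatenate them into one key first. The column names below are inferred from the excerpt; the second column ("metric") is a placeholder:

    val df = Seq(("FY2019", "NA", 20000), ("FY2020", "NA", 30000),
                 ("FY2019", "EUROPE", 10000)).toDF("fy", "dim", "value")
    // single-column pivot, matching the excerpt's output shape
    df.groupBy("dim").pivot("fy").sum("value").show()
    // two-column variant: build a combined pivot key first
    // import org.apache.spark.sql.functions.concat_ws
    // df.withColumn("key", concat_ws("_", $"fy", $"metric"))
    //   .groupBy("dim").pivot("key").sum("value")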
0
votes
1 answer

How to read a new table with data in spark-shell SQL?

I am new to the Spark shell and am trying to add a new table and read it. I have added this file, workers.txt: 1201, satish, 25 1202, krishna, 28 1203, amith, 39 1204, javed, 23 1205, prudvi, 23 and run the commands: spark-shell val sqlContext = new…
Atheel Massalha
  • 424
  • 1
  • 6
  • 18
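Since Spark 2, the SQLContext boilerplate is unnecessary in spark-shell. A sketch of the modern route, assuming the comma-plus-space layout shown in the question:

    val workers = spark.read
      .option("inferSchema", "true")
      .option("ignoreLeadingWhiteSpace", "true")
      .csv("workers.txt")
      .toDF("id", "name", "age")
    workers.createOrReplaceTempView("workers")
    spark.sql("SELECT name, age FROM workers WHERE age >= 25").show()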
0
votes
0 answers

Execute a file in Google Cloud Platform (GCP) bucket using Scala

I'm looking to execute Scala code from a text file on GCP with the Spark shell. Using GCP (Google Cloud Platform), I've done the following: created a Dataproc instance named gcp-cluster-091122; created a Cloud Storage bucket named…
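Not confirmed by the truncated question, but spark-shell's :load command reads from the local filesystem, so one workable pattern on a Dataproc node is to copy the script out of the bucket first. The bucket and file names below are placeholders:

    gsutil cp gs://<your-bucket>/script.txt /tmp/script.scala
    spark-shell
    scala> :load /tmp/script.scala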
0
votes
0 answers

How to deploy a pyspark/spark image into k8s as running pods to access spark-shell?

I have created a PySpark image: Spark 3.3.0, Hadoop 3.3.4. I want to deploy it to k8s so I can customize the number of executors and their memory/CPU, but I do not want to deploy a Spark job. I want to deploy these pods from a pure PySpark image so I can…
Dariusz Krynicki
  • 2,544
  • 1
  • 22
  • 47
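One hedged pointer: spark-shell only runs in client mode, so rather than submitting a job you can start the shell as a long-lived driver that asks the API server for executor pods built from the custom image. The image name and executor settings below are placeholders:

    spark-shell \
      --master k8s://https://<api-server>:6443 \
      --conf spark.kubernetes.container.image=<registry>/pyspark:3.3.0 \
      --conf spark.executor.instances=2 \
      --conf spark.executor.memory=2g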
0
votes
1 answer

Why does spark-shell throw NoSuchMethodException when calling newInstance via reflection?

spark-shell throws NoSuchMethodException if I define a class in the REPL and then call newInstance via reflection. Spark context available as 'sc' (master = yarn, app id = application_1656488084960_0162). Spark session available as 'spark'. Welcome to …
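For background, Spark's REPL compiles user code with class-based wrappers, so a class defined at the prompt becomes an inner class whose constructor takes a hidden outer-instance parameter, and newInstance() finds no no-arg constructor. A sketch of how to observe and work around this:

    class Foo
    classOf[Foo].getDeclaredConstructors.foreach(println)  // note the synthetic $iw parameter
    // workaround: compile it as a genuine top-level class
    // :paste -raw
    // package mypkg
    // class Bar
    // (Ctrl-D) ... then:
    // classOf[mypkg.Bar].getDeclaredConstructor().newInstance()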
0
votes
0 answers

Caused by: java.io.IOException: Error accessing /app/platform/spark-3.2.1-bin-hadoop3.2/jars/._dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar

hims-scheduler-2.3.9.jar:/app/platform/spark-3.2.1-bin-hadoop3.2/jars/hive-shims-scheduler-2.3.9.jar:/app/platform/hadoop-3.3.2/etc/hadoop] Exception in thread "main" scala.reflect.internal.FatalError: Error accessing…
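The leading ._ in the jar name is the clue: files named ._*.jar are macOS AppleDouble metadata that can ride along when a Spark distribution is unpacked or copied from a Mac, and they are not valid jars, so the classpath scan chokes on them. Assuming that diagnosis, a common fix is simply to delete them:

    find /app/platform/spark-3.2.1-bin-hadoop3.2/jars -name "._*" -delete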
0
votes
1 answer

spark-shell not working after installing Apache Spark. Error: the system cannot find the path specified

I installed Apache Spark and have Java and Python installed as well. I set up the environment variables as per this article: https://phoenixnap.com/kb/install-spark-on-windows-10 I have also installed winutils.exe. Initially I was getting an error like…
preet
  • 13
  • 2
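"The system cannot find the path specified" usually means one of the variables from that article points at a directory that does not exist (a JAVA_HOME under "Program Files", with its space, is another classic). A sketch of the checks, with example paths only:

    :: verify each variable resolves to a real directory
    echo %JAVA_HOME%
    echo %SPARK_HOME%
    echo %HADOOP_HOME%
    :: example settings; adjust to the actual install locations
    setx JAVA_HOME "C:\Java\jdk1.8.0_301"
    setx SPARK_HOME "C:\Spark\spark-3.3.1-bin-hadoop3"
    setx PATH "%PATH%;%JAVA_HOME%\bin;%SPARK_HOME%\bin;%HADOOP_HOME%\bin"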
0
votes
1 answer

Is there a way to rerun a pasted block of code in spark shell?

I regularly copy blocks of code into spark-shell and run them using :paste followed by Ctrl-D. Sometimes a block errors because another line of code is required first, e.g. an import. Once I have added the missing requirements, I would like to rerun the whole…
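There is no built-in "rerun last paste", but two idioms come close; the file path and function name below are examples only:

    // keep the block in a file and re-execute it on demand
    :load /tmp/block.scala
    // or wrap the pasted code so rerunning is a single call
    def runBlock(): Unit = {
      // ...pasted code...
    }
    runBlock()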
0
votes
1 answer

Spark shell Scala Dataset: display only a few columns in a query

I am trying to display just a few columns in Scala, such as only name, address, and zip. I have this so far: scala> pe06DS.filter(pe06data => pe06data.state == "OH").show() +--------------+--------+----------+-----+-----+ | address| city| …
Lee Roger
  • 11
  • 1
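Projecting the columns before show() is the usual answer; the column names below are taken from the question's wording and assumed to exist on pe06DS:

    pe06DS.filter(_.state == "OH")
      .select("name", "address", "zip")
      .show()
    // or keep a typed Dataset instead of a DataFrame:
    // pe06DS.filter(_.state == "OH").map(p => (p.name, p.address, p.zip)).show()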
0
votes
2 answers

Load text files and store them in a DataFrame using PySpark

I am migrating a Pig script to PySpark, and since I am new to PySpark I am stuck at data loading. My Pig script looks like: Bag1 = LOAD '/refined/em/em_results/202112/' USING PigStorage('\u1') AS (PAYER_SHORT: chararray ,SUPER_PAYER_SHORT:…
Neel Sharma
  • 53
  • 1
  • 4
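PigStorage('\u1') splits on the \u0001 control character, which the DataFrame reader accepts as a separator. Shown in spark-shell Scala to match this tag; the PySpark call is the same apart from syntax, and the column names come from the excerpt (the rest of the schema is truncated above):

    val bag1 = spark.read
      .option("sep", "\u0001")
      .csv("/refined/em/em_results/202112/")
      .withColumnRenamed("_c0", "PAYER_SHORT")
      .withColumnRenamed("_c1", "SUPER_PAYER_SHORT")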
0
votes
1 answer

Dynamically Pivot/Transpose Rows to Columns in Hive/Spark

I have quarterly data, and the data keeps growing dynamically as quarters are added: qtr dimvalue percentage FY2019-Q1 XYZ 15 FY2019-Q1 ABC 80 FY2019-Q1 PPP 5 FY2019-Q2 XYZ 10 FY2019-Q2 ABC …
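A sketch of the usual dynamic approach on the Spark side: collect the quarter values at runtime and feed them to pivot, so new quarters become new columns without code changes. The column names come from the excerpt, and df is assumed to hold the long-format data:

    val qtrs = df.select("qtr").distinct().as[String].collect().toSeq.sorted
    df.groupBy("dimvalue")
      .pivot("qtr", qtrs)
      .sum("percentage")
      .show()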
0
votes
2 answers

Same Spark DataFrame created in 2 different ways gets different execution times for the same query

I created the same Spark DataFrame in 2 ways in order to run Spark SQL on it. 1. I read the data from a .csv file straight into a DataFrame in the Spark shell using the following command: val…
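Without the full question, one hedged pointer: compare the two physical plans, since a DataFrame read straight from CSV re-scans and re-parses the file on every query unless it is cached. The DataFrame names below are placeholders:

    dfFromCsv.explain(true)   // look for FileScan csv and schema-inference differences
    dfOtherWay.explain(true)
    dfFromCsv.cache()         // materialize once if repeated scanning explains the gap
    dfFromCsv.count()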