
I wrote the code in IntelliJ and built a jar file. I then used MobaXterm to upload the jar to the Apache Ambari files under /user/maria_dev. Finally, I ran the following spark-submit command in MobaXterm: "spark-submit --class csv_parquet /user/maria_dev/first_2.12-0.1.jar yarn /user/maria_dev/fileupload/business-operations-survey-2020-covid-19-csv.csv /user/maria_dev/output" and I'm getting the following error:

SPARK_MAJOR_VERSION is set to 2, using Spark2
Warning: Local jar /user/maria_dev/first_2.12-0.1.jar does not exist, skipping.
java.lang.ClassNotFoundException: csv_parquet
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:239)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:861)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/06/28 19:51:42 INFO ShutdownHookManager: Shutdown hook called
21/06/28 19:51:42 INFO ShutdownHookManager: Deleting directory /tmp/spark-02f3e56f-af15-46ab-b43a-54d17d88f5cc

My IntelliJ code is:

 import org.apache.spark.sql.SparkSession

 object csv_parquet {
   def main(args: Array[String]): Unit = {
     val spark = SparkSession.builder().appName("csv_parquet").master("yarn").getOrCreate()
     val df = spark.read.format("csv")
       .option("delimiter", ",")
       .option("header", "true")
       .option("path", "C:\\Users\\kiran\\business-operations-survey-2020-covid-19-csv.csv")
       .load()
     df.show(20)
     df.write.mode("overwrite")
       .parquet("\\user\\maria_dev\\output")
     val par = spark.read.format("parquet").load("\\user\\maria_dev\\output")
     par.show()
     spark.close()
   }
 }

I get the expected output when I run spark-submit locally, but I can't get it to work on Apache Ambari.
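
For reference, the spark-submit command above passes three arguments (yarn, the input CSV path, and the output directory), while the posted main hardcodes both paths and never reads args. Below is a minimal sketch of picking those values up from args, assuming that argument order. JobArgs and its field names are hypothetical, and the Spark calls appear only as comments so the block runs without a cluster:

```scala
// Hypothetical container for the three positional arguments in the
// spark-submit command: master, input path, output path.
final case class JobArgs(master: String, input: String, output: String)

object JobArgs {
  // Returns Right(JobArgs) for exactly three arguments, Left(message) otherwise.
  def parse(args: Array[String]): Either[String, JobArgs] = args match {
    case Array(master, input, output) => Right(JobArgs(master, input, output))
    case _ => Left(s"expected 3 arguments (master, input, output), got ${args.length}")
  }
}

object JobArgsDemo {
  def main(args: Array[String]): Unit =
    JobArgs.parse(args) match {
      case Right(a) =>
        // In the actual job these values would replace the hardcoded ones, e.g.
        //   SparkSession.builder().appName("csv_parquet").master(a.master)
        //   spark.read.option("header", "true").csv(a.input)
        //   df.write.mode("overwrite").parquet(a.output)
        println(s"master=${a.master} input=${a.input} output=${a.output}")
      case Left(msg) =>
        System.err.println(msg)
        sys.exit(1)
    }
}
```

With this shape, the same jar can be pointed at a local file or at an HDFS path purely from the command line, without rebuilding.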

Jvs Kiran
  • _Local jar /user/maria_dev/first_2.12-0.1.jar does not exist_ did you actually upload this file? – Gaël J Jun 28 '21 at 20:25
  • Yes, I used "hdfs dfs -put /home/maria_dev/files/first_2.12-0.1.jar /user/maria_dev" from my local terminal, and the jar was imported to that location in Ambari. Then I ran "spark-submit --class csv_parquet /user/maria_dev/first_2.12-0.1.jar yarn /user/maria_dev/fileupload/business-operations-survey-2020-covid-19-csv.csv /user/maria_dev/output", where the input file is on the Ambari server and the output location points to an empty directory created in Ambari. @Gaël J – Jvs Kiran Jun 29 '21 at 18:06
  • Now I get a different error when I run **"spark-submit --class csv_parquet hdfs://user/maria_dev/first_2.12-0.1.jar yarn hdfs://user/maria_dev/fileupload/business-operations-survey-2020-covid-19-csv.csv hdfs://user/maria_dev/output"** in my local terminal (MobaXterm): **"WARN FileStreamSink: Error while looking for metadata directory. Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs:%5Cuser%5Cmaria_dev%5Cfileupload%5Cbusiness-operations-survey-2020-covid-19-csv.csv"** – Jvs Kiran Jun 29 '21 at 22:49
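
On the error in the last comment: %5C is the percent-encoded backslash, which suggests the path string contained backslashes rather than forward slashes. Separately, in a URI written as hdfs://user/maria_dev/..., the first segment after // is parsed as the authority (host), not as a directory. A small sketch with plain java.net.URI illustrates the second point. HdfsUriCheck is a hypothetical name, and whether this is what fails on this particular cluster is an assumption:

```scala
import java.net.URI

object HdfsUriCheck {
  // Returns (authority, path) exactly as java.net.URI parses the string.
  // The authority is None when the URI has an empty authority ("hdfs:///...").
  def parts(s: String): (Option[String], String) = {
    val u = new URI(s)
    (Option(u.getAuthority), u.getPath)
  }

  def main(args: Array[String]): Unit = {
    // Two slashes: "user" is taken as the host, and the path loses its first segment.
    println(parts("hdfs://user/maria_dev/output"))
    // Three slashes: no authority, and the full path is kept.
    println(parts("hdfs:///user/maria_dev/output"))
  }
}
```

With three slashes the authority is left empty, and Hadoop would normally fall back to the default filesystem configured on the cluster.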
