
I am trying to translate a spark-submit command, with its --packages, --repositories, --jars, --files and user-defined application arguments, into the Livy REST JSON protocol. Please find more details below.

spark-submit command:

spark-submit \
  --packages com.hortonworks.shc:shc-core:1.1.0.3.1.6.5-3 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  --jars /usr/hdp/current/phoenix-client/phoenix-server.jar \
  --files x/y.yml,x/y1.yml \
  $HOME/spark_apps/a/app.py \
  --arg_name value \
  --arg_name2 value

What I tried in Livy:

{
    "conf": {"com.hortonworks.shc": "shc-core:1.1.0.3.1.6.5-3"},
    "jars":["wasbs:///phoenix-server.jar"],
    "file": "/home/admin/spark_apps/a/app.py",
    "files": ["/home/admin/x/y.yml,/home/admin/x/y1.yml"], 
    "args": [
         "--arg_name=value", 
         "--arg_name=value"] 
        
}

And the error is:

ls: cannot access '/usr/hdp/current/hadoop/lib': No such file or directory
log4j:ERROR Could not find value for key log4j.appender.tcp
log4j:ERROR Could not instantiate appender named "tcp".
Warning: Ignoring non-spark config property: com.hortonworks.shc=shc-core:1.1.0.3.1.6.5-3
Warning: Master yarn-cluster is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
21/03/14 12:05:07 WARN NativeCodeLoader [main]: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/14 12:05:08 WARN DependencyUtils [main]: Skip remote jar wasbs:///phoenix-server.jar.
21/03/14 12:05:09 INFO RequestHedgingRMFailoverProxyProvider [main]: Created wrapped proxy for [rm1, rm2]
21/03/14 12:05:09 INFO RequestHedgingRMFailoverProxyProvider [main]: Looking for the active RM in [rm1, rm2]...
21/03/14 12:05:09 INFO RequestHedgingRMFailoverProxyProvider [main]: Found active RM [rm2]
21/03/14 12:05:09 INFO Client [main]: Requesting a new application from cluster with 2 NodeManagers
21/03/14 12:05:09 INFO Configuration [main]: found resource resource-types.xml at file:/etc/hadoop/4.1.2.5/0/resource-types.xml
21/03/14 12:05:09 INFO Client [main]: Verifying our application has not requested more than the maximum memory capability of the cluster (51200 MB per container)
21/03/14 12:05:09 INFO Client [main]: Will allocate AM container, with 1408 MB memory including 384 MB overhead
21/03/14 12:05:09 INFO Client [main]: Setting up container launch context for our AM
21/03/14 12:05:10 INFO Client [main]: Setting up the launch environment for our AM container
21/03/14 12:05:10 INFO Client [main]: Preparing resources for our AM container
21/03/14 12:05:10 INFO Client [main]: Falling back to uploading libraries in this host
21/03/14 12:05:10 INFO Client [main]: Uploading resource file:/tmp/spark-c923cab1-6cf5-4fa8-9db3-73d156052819/__hive_libs__5946923461629036475.zip -> wasbs://container-spark-2021-01-12t10-28-51-042z@container.blob.core.windows.net/user/livy/.sparkStaging/application_1615371594106_0446/__hive_libs__5946923461629036475.zip
21/03/14 12:05:12 INFO Client [main]: Source and destination file systems are the same. Not copying wasbs:/phoenix-server.jar
21/03/14 12:05:12 WARN AzureFileSystemThreadPoolExecutor [main]: Disabling threads for Delete operation as thread count 0 is <= 1
21/03/14 12:05:12 INFO AzureFileSystemThreadPoolExecutor [main]: Time taken for Delete operation is: 11 ms with threads: 0
21/03/14 12:05:12 INFO Client [main]: Deleted staging directory wasbs://container-2021-01-12t10-28-51-042z@container.blob.core.windows.net/user/livy/.sparkStaging/application_1615371594106_0446
Exception in thread "main" java.io.FileNotFoundException: wasbs://container-2021-01-12t10-28-51-042z@container.blob.core.windows.net/phoenix-server.jar: No such file or directory.
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatusInternal(NativeAzureFileSystem.java:2716)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2620)
    at org.apache.spark.deploy.yarn.ClientDistributedCacheManager$$anonfun$1.apply(ClientDistributedCacheManager.scala:71)
    at org.apache.spark.deploy.yarn.ClientDistributedCacheManager$$anonfun$1.apply(ClientDistributedCacheManager.scala:71)
    at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
    at scala.collection.AbstractMap.getOrElse(Map.scala:59)
    at org.apache.spark.deploy.yarn.ClientDistributedCacheManager.addResource(ClientDistributedCacheManager.scala:71)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:479)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$16$$anonfun$apply$6.apply(Client.scala:651)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$16$$anonfun$apply$6.apply(Client.scala:650)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$16.apply(Client.scala:650)
    at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$16.apply(Client.scala:649)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:649)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:917)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:179)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1239)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1634)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:858)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:942)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/03/14 12:05:12 INFO ShutdownHookManager [shutdown-hook-0]: Shutdown hook called
21/03/14 12:05:12 INFO ShutdownHookManager [shutdown-hook-0]: Deleting directory /tmp/spark-c923cab1-6cf5-4fa8-9db3-73d156052819
21/03/14 12:05:12 INFO ShutdownHookManager [shutdown-hook-0]: Deleting directory /tmp/spark-424d5ba4-718c-470a-b110-9020578aef12

Could you please help me rewrite this spark-submit command into Livy REST JSON?

Thank you in advance.


1 Answer


Maybe you should check your JVM configuration first.

The log also shows `ls: cannot access '/usr/hdp/current/hadoop/lib': No such file or directory`, so check that this path actually exists on the host.
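
For example, a quick check on the node where the Spark driver is launched (these paths are only illustrative; the exact HDP symlink layout varies by version):

ls -ld /usr/hdp/current/hadoop/lib         # the directory the warning complains about
ls -ld /usr/hdp/current/hadoop-client/lib  # on many HDP installs the client libs live here instead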

Also check the Spark User Guide. To make Spark runtime jars accessible from the YARN side, you can specify spark.yarn.archive or spark.yarn.jars. For details please refer to Spark Properties. If neither spark.yarn.archive nor spark.yarn.jars is specified, Spark will create a zip file with all jars under $SPARK_HOME/jars and upload it to the distributed cache.
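
As for mapping the spark-submit flags onto the Livy batch API, a minimal sketch of the request might look like the following. This is only an illustration: <livy-host>, <container>, <account> and the upload paths are placeholders, and it assumes app.py, phoenix-server.jar and the YAML files were already uploaded to WASB. Livy has no dedicated fields for --packages and --repositories, so they go through the Spark properties spark.jars.packages and spark.jars.repositories inside "conf":

# Sketch only: <livy-host>, <container>, <account> and the paths are placeholders, not values taken from the question.
curl -s -X POST -H "Content-Type: application/json" http://<livy-host>:8998/batches -d '{
    "file": "wasbs://<container>@<account>.blob.core.windows.net/spark_apps/a/app.py",
    "jars": ["wasbs://<container>@<account>.blob.core.windows.net/phoenix-server.jar"],
    "files": [
        "wasbs://<container>@<account>.blob.core.windows.net/x/y.yml",
        "wasbs://<container>@<account>.blob.core.windows.net/x/y1.yml"
    ],
    "conf": {
        "spark.jars.packages": "com.hortonworks.shc:shc-core:1.1.0.3.1.6.5-3",
        "spark.jars.repositories": "http://repo.hortonworks.com/content/groups/public/"
    },
    "args": ["--arg_name", "value", "--arg_name2", "value"]
}'

Note that every file is its own entry in the "files" array rather than one comma-separated string, and that the wasbs:// URIs are fully qualified with container and storage account, which is what the FileNotFoundException about wasbs://.../phoenix-server.jar points to.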

hammer
  • I am using Pyspark and the access issue is resolved. but still the error present like : 21/03/14 16:11:53 INFO Client [main]: Uploading resource file:/tmp/spark-4a266828-f414-45c2-ae54-3e5144eadfcf/__hive_libs__3077431337562461774.zip -> wasbs://container-spark-2021-01-12t10-28-51-042z@container.blob.core.windows.net/user/livy/.sparkStaging/application_1615371594106_0473/__hive_libs__3077431337562461774.zip 21/03/14 16:11:54 INFO Client [main]: Source and destination file systems are the same. Not copying wasbs://container-spark-202-042z@container.blob.core.windows.net/y.yml – Harish Kowrada Mar 14 '21 at 16:19
  • Could you write a complete path like `hdfs://****`? That should resolve this question. – hammer Mar 15 '21 at 00:51
  • I put everything in WASB storage: "wasbs:///phoenix-server.jar" – Harish Kowrada Mar 15 '21 at 09:52
  • 21/03/15 08:54:58 INFO Client [main]: Uploading resource file:/tmp/spark-ae865c35-675b-4307-92af-c0130db89823/__hive_libs__1736127811292965288.zip -> wasbs://container-spark-2021-01-12t10-28-51-042z@container.blob.core.windows.net/user/livy/.sparkStaging/application_1615371594106_0560/__hive_libs__1736127811292965288.zip 21/03/15 08:55:00 INFO Client [main]: Source and destination file systems are the same. Not copying wasbs:/phoenix-server.jar – Harish Kowrada Mar 15 '21 at 09:54