
I want to submit a PySpark application to Livy through the REST API to invoke the Hive Warehouse Connector. Based on this article in the Cloudera community

https://community.cloudera.com/t5/Community-Articles/How-to-Submit-Spark-Application-through-Livy-REST-API/ta-p/247502

I created a test1.json as follows

    {
      "jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
      "pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
      "file": ["test1.py"]
    }

and call InvokeHTTP, but I get this error: "Cannot deserialize instance of java.lang.String out of START_ARRAY token\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 224] (through reference chain: org.apache.livy.server.batch.CreateBatchRequest[\"file\"

I think the 'file' field with test1.py is wrong. Can anyone tell me how to submit this? The same application runs fine with a plain spark-submit test1.py.
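For reference, the InvokeHTTP call boils down to a POST of this JSON to Livy's /batches endpoint. Here is a minimal sketch of that call with Python requests, assuming a placeholder Livy host and Livy's default port 8998:

    import json
    import requests

    # Placeholder Livy endpoint; 8998 is Livy's default REST port
    LIVY_URL = "http://livy-host:8998/batches"

    with open("test1.json") as f:
        payload = json.load(f)          # the batch definition shown above

    # Livy's batch API expects a JSON body with Content-Type application/json
    resp = requests.post(LIVY_URL,
                         data=json.dumps(payload),
                         headers={"Content-Type": "application/json"})
    print(resp.status_code, resp.text)  # the deserialization error is returned here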

All suggestions are welcome

Tinniam V. Ganesh
  • At first glance, this post seems a good start for those who like Python _(I'm not one of them)_: https://www.statworx.com/ch/blog/access-your-spark-cluster-from-everywhere-with-apache-livy/ >> it also deals with _curl_ for submitting a "batch" – Samson Scharfrichter Jan 11 '20 at 15:13
  • @SamsonScharfrichter corrected link – Tinniam V. Ganesh Jan 12 '20 at 03:43
  • @SamsonScharfrichter I have tried with curl also, and I get the same error. I need to know exactly which fields take which parameters. – Tinniam V. Ganesh Jan 12 '20 at 05:16
  • And the Apache Livy official documentation for the REST API (link above, found from Google) is rather explicit about which fields are "maps" (structs of nested key/value fields), which are "lists" (arrays of Strings), and which are plain Strings. – Samson Scharfrichter Jan 12 '20 at 10:45

1 Answer


For basic Hive access, the following JSON works:

   {
      "file":"hdfs-path/test1.py"
   }

For Hive LLAP access, use the JSON below:

    {
      "jars": ["<path-to-jar>/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
      "pyFiles": ["<path-to-zip>/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
      "file": "<path-to-file>/test3.py"
    }

Interestingly, when I put the zip in the "archives" field it gives an error; it works in the "pyFiles" field, though, as shown above.
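Once the POST to /batches is accepted, Livy returns a batch id that can be used to follow the job. Here is a minimal sketch of checking the batch state and driver log through Livy's REST API, assuming a placeholder host and an id taken from the submission response:

    import requests

    LIVY_URL = "http://livy-host:8998"   # placeholder host; 8998 is Livy's default port
    batch_id = 0                         # the "id" field returned by the POST to /batches

    # Current state of the batch, e.g. starting, running, success or dead
    state = requests.get(f"{LIVY_URL}/batches/{batch_id}/state").json()
    print(state["state"])

    # Driver log lines; handy when the jar/zip paths in the payload are wrong
    log = requests.get(f"{LIVY_URL}/batches/{batch_id}/log").json()
    print("\n".join(log["log"]))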

Tinniam V. Ganesh