I am trying to install library spark-xml_2.12-0.15.0
using dbx
.
The documentation I found is to include it on the conf/deployment.yml
file like:
custom:
basic-cluster-props: &basic-cluster-props
spark_version: "10.4.x-cpu-ml-scala2.12"
basic-static-cluster: &basic-static-cluster
new_cluster:
<<: *basic-cluster-props
num_workers: 2
build:
commands:
- "mvn clean package" #
environments:
default:
workflows:
- name: "charming-aurora-sample-jvm"
libraries:
- jar: "{{ 'file://' + dbx.get_last_modified_file('target/scala-2.12', 'jar') }}" #
tasks:
- task_key: "main"
<<: *basic-static-cluster
deployment_config: #
no_package: true
spark_jar_task:
main_class_name: "org.some.main.ClassName"
You may see documentation page here: https://dbx.readthedocs.io/en/latest/guides/jvm/jvm_devops/?h=maven
I have installed the library on the cluster using Maven file (https://mvnrepository.com/artifact/com.databricks/spark-xml_2.13/0.15.0):
<!-- https://mvnrepository.com/artifact/com.databricks/spark-xml -->
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.13</artifactId>
<version>0.15.0</version>
</dependency>
I can use it on a notebook level but not from a job deployed using dbx.
Edit
I am using PySpark .
So, I included it as this at conf/deployment.yml
:
libraries:
- maven: "com.databricks:spark-xml_2.12:0.15.0"
On the file conf/deployment.yml
- name: "my-job"
libraries:
- maven:
- coordinates:"com.databricks:spark-xml_2.12:0.15.0"
tasks:
- task_key: "first_task"
<<: *basic-static-cluster
python_wheel_task:
package_name: "project_name"
entry_point: "jl" # take a look at the setup.py entry_points section for details on how to define an entrypoint
parameters: ["--conf-file", "file:fuse://conf/tasks/my_job_config.yml"]
Then I go with
dbx deploy my-job
This throwing the following error:
HTTPError: 400 Client Error: Bad Request for url: https://adb-xxxx.azuredatabricks.net/api/2.0/jobs/reset
Response from server:
{ 'error_code': 'MALFORMED_REQUEST',
'message': "Could not parse request object: Expected 'START_OBJECT' not "
"'START_ARRAY'\n"
' at [Source: (ByteArrayInputStream); line: 1, column: 91]\n'
' at [Source: java.io.ByteArrayInputStream@37fda06f; line: 1, '
'column: 91]'}