
Is it possible to submit/configure a Spark Python script (.py file) as a Databricks job?

I do my development in the PyCharm IDE, then commit and push the code to our GitLab repository. My requirement is to create a new job on the Databricks cluster whenever a Python script is merged into the GitLab master branch.

I would like suggestions on whether it's possible to create a Databricks job for a Python script from a .gitlab-ci.yml pipeline.

In the Databricks Jobs UI, I can see that a Spark JAR or a notebook can be used, but I am wondering whether we can provide a Python file.

Thanks,

Yuva


1 Answer


This functionality is not currently available in the Databricks UI, but it is accessible via the REST API. You'll want to use the SparkPythonTask data structure.

You'll find this example in the official documentation:

curl -n -H "Content-Type: application/json" -X POST -d @- https://<databricks-instance>/api/2.0/jobs/create <<JSON
{
  "name": "SparkPi Python job",
  "new_cluster": {
    "spark_version": "5.2.x-scala2.11",
    "node_type_id": "i3.xlarge",
    "num_workers": 2
  },
  "spark_python_task": {
    "python_file": "dbfs:/docs/pi.py",
    "parameters": [
      "10"
    ]
  }
}
JSON
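Note that the path in python_file must be readable by the cluster, which usually means a DBFS location. As a rough sketch (assuming the Databricks CLI is installed and configured against your workspace), you could upload the script first:

databricks fs cp pi.py dbfs:/docs/pi.py --overwrite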

If you need help getting started with the REST API, see the Databricks REST API documentation.
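To tie this back to the GitLab side of the question: below is a minimal, untested sketch of a .gitlab-ci.yml job that uploads the script and registers the Databricks job on pushes to master. DATABRICKS_HOST and DATABRICKS_TOKEN are assumed to be defined as CI/CD variables, and job.json is a hypothetical file in the repository holding the payload shown above.

create_databricks_job:
  stage: deploy
  image: python:3.7
  only:
    - master
  script:
    # Install the Databricks CLI; it reads DATABRICKS_HOST and
    # DATABRICKS_TOKEN from the environment.
    - pip install databricks-cli
    # Upload the Python script to DBFS so the cluster can read it.
    - databricks fs cp pi.py dbfs:/docs/pi.py --overwrite
    # Register the job via the REST API using the payload from job.json.
    - curl -s -X POST -H "Authorization: Bearer $DATABRICKS_TOKEN" -H "Content-Type: application/json" -d @job.json "$DATABRICKS_HOST/api/2.0/jobs/create"

Keep in mind this creates a new job on every merge; if you want to update an existing job instead, the /api/2.0/jobs/reset endpoint takes a job_id and a new_settings object.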

Raphael K