
Is there a way to list all notebooks and jobs in a Databricks workspace and load them into a managed table in DBFS?

I found function code at the link below:

https://kb.databricks.com/python/list-all-workspace-objects.html

However, this does not return the list of jobs. I also need to store the result set in a DataFrame, so that I can save the DataFrame to a table.

Shruti
  • I think you should go look here to extract job list: https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsList – Arthur Clerc-Gherardi Nov 09 '21 at 11:27
  • Hi Arthur, thanks for replying. I am trying to use the data in a Power BI report, so I thought of storing it in a managed table that Power BI can connect to. Mainly I need the result set in a DataFrame, so that I can save the DataFrame to a table. – Shruti Nov 09 '21 at 13:51
  • You can do what you are describing in a notebook. First retrieve the jobs through the API, then do the same for workspace objects, then create DataFrames from these calls. After that you just have to write two managed tables. – Arthur Clerc-Gherardi Nov 09 '21 at 14:45
  • Where are you stuck? – Arthur Clerc-Gherardi Nov 09 '21 at 14:45
  • Hi Arthur, I am unable to create a DataFrame from the function call. I am new to PySpark and Databricks; the code is taken from https://kb.databricks.com/python/list-all-workspace-objects.html – Shruti Nov 09 '21 at 16:43
  • Any help is appreciated. The code is taken from https://kb.databricks.com/python/list-all-workspace-objects.html; the function needs to return all results in appended form, and when calling the function the result should get stored in a DataFrame. – Shruti Nov 10 '21 at 12:59
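Putting the comment thread together, below is a rough sketch of the workspace-listing half: recursively walk the Workspace API and collect plain row dicts that `spark.createDataFrame` can consume. The host URL, token, and table name are placeholders, not values from the question.

```python
import requests


def list_workspace_objects(host, token, path="/"):
    """Recursively collect workspace entries (notebooks etc.) as row dicts.

    `host` and `token` are placeholders for your workspace URL and a
    personal access token.
    """
    resp = requests.get(
        f"{host}/api/2.0/workspace/list",
        headers={"Authorization": f"Bearer {token}"},
        params={"path": path},
    ).json()
    rows = []
    for obj in resp.get("objects", []):
        if obj.get("object_type") == "DIRECTORY":
            # Descend into subfolders and append their contents.
            rows.extend(list_workspace_objects(host, token, obj["path"]))
        else:
            rows.append({
                "object_type": obj.get("object_type"),
                "path": obj.get("path"),
                "language": obj.get("language"),
                "object_id": obj.get("object_id"),
            })
    return rows


# On a Databricks cluster you could then write the managed table
# (the table name here is an assumption):
# df = spark.createDataFrame(list_workspace_objects(host, token))
# df.write.mode("overwrite").saveAsTable("workspace_inventory_notebooks")
```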

2 Answers


You can use the Jobs REST API (link). The Python code below retrieves all job objects in the workspace; you can then parse whatever information you need from the response. Note: tested code.

import requests

# Attach a bearer token (a Databricks personal access token) to each request.
class BearerAuth(requests.auth.AuthBase):
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        r.headers["authorization"] = "Bearer " + self.token
        return r

# Replace 'databricksinstance' with your workspace URL and 'token' with your PAT.
response = requests.get('https://databricksinstance/api/2.0/jobs/list', auth=BearerAuth('token')).json()
print(response)
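To get that response into a table, one option is to flatten it into plain row dicts first. This is a sketch, not part of the answer above: the field names follow the Jobs API 2.0 list response, and the Spark usage and table name at the end are assumptions.

```python
def jobs_to_rows(jobs_response):
    """Flatten the /api/2.0/jobs/list response into plain row dicts."""
    return [
        {
            "job_id": job.get("job_id"),
            # The job name lives under the nested "settings" object.
            "job_name": job.get("settings", {}).get("name"),
            "created_time": job.get("created_time"),
        }
        for job in jobs_response.get("jobs", [])
    ]


# On Databricks (the table name is an assumption):
# df = spark.createDataFrame(jobs_to_rows(response))
# df.write.mode("overwrite").saveAsTable("workspace_inventory_jobs")
```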


  • Hi Karthikeyan, yes, I am able to run this code and get the details. How do I store the result in a DataFrame? I am new to PySpark and Databricks – Shruti Nov 09 '21 at 16:45
  • We can get the required details by parsing the response, writing it as a list in Python, and converting the Python list into a DataFrame. Ref: https://stackoverflow.com/questions/48448473/pyspark-convert-a-standard-list-to-data-frame – Karthikeyan Rasipalay Durairaj Nov 09 '21 at 16:51
  • schema = StructType([StructField("object_type", StringType(), True), StructField("path", StringType(), True), StructField("language", StringType(), True), StructField("object_id", StringType(), True)]) df = spark.createDataFrame(data=data1, schema=schema) Any clue? – Shruti Nov 10 '21 at 10:43

You could add the snippet below to automatically parse the JSON response into a pandas DataFrame:

import pandas as pd

pd.json_normalize(response["jobs"])