
Is there a way to list all notebooks and jobs in a Databricks workspace and load them into a managed table in DBFS?

I found function code at the link below:

https://kb.databricks.com/python/list-all-workspace-objects.html

However, this does not return the list of jobs. I also need to store the result set in a DataFrame, so that I can save the DataFrame to a table.

Shruti
  • I think you should go look here to extract job list: https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsList – Arthur Clerc-Gherardi Nov 09 '21 at 11:27
  • Hi Arthur, thanks for replying. I am trying to use the data in a Power BI report, so I thought of storing it in a managed table that Power BI can connect to. Mainly I need the result set in a DataFrame, so that I can save the DataFrame to a table. – Shruti Nov 09 '21 at 13:51
  • You can do what you are describing in a notebook. First retrieve the jobs through the API, then do the same for workspace objects, then create DataFrames from these calls. After that you just have to write two managed tables. – Arthur Clerc-Gherardi Nov 09 '21 at 14:45
  • Where are you stuck? – Arthur Clerc-Gherardi Nov 09 '21 at 14:45
  • Hi Arthur, I am unable to create a DataFrame from the function call. I am new to PySpark and Databricks; the code is taken from https://kb.databricks.com/python/list-all-workspace-objects.html – Shruti Nov 09 '21 at 16:43
  • Any help is appreciated. The code is taken from https://kb.databricks.com/python/list-all-workspace-objects.html; the function needs to return all results in appended form, and when calling the function the result should get stored in a DataFrame. – Shruti Nov 10 '21 at 12:59
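Putting the comment thread together, below is a rough sketch of the workspace-listing half: recursively walk the Workspace API and collect plain row dicts that `spark.createDataFrame` can consume. The host URL, token, and table name are placeholders, not values from the question.

```python
import requests


def list_workspace_objects(host, token, path="/"):
    """Recursively collect workspace entries (notebooks etc.) as row dicts.

    `host` and `token` are placeholders for your workspace URL and a
    personal access token.
    """
    resp = requests.get(
        f"{host}/api/2.0/workspace/list",
        headers={"Authorization": f"Bearer {token}"},
        params={"path": path},
    ).json()
    rows = []
    for obj in resp.get("objects", []):
        if obj.get("object_type") == "DIRECTORY":
            # Descend into subfolders and append their contents.
            rows.extend(list_workspace_objects(host, token, obj["path"]))
        else:
            rows.append({
                "object_type": obj.get("object_type"),
                "path": obj.get("path"),
                "language": obj.get("language"),
                "object_id": obj.get("object_id"),
            })
    return rows


# On a Databricks cluster you could then write the managed table
# (the table name here is an assumption):
# df = spark.createDataFrame(list_workspace_objects(host, token))
# df.write.mode("overwrite").saveAsTable("workspace_inventory_notebooks")
```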

2 Answers


You can use the Jobs REST API (link). The Python code below retrieves all job objects in the workspace; you can then parse whatever information you need from the response. Note: tested code.

import requests

# Attach a bearer token (a Databricks personal access token) to each request.
class BearerAuth(requests.auth.AuthBase):
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        r.headers["authorization"] = "Bearer " + self.token
        return r

# Replace 'databricksinstance' with your workspace URL and 'token' with your PAT.
response = requests.get('https://databricksinstance/api/2.0/jobs/list', auth=BearerAuth('token')).json()
print(response)
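To get that response into a table, one option is to flatten it into plain row dicts first. This is a sketch, not part of the answer above: the field names follow the Jobs API 2.0 list response, and the Spark usage and table name at the end are assumptions.

```python
def jobs_to_rows(jobs_response):
    """Flatten the /api/2.0/jobs/list response into plain row dicts."""
    return [
        {
            "job_id": job.get("job_id"),
            # The job name lives under the nested "settings" object.
            "job_name": job.get("settings", {}).get("name"),
            "created_time": job.get("created_time"),
        }
        for job in jobs_response.get("jobs", [])
    ]


# On Databricks (the table name is an assumption):
# df = spark.createDataFrame(jobs_to_rows(response))
# df.write.mode("overwrite").saveAsTable("workspace_inventory_jobs")
```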


  • Hi Karthikeyan, yes, I am able to run this code and get the details. How do I store the result in a DataFrame? I am new to PySpark and Databricks – Shruti Nov 09 '21 at 16:45
  • We can get the required details by parsing the response, writing it as a list in Python, and converting the Python list into a DataFrame. Ref: https://stackoverflow.com/questions/48448473/pyspark-convert-a-standard-list-to-data-frame – Karthikeyan Rasipalay Durairaj Nov 09 '21 at 16:51
  • schema = StructType([StructField("object_type", StringType(), True), StructField("path", StringType(), True), StructField("language", StringType(), True), StructField("object_id", StringType(), True)]) df = spark.createDataFrame(data=data1, schema=schema) Any clue? – Shruti Nov 10 '21 at 10:43

You could add the snippet below to automatically parse the JSON response into a pandas DataFrame:

import pandas as pd

pd.json_normalize(response["jobs"])