Questions tagged [azure-databricks]

For questions about using the Databricks Lakehouse Platform on Microsoft Azure.

Overview

Azure Databricks is the Azure-based implementation of Databricks, which is a high-level platform for working with Apache Spark and includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and natively integrates with other Azure services such as Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, and Synapse Analytics.

4095 questions
8 votes, 3 answers

Remove Files from Directory after uploading in Databricks using dbutils

A very clever person from Stack Overflow helped me copy files to a directory from Databricks here: copyfiles. I am using the same principle to remove the files once they have been copied, as shown in the link: for i in range(0, len(files)): …
Carltonp • 1,166 • 5 • 19 • 39
7 votes, 2 answers

Unity Catalog not enabled on cluster in Databricks

We are trying out Unity Catalog in Azure Databricks. We connected a pre-existing workspace to the new metastore. I created a new catalog. When I run a notebook and try to write to table "myfirstcatalog.bronze.mytable" I get the…
7 votes, 3 answers

How to create a Databricks job using a Python file outside of dbfs?

I am fairly new to Databricks, so forgive me for the lack of knowledge here. I am using the Databricks resource in Azure. I mainly use the UI right now, but I know some features are only available using databricks-cli, which I have set up but not…
7 votes, 3 answers

Execute multiple notebooks in parallel in pyspark databricks

The question is simple: master_dim.py calls dim_1.py and dim_2.py to execute in parallel. Is this possible in Databricks PySpark? The image below explains what I am trying to do. It errors for some reason; am I missing something here?
7 votes, 2 answers

How to get output parameter from Executed Pipeline in ADF?

I have a Databricks pipeline that produces an output, but at the moment I need to run the Databricks pipeline from the Executed Pipelines. When I tried to run it, my Databricks output didn't show up in the Executed Pipelines. Is it the case that this pipeline can't show the…
MADFROST • 1,043 • 2 • 11 • 29
7 votes, 4 answers

Databricks Cluster terminated. Reason: Cloud Provider Launch Failure

I'm using Azure Databricks with a custom configuration that uses VNet injection, and I am unable to start a cluster in my workspace. The error message is not documented anywhere in the Microsoft or Databricks documentation, meaning I am unable…
Abhishek Sharma • 109 • 1 • 8
7 votes, 2 answers

AttributeError: 'DataFrame' object has no attribute '_data'

Azure Databricks throws an execution error while parallelizing a pandas DataFrame. The code is able to create the RDD but breaks when performing .collect(). Setup: import pandas as pd # initialize list of lists data = [['tom', 10], ['nick', 15],…
7 votes, 3 answers

How to throw Exception in Databricks?

I want my Databricks notebook to fail if a certain condition is satisfied. Right now I am using dbutils.notebook.exit(), but it does not cause the notebook to fail, and I get an email saying the notebook run was successful. How can I make my notebook fail?
Shubham Sahay • 113 • 1 • 2 • 8
7 votes, 2 answers

Databricks notebook crashes on memory job

I am running a few operations to aggregate a large quantity of data (about 600 GB) on Azure Databricks. I noticed recently that the notebook crashes and Databricks returns the error below. The same code worked before with a smaller 6-node cluster.…
KLA • 31 • 1 • 8
7 votes, 0 answers

Databricks Widget Panel Default Settings

I have a number of notebooks which have widgets, and currently the default setting for 'On Widget Change' is 'Run Accessed Commands'. Is there any way of globally setting this to 'Do Nothing'? I can do this on an individual notebook, but if I…
RG0107 • 111 • 10
7 votes, 2 answers

How to execute a stored procedure in Azure Databricks PySpark?

I am able to execute a simple SQL statement using PySpark in Azure Databricks but I want to execute a stored procedure instead. Below is the PySpark code I tried. #initialize pyspark import…
Ajay • 247 • 1 • 5 • 15
7 votes, 3 answers

Overwrite Databricks Dependency

In our project we're using com.typesafe:config in version 1.3.4. According to the latest release notes, this dependency is already provided by Databricks on the cluster, but in a very old version (1.2.1). How can I overwrite the provided dependency…
pgruetter • 1,184 • 1 • 11 • 29
7 votes, 2 answers

Azure Databricks: How to add Spark configuration in Databricks cluster

I am using a Spark Databricks cluster and want to add a customized Spark configuration. There is Databricks documentation on this, but I cannot work out how and what changes I should make. Can someone please share an example to configure the…
Stark • 604 • 3 • 11 • 30
7 votes, 3 answers

How to get the last modification time of each file present in Azure Data Lake Storage using Python in a Databricks workspace?

I am trying to get the last modification time of each file present in Azure Data Lake. files = dbutils.fs.ls('/mnt/blob') for fi in files: print(fi) Output: FileInfo(path='dbfs:/mnt/blob/rule_sheet_recon.xlsx', name='rule_sheet_recon.xlsx',…
7 votes, 1 answer

What is the cluster manager used in Databricks? How do I change the number of executors in Databricks clusters?

What is the cluster manager used in Databricks? How do I change the number of executors in Databricks clusters?
prady • 563 • 4 • 9 • 24