Questions tagged [azure-databricks]

For questions about using the Databricks Lakehouse Platform on Microsoft Azure.

Overview

Azure Databricks is the Azure-based implementation of Databricks, which is a high-level platform for working with Apache Spark and includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and integrates natively with other Azure services such as Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, and Synapse Analytics.

4095 questions
10
votes
0 answers

Spark 2.4.0 - unable to parse ISO8601 string into TimestampType preserving ms

When trying to convert ISO8601 strings with time zone information into a TimestampType using cast(TimestampType), only strings using the time zone format +01:00 are accepted. If the time zone is defined in the ISO8601 legal way +0100 (without the…
Molotch
  • 365
  • 7
  • 20
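
One possible workaround for the question above, offered only as a sketch: rewrite a trailing +HHMM offset as +HH:MM before the string is cast. The helper name `normalize_offset` is hypothetical and is not taken from the question. It is plain Python, so it assumes the strings can be pre-processed (for example inside a UDF) before the cast.

```python
import re

# Matches a trailing offset written without a colon, e.g. "+0100" or "-0530".
_COMPACT_OFFSET = re.compile(r"([+-]\d{2})(\d{2})$")


def normalize_offset(ts: str) -> str:
    """Rewrite a trailing +HHMM offset as +HH:MM; leave other strings unchanged."""
    return _COMPACT_OFFSET.sub(r"\1:\2", ts)
```

Strings that already use the +01:00 form, end in Z, or carry no offset pass through unchanged, and the millisecond part is not touched.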
9
votes
4 answers

List databricks secret scope and find referred keyvault in azure databricks

How can we find the existing secret scopes in a Databricks workspace, and which Key Vault is referred to by a specific secret scope in Azure Databricks?
tikiabbas
  • 119
  • 2
  • 3
  • 11
9
votes
2 answers

Azure Databricks OOM error that causes the connection to the Python REPL to be closed

In the following sample code, in one cell of our Azure Databricks notebook, the code loads about 20 million records into a Python pandas dataframe from an Azure SQL db, does some dataframe column transformation by applying some functions (as shown in…
nam
  • 21,967
  • 37
  • 158
  • 332
9
votes
3 answers

Error: Invalid configuration value detected for fs.azure.account.key

I am using Azure Databricks to make a delta table in Azure Blob Storage using ADLS Gen2, but I am getting the error "Failure to initialize configurationInvalid configuration value detected for fs.azure.account.key" on last…
Nabia Salman
  • 552
  • 1
  • 8
  • 29
9
votes
2 answers

Delta table merge on multiple columns

I have a table whose primary key spans multiple columns, so I need to perform the merge logic on multiple columns: DeltaTable.forPath(spark, "path") .as("data") .merge( finalDf1.as("updates"), "data.column1 = updates.column1 AND…
Tony
  • 301
  • 3
  • 10
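
A minimal sketch of how a multi-column merge condition like the one in the excerpt could be built from a list of key columns. The helper `merge_condition` and the name `column2` are hypothetical. The aliases `data` and `updates` are taken from the excerpt. The resulting string is what would be passed as the condition argument to `merge`.

```python
def merge_condition(keys, target="data", source="updates"):
    """Join one equality test per key column with AND.

    `target` and `source` are the table aliases used in the merge.
    """
    if not keys:
        raise ValueError("at least one key column is required")
    return " AND ".join(f"{target}.{k} = {source}.{k}" for k in keys)


# "column2" is an illustrative name, not one from the question.
condition = merge_condition(["column1", "column2"])
```

Building the condition from a list keeps it in sync with the key definition when columns are added or removed.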
9
votes
2 answers

Why is Pandas UDF not being parallelized?

I have data from many IoT sensors. For each particular sensor, there are only about 100 rows in the dataframe: the data is not skewed. I'm training an individual machine learning model for each sensor. I'm using a pandas UDF successfully to train and…
marcus
  • 91
  • 1
  • 5
9
votes
3 answers

What is a good Databricks workflow

I'm using Azure Databricks for data processing, with notebooks and pipelines. I'm not satisfied with my current workflow: the notebook used in production can't be modified without breaking production. When I want to develop an update, I…
Be Chiller Too
  • 2,502
  • 2
  • 16
  • 42
9
votes
2 answers

When to use a UDF versus a function in PySpark?

I'm using Spark with Databricks and have the following code: def replaceBlanksWithNulls(column): return when(col(column) != "", col(column)).otherwise(None) Both of these next statements work: x = rawSmallDf.withColumn("z",…
8
votes
2 answers

Create Azure Key Vault backed secret scope in Databricks with AAD Token

My ultimate goal is to mount ADLS gen2 containers into my Databricks workspace as part of my Terraform-managed deployment under the auspices of an Azure Service Principal. This is a single deployment that creates all the Azure resources (networking,…
8
votes
0 answers

Connecting from Azure Databricks to Azure SQL using User Managed Identity

I am trying to read data on an Azure SQL instance from an Azure Databricks workspace, avoiding using username/password personal credentials for automated, regular data fetch & analysis. I thought using a managed identity would do the job, however it…
8
votes
1 answer

databricks cli: getting b'Bad request error

I am trying to use the Databricks CLI for the first time. Whenever I try something using the CLI, it gives me the message: "Error: b'Bad Request'". This is the same for any CLI-based command. I am able to authenticate (Tried with a wrong token and got the…
8
votes
3 answers

Python Version in Azure Databricks

I am trying to find out the Python version I am using in Databricks. To find out, I tried import sys; print(sys.version) and got 3.7.3 as the output. However, when I went to Cluster --> SparkUI --> Environment, I see that the cluster Python version…
learner
  • 833
  • 3
  • 13
  • 24
8
votes
0 answers

Write dataframe to blob using azure databricks

Is there a link or sample code showing how to write a dataframe to Azure Blob Storage using Python (without using the pyspark module)?
8
votes
2 answers

How to install a library on a databricks cluster using some command in the notebook?

I want to install a library on my Azure Databricks cluster, but I cannot use the UI method because my cluster changes every time, and during the transition I cannot add a library to it through the UI. Is there any Databricks utility command for doing…
Samyak Jain
  • 155
  • 1
  • 2
  • 8
8
votes
3 answers

mount error when trying to access the Azure DBFS file system in Azure Databricks

I'm able to establish a connection to my Databricks FileStore DBFS and access the filestore. Reading, writing, and transforming data with PySpark is possible, but when I try to use a local Python API such as pathlib or the os module I am unable to…
Umar.H
  • 22,559
  • 7
  • 39
  • 74
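
The excerpt above is truncated, so the following is only a sketch under one assumption: that the failure is path-related. Local Python APIs such as pathlib and os see DBFS through the `/dbfs` mount on the driver rather than through `dbfs:/` URIs. The helper name `to_local_path` and the file name are hypothetical, and the sketch assumes a cluster where the `/dbfs` mount is available.

```python
def to_local_path(path: str) -> str:
    """Map a DBFS path to the form that local file APIs expect under /dbfs."""
    if path.startswith("dbfs:/"):
        # Drop the URI scheme and re-root the rest under the local mount.
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    if path.startswith("/dbfs/"):
        # Already a local mount path.
        return path
    return "/dbfs/" + path.lstrip("/")


# "FileStore/data.csv" is an illustrative file name.
local = to_local_path("dbfs:/FileStore/data.csv")
```

The returned string can then be handed to `pathlib.Path` or `open` on the driver node.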