Questions tagged [azure-databricks]

For questions about using the Databricks Lakehouse Platform on Microsoft Azure.

Overview

Azure Databricks is the Azure-based implementation of Databricks, which is a high-level platform for working with Apache Spark and includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and natively integrates with other Azure services such as Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, and Synapse Analytics.

4095 questions
1 vote, 1 answer

Databricks - overwriteSchema

Multiple times I've had an issue while updating a Delta table in Databricks where overwriting the schema fails the first time, but is then successful the second time. The solution to my problem was to simply run it again, and I'm unable to reproduce…
TonyRyan
1 vote, 1 answer

How do I import custom libraries in Databricks notebooks?

I uploaded a jar library on my cluster in Databricks following this tutorial, however I have been unable to import the library or use the methods of the library from the Databricks notebook. I have been unable to find forums or documentation that…
D.I.
1 vote, 1 answer

Unable to call a notebook when using Scala code in Databricks

I am in a situation where I can successfully run the snippet below in Azure Databricks from a separate CMD: %run ./HSCModule. But I run into issues when including that piece of code with other Scala code that imports the packages below…
Abhinav26
1 vote, 1 answer

How do you use Target's data-validator in Azure Databricks?

I'm trying to run the data validation framework called data-validator, created by Target, to validate data from a Parquet file in Azure Databricks. I have created a Spark job that will use the data-validator fat jar file. If I give a parameter --help,…
1 vote, 1 answer

Latency in a Spark streaming job on Databricks that reads from an Azure IoT Hub

I have been using a Spark streaming job written in Python on Databricks to load data from an Azure IoT Hub. However, I noticed that when we receive a large number of frames, the job takes a long time, so we have latency, knowing that when we look at the…
1 vote, 1 answer

How to copy a .py file stored in a DBFS location to Databricks workspace folders

How to copy a .py file stored in a DBFS location to Databricks workspace folders. Once it is copied to the Databricks workspace folders, I can run it as a notebook using the %run command.
1 vote, 1 answer

Import Databricks notebook (dynamic content) using Workspace API import method

I want to import a Databricks notebook using the Workspace API import method. The content of the notebook should be dynamic. I am trying the code below, but it gives the error: malfunctioned request request contains invalid json body. I have tried converting…
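
The question's own code is not shown, so the following is only a minimal sketch of one way to build a request body for the Workspace API import endpoint (`/api/2.0/workspace/import`) from dynamic content. It assumes the body takes the fields `path`, `format`, `language`, `content` (base64-encoded) and `overwrite`; the helper name `build_import_payload`, the workspace path, and the notebook text are hypothetical. Letting `json.dumps` serialize the body avoids the quoting and newline problems that hand-assembled JSON strings tend to have, which is one possible (not confirmed) source of an invalid-JSON-body error.

```python
import base64
import json


def build_import_payload(workspace_path: str, source: str, overwrite: bool = True) -> str:
    """Return a JSON request body for a workspace import call (hypothetical helper)."""
    # The notebook source goes into the "content" field as base64 text.
    encoded = base64.b64encode(source.encode("utf-8")).decode("ascii")
    body = {
        "path": workspace_path,
        "format": "SOURCE",
        "language": "PYTHON",
        "content": encoded,
        "overwrite": overwrite,
    }
    # json.dumps handles escaping, so the result is always valid JSON.
    return json.dumps(body)


# Hypothetical dynamic content: notebook text generated from a runtime value.
table_name = "sales_2024"
notebook_source = f'# Databricks notebook source\nprint("Loading {table_name}")\n'
payload = build_import_payload("/Users/someone@example.com/generated_nb", notebook_source)
```

The resulting `payload` string would then be sent as the body of an authenticated POST request; the sketch leaves that step out because the host and token are specific to each workspace.
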
1 vote, 0 answers

Spark DataFrame changes column values when writing to SQL Server

I'm facing a very specific problem. I'm working on a pyspark notebook on Databricks. I run the following command: my_df.select("insert_date").distinct().show() and get: +--------------------+ | …
Dario
1 vote, 1 answer

How to connect Azure Data Factory with SQL Endpoints instead of an interactive cluster?

Is it possible to connect Azure Data Factory with Azure Databricks SQL Endpoints (Delta tables and views) instead of an interactive cluster? I tried the Azure Delta Lake connector, but it has options for a cluster and not for Endpoints.
kaa
1 vote, 1 answer

Azure Databricks Spark streaming with Auto Loader

My source is Azure Data Factory, which copies files to containerA --> FolderA, FolderB, FolderC. I am using the syntax below for Auto Loader, which needs to read the files as they arrive in any one of the folders. I have done the mounting up to the storage account …
1 vote, 1 answer

Azure Databricks - Cannot export results from Databricks to blob

I want to export my data from Databricks to Azure blob. My Databricks commands select some PDFs from my blob, run Form Recognizer, and export the output results to my blob. Here is my code: %pip install azure.storage.blob %pip install…
Think987
1 vote, 1 answer

RMariaDB on Databricks

I'm trying to get R (either via a notebook or RStudio) to connect to MariaDB on Databricks Azure 10.1. However, whether I add RMariaDB in the Libraries tab of the cluster or via install.packages("RMariaDB") in RStudio I get a failure…
1 vote, 2 answers

Filtering on NULL values in a Spark DataFrame does not work on all columns?

While writing this question, I managed to find an explanation. But as it seems a tricky point, I will post it and answer it anyway. Feel free to add to it. I have what appears to me to be inconsistent behaviour in PySpark, but as I am quite new to…
1 vote, 1 answer

Keyboard shortcuts for Databricks

I wanted to know whether there is any keyboard shortcut to clear a specific cell's output in Databricks. As of now, I can see there is an option to hide the result.
Ash3060
1 vote, 1 answer

How to find the log4j version in an Azure Databricks notebook

Is there any command to run in a "python" notebook in the Azure Databricks environment? I have tried the below, as per the Databricks documentation, but it gives an error: import org.apache.logging.log4j.core.Version println(Version.getProductString)
Saswat Ray