Questions tagged [azure-databricks]

For questions about using the Databricks Lakehouse Platform on Microsoft Azure.

Overview

Azure Databricks is the Azure-based implementation of Databricks, which is a high-level platform for working with Apache Spark and includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and integrates natively with other Azure services, including Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, and Synapse Analytics.

4095 questions
1
vote
0 answers

Messages not loading into SilverTable from Topic

I am trying to load messages from a topic into silverTable using writeStream, but the messages are not loading. How do I read the messages into silverTable? var df = spark .readStream .format("kafka") …
1
vote
1 answer

Azure Databricks stream fails with StorageException: Could not verify copy source

We have a Databricks job that has suddenly started to consistently fail. Sometimes it runs for an hour, other times it fails after a few minutes. The inner exception is ERROR MicroBatchExecution: Query [id = xyz, runId = abc] terminated with…
1
vote
1 answer

Databricks, comparing two tables to see which records are missing

I'm looking at two tables that are supposed to be identical. I run this query to see which records from table A are missing in table B (we have a three-column key): select * from tableA A left join TableB B on A.joinField1 = B.joinField1 and…
Riccardo Lamera
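
A minimal sketch of the "left join, then keep rows with no match" pattern this question describes. It uses Python's built-in SQLite in place of Spark SQL so it can run anywhere, and the table and column names (`table_a`, `table_b`, `key1`, `key2`, `key3`) are hypothetical stand-ins, since the original query is truncated.

```python
import sqlite3


def missing_in_b(conn):
    """Return the keys of table_a rows that have no matching row in table_b."""
    query = """
        SELECT a.key1, a.key2, a.key3
        FROM table_a AS a
        LEFT JOIN table_b AS b
          ON  a.key1 = b.key1
          AND a.key2 = b.key2
          AND a.key3 = b.key3
        WHERE b.key1 IS NULL          -- no match found in table_b
        ORDER BY a.key1, a.key2, a.key3
    """
    return conn.execute(query).fetchall()


# Hypothetical sample data: row (2, 'y', 'q') exists only in table_a.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (key1 INTEGER, key2 TEXT, key3 TEXT);
    CREATE TABLE table_b (key1 INTEGER, key2 TEXT, key3 TEXT);
    INSERT INTO table_a VALUES (1, 'x', 'p'), (2, 'y', 'q'), (3, 'z', 'r');
    INSERT INTO table_b VALUES (1, 'x', 'p'), (3, 'z', 'r');
""")
missing = missing_in_b(conn)
print(missing)
```

The `IS NULL` filter assumes the key columns in table B never hold NULL themselves, and rows whose key contains NULL never match under `=`. Spark SQL also offers `LEFT ANTI JOIN`, which may express the same intent more directly.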
1
vote
2 answers

Databricks: how to find the source of a mount point?

The following command returns a list of mount points in Databricks: dbutils.fs.ls("/mnt/") Let's assume the "/mnt/point_name/" point exists. How can I check which source the point is connected to? E.g. How to find a relation between Azure Storage Account…
skolukmar
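
One possible approach, sketched under assumptions: in a Databricks notebook, `dbutils.fs.mounts()` returns entries that expose `mountPoint` and `source` attributes, so the lookup reduces to a simple search. The helper name `find_mount_source` and the sample storage URL are hypothetical, and a namedtuple stands in for the real mount entries so the snippet runs outside Databricks.

```python
from collections import namedtuple


def find_mount_source(mounts, mount_point):
    """Return the source URL for mount_point, or None if it is not mounted.

    `mounts` is any iterable of objects with `mountPoint` and `source`
    attributes (for example, the result of dbutils.fs.mounts()).
    """
    wanted = mount_point.rstrip("/")  # tolerate a trailing slash
    for m in mounts:
        if m.mountPoint.rstrip("/") == wanted:
            return m.source
    return None


# Stand-in for the entries returned by dbutils.fs.mounts(); values are made up.
MountInfo = namedtuple("MountInfo", ["mountPoint", "source"])
sample_mounts = [
    MountInfo("/mnt/point_name", "wasbs://container@account.blob.core.windows.net"),
]

print(find_mount_source(sample_mounts, "/mnt/point_name/"))
```

In a notebook the call would look like `find_mount_source(dbutils.fs.mounts(), "/mnt/point_name/")`.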
1
vote
0 answers

Cannot connect Visual Studio with Azure Databricks cluster: "Missing Python executable 'python3'"

I am trying to use our Azure Databricks clusters in Visual Studio running on a virtual machine. I am following the steps described here. 1. Set up cluster: I set up a cluster with runtime 9.1 and specified the advanced options as instructed. The port I set…
1
vote
0 answers

Duplicate dataset with millions of rows using pyspark

I am trying to duplicate a dataset which has 30 rows to around 600 million rows. I am currently using a for loop to iterate and perform a union, but it is taking a lot of time. Is there a better way to create duplicate rows in PySpark to this huge…
Samyak Jain
1
vote
1 answer

How to create Azure Databricks Notebook via Terraform?

I am completely new to Terraform, and I found that by using this in the Terraform main.tf I can create Azure Databricks infrastructure: resource "azurerm_databricks_workspace" "bdcc" { depends_on = [ azurerm_resource_group.bdcc ] name =…
1
vote
1 answer

Unable to build Spark application with multiple main classes for Databricks job

I have a Spark application that contains multiple Spark jobs to be run on Azure Databricks. I want to build and package the application into a fat jar. The application compiles successfully. While I am trying to package (command: sbt…
1
vote
1 answer

How can I make my Spark Accumulator statistics reliable in Azure Databricks?

I am using a Spark accumulator to collect statistics for each pipeline. In a typical pipeline I would read a DataFrame: df = spark.read.format("csv").option("header",'true').load('/mnt/prepared/orders') df.count() ==> 7 rows Then I would actually…
1
vote
1 answer

Databricks PySpark error while getting Excel data from my Azure Blob Storage

I want to read an Excel file with multiple sheets from my Azure Blob Storage (Gen2) using Databricks PySpark. I have already installed the Maven package. My code is below: df = spark.read.format('com.crealytics.spark.excel') \ .option("header", "true")…
MFatn
1
vote
2 answers

Use a CSV file of dates as triggers for ADF pipeline

I have an ADF pipeline that I need to run based on a CSV file containing sporadic dates. Is there any way to implement this? My only thought is to trigger a pipeline daily that has a Databricks script that checks if the current date matches a date in…
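
The date check described in this question could be sketched roughly as follows. The sketch assumes the CSV holds one ISO-formatted date (`YYYY-MM-DD`) in the first column of each row, possibly with a header. The function name `should_run`, the sample data, and the `run_date` header are hypothetical, and how the result is handed back to ADF is left out.

```python
import csv
import io
from datetime import date


def should_run(csv_text, today):
    """Return True if `today` appears in the first column of the CSV text."""
    run_dates = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        try:
            run_dates.add(date.fromisoformat(row[0].strip()))
        except ValueError:
            continue  # skip the header row or any malformed value
    return today in run_dates


# Hypothetical file contents; in practice this would be read from storage.
sample_csv = "run_date\n2022-01-03\n2022-01-17\n2022-02-09\n"

print(should_run(sample_csv, date(2022, 1, 17)))
print(should_run(sample_csv, date(2022, 1, 18)))
```

Passing `today` in as an argument, rather than calling `date.today()` inside the function, keeps the check easy to test and avoids surprises from the cluster's time zone.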
1
vote
1 answer

Python Databricks cannot visualise dtreeviz decision tree

I need to visualize a decision tree in dtreeviz in Databricks. The code seems to be working fine. However, instead of showing the decision tree it throws the following: Out[23]: Running the following…
Dario Federici
1
vote
1 answer

I am unable to mount ADLS Gen2. Please assist

Code to mount ADLS Gen2: Error while mounting ADLS Gen2:
1
vote
2 answers

SparkR::dapply library not recognized

Introduction: I've installed some packages on a Databricks cluster using install.packages on DR 9.1 LTS, and I want to run a UDF using R & Spark (SparkR or sparklyr). My use case is to score some data in batch using Spark (either SparkR or…
yeamusic21
1
vote
2 answers

FileUtils write method does not work on Azure Databricks

I am having trouble writing a file to my Databricks cluster's driver (as a temp file). I have a Scala notebook on my company's Azure Databricks which contains these lines of code: val xml: String = Controller.requestTo(url) val bytes: Array[Byte] =…
Karzyfox