Questions tagged [azure-databricks]

For questions about the usage of Databricks Lakehouse Platform on Microsoft Azure

Overview

Azure Databricks is the Azure-based implementation of Databricks, which is a high-level platform for working with Apache Spark and includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and natively integrates with other Azure services, including Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, and Synapse Analytics.

4095 questions

0 answers

Messages not loading into SilverTable from Topic

I am trying to load messages from a topic into a silverTable in the writeStream, but the messages are not loading into silverTable. How can I read the messages into silverTable? var df = spark .readStream .format("kafka") …

asked Dec 17 '21 at 05:42

codes_nipe

1 answer

Azure Databricks stream fails with StorageException: Could not verify copy source

We have a Databricks job that has suddenly started to consistently fail. Sometimes it runs for an hour, other times it fails after a few minutes. The inner exception is ERROR MicroBatchExecution: Query [id = xyz, runId = abc] terminated with…

azure-blob-storage azure-databricks spark-structured-streaming

asked Dec 16 '21 at 13:45

Benny Hjertaas

1 answer

Databricks, comparing two tables to see which records are missing

I'm looking into two tables that are supposed to be equal. I run this query to see which records are missing in table B against table A (we have a 3-columned key): select * from tableA A left join TableB B on A.joinField1 = B.joinField1 and…

sql databricks azure-databricks databricks-sql

asked Dec 15 '21 at 15:25

Riccardo Lamera

2 answers

Databricks: how to find the source of a mount point?

The following command returns a list of mount points in Databricks: dbutils.fs.ls("/mnt/") Let's assume the "/mnt/point_name/" mount point exists. How can I check which source the point is connected to? E.g. how to find a relation between Azure Storage Account…

databricks azure-databricks

asked Dec 15 '21 at 13:16

skolukmar

0 answers

Cannot connect Visual Studio with Azure Databricks cluster: "Missing Python executable 'python3'"

I am trying to use our Azure Databricks clusters in Visual Studio running on a virtual machine. I am following the steps described here 1. Setup cluster I set up a cluster with runtime 9.1 and specify the advanced options as should. The port I set…

python visual-studio-2010 databricks azure-databricks databricks-connect

asked Dec 14 '21 at 19:23

user3387899

0 answers

Duplicate dataset with millions of rows using pyspark

I am trying to duplicate a dataset which has 30 rows to around 600 Million rows. I am currently using a for loop to iterate and perform union but it is taking a lot of time. Is there any better way to create duplicate rows in pyspark to this huge…

python dataframe apache-spark pyspark azure-databricks

asked Dec 14 '21 at 09:47

Samyak Jain

1 answer

How to create Azure Databricks Notebook via Terraform?

I am completely new to Terraform, and I found that by using this in the Terraform main.tf I can create Azure Databricks infrastructure: resource "azurerm_databricks_workspace" "bdcc" { depends_on = [ azurerm_resource_group.bdcc ] name =…

terraform azure-databricks terraform-provider-databricks

asked Dec 13 '21 at 10:58

Vlad Vlad

1 answer

Unable to build Spark application with multiple main classes for Databricks job

I have a Spark application that contains multiple Spark jobs to be run on Azure Databricks. I want to build and package the application into a fat jar. The application compiles successfully. While I am trying to package (command: sbt…

apache-spark sbt databricks azure-databricks spark-jobserver

asked Dec 11 '21 at 10:27

Dharita Chokshi

1 answer

How can I make my Spark Accumulator statistics reliable in Azure Databricks?

I am using a Spark accumulator to collect statistics for each pipeline. In a typical pipeline I would read a data_frame : df = spark.read.format(csv).option("header",'true').load('/mnt/prepared/orders') df.count() ==> 7 rows Then I would actually…

apache-spark pyspark databricks azure-databricks accumulator

asked Dec 08 '21 at 23:24

OrganicMustard

1 answer

Databricks pyspark error while getting Excel data from my Azure Blob Storage

I want to read an Excel file with multiple sheets from my Azure Gen2 Blob storage using Databricks pyspark. I have already installed the Maven package. Below is my code: df = spark.read.format('com.crealytics.spark.excel') \ .option("header", "true")…

pyspark azure-databricks

asked Dec 08 '21 at 15:48

MFatn

2 answers

Use a CSV file of dates as triggers for ADF pipeline

I have an ADF pipeline that I need to run based on a CSV file containing sporadic dates. Is there any way to implement this? My only thought is to trigger a pipeline daily that has a Databricks script that checks if the current date matches a date in…

azure-pipelines azure-data-factory databricks azure-databricks azure-triggers

asked Dec 06 '21 at 10:18

Alex Grimshaw

1 answer

Python Databricks cannot visualise dtreeviz decision tree

I need to visualize a decision tree in dtreeviz in Databricks. The code seems to be working fine. However, instead of showing the decision tree it throws the following: Out[23]: Running the following…

python databricks azure-databricks dtreeviz

asked Dec 06 '21 at 05:08

Dario Federici

1 answer

I am unable to mount ADLS Gen2. Please assist

Code to mount ADLS Gen2: Error while mounting ADLS Gen2:

apache-spark azure-databricks azure-data-lake-gen2

asked Dec 02 '21 at 05:55

Robb

2 answers

SparkR::dapply library not recognized

Introduction: I've installed some packages on a Databricks cluster using install.packages on DR 9.1 LTS, and I want to run a UDF using R & Spark (SparkR or sparklyr). My use case is to score some data in batch using Spark (either SparkR or…

apache-spark databricks azure-databricks sparkr

asked Dec 01 '21 at 17:57

yeamusic21

2 answers

FileUtils write method does not work on Azure Databricks

I have trouble writing a file on my Databricks cluster's driver (as a temp file). I have a Scala notebook on my company's Azure Databricks which contains these lines of code: val xml: String = Controller.requestTo(url) val bytes: Array[Byte] =…

azure scala databricks azure-databricks fileutils

asked Dec 01 '21 at 15:36

Karzyfox
