Questions tagged [azure-databricks]

For questions about the usage of Databricks Lakehouse Platform on Microsoft Azure

Overview

Azure Databricks is the Azure-based implementation of Databricks, which is a high-level platform for working with Apache Spark and includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and natively integrates with other Azure services such as Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, Synapse Analytics, etc.

4095 questions
12
votes
1 answer

Databricks - Download a dbfs:/FileStore file to my Local Machine

Normally I use the URL below to download a file from the Databricks DBFS FileStore to my local computer. *https:///fileStore/?o=* However, this time the file is not downloaded and the URL leads me to…
PJT
  • 185
  • 1
  • 1
  • 9
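Files written under dbfs:/FileStore are typically served by the workspace itself at a /files/ URL. A minimal sketch of building that download link, assuming a placeholder workspace host and ?o= workspace ID:

```python
def filestore_download_url(instance: str, path: str, workspace_id: str) -> str:
    """Build the browser download URL for a file under dbfs:/FileStore.

    Files saved to dbfs:/FileStore/<path> are served by the workspace at
    /files/<path>. `instance` and `workspace_id` are placeholders for your
    own workspace host and ?o= identifier.
    """
    if path.startswith("dbfs:/FileStore/"):
        path = path[len("dbfs:/FileStore/"):]
    return f"https://{instance}/files/{path.lstrip('/')}?o={workspace_id}"
```

If the link still does not download, the usual suspects are a missing or wrong ?o= workspace ID, or the file not actually living under dbfs:/FileStore.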
12
votes
1 answer

How to create an empty folder in Azure Blob from Azure Databricks

I have a scenario where I want to list all the folders inside a directory in Azure Blob. If no folders are present, create a new folder with a certain name. I am trying to list the folders using dbutils.fs.ls(path). But the problem with the above command is…
Saikat
  • 403
  • 1
  • 7
  • 19
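"Folders" on blob storage are virtual, so the usual approach is to list the directory and create the folder only when it is missing. A sketch that keeps the decision in a plain, testable function, with the dbutils calls (hypothetical paths) shown as comments:

```python
def folder_to_create(existing: list, wanted: str):
    """Return `wanted` if it is not already among `existing`, else None."""
    names = {e.rstrip("/") for e in existing}
    return None if wanted.rstrip("/") in names else wanted

# On a cluster (hypothetical container and folder names):
# folders = [f.name for f in dbutils.fs.ls("/mnt/container/")]
# if folder_to_create(folders, "staging/"):
#     dbutils.fs.mkdirs("/mnt/container/staging/")
```

Note that because blob directories are virtual, an empty folder created this way may not persist until at least one file is written into it.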
12
votes
4 answers

Databricks: check if the mount point is already mounted

How can I check whether the mount point is already mounted before calling dbutils.fs.mount in Databricks Python? Thanks
mytabi
  • 639
  • 2
  • 12
  • 28
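dbutils.fs.mounts() returns the cluster's current mounts, so the check reduces to a membership test. A sketch with the cluster-side calls commented out:

```python
def is_mounted(mount_points: list, target: str) -> bool:
    """True if `target` already appears among the given mount points."""
    return target.rstrip("/") in {m.rstrip("/") for m in mount_points}

# On a cluster:
# mounts = [m.mountPoint for m in dbutils.fs.mounts()]
# if not is_mounted(mounts, "/mnt/data"):
#     dbutils.fs.mount(source="wasbs://...",   # placeholder source
#                      mount_point="/mnt/data",
#                      extra_configs={...})
```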
11
votes
2 answers

Creating a Secret Scope in Databricks backed by Azure Key Vault fails

You can create scopes in Databricks backed by Azure Key Vault instead of using the Databricks CLI. However, when you try to create a scope, an obscure error message (with a spelling mistake!) is shown. It appears that not many people encounter this…
Rodney
  • 5,417
  • 7
  • 54
  • 98
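One hedged alternative to the UI: Key Vault-backed scopes can also be created through the Secrets REST API (POST /api/2.0/secrets/scopes/create). The sketch below only builds the request body; the field names follow that API, the resource ID and DNS name are placeholders, and note that this call reportedly requires a Microsoft Entra ID (Azure AD) token rather than a Databricks personal access token.

```python
def keyvault_scope_payload(scope: str, resource_id: str, dns_name: str) -> dict:
    """Request body for POST /api/2.0/secrets/scopes/create with a
    Key Vault backing (field names per the Databricks Secrets REST API)."""
    return {
        "scope": scope,
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": resource_id,  # full ARM resource ID of the vault
            "dns_name": dns_name,        # e.g. the vault's https://... URI
        },
    }
```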
11
votes
1 answer

Error running Spark on Databricks: constructor public XXX is not whitelisted

I am using Azure Databricks and trying to run some example Python code from this page, but I get this exception: py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml.classification.LogisticRegression(java.lang.String) is not…
lidong
  • 556
  • 1
  • 4
  • 20
10
votes
2 answers

NoSuchMethodError on com.fasterxml.jackson.dataformat.xml.XmlMapper.coercionConfigDefaults()

I'm parsing an XML string to convert it to a JsonNode in Scala using an XmlMapper from the Jackson library. I code on a Databricks notebook, so compilation is done on a cloud cluster. When compiling my code I got this error…
Karzyfox
  • 319
  • 1
  • 2
  • 15
10
votes
4 answers

Import python module to python script in databricks

I am working on a project in Azure Data Factory, and I have a pipeline that runs a Databricks Python script. This particular script, which is located in the Databricks file system and is run by the ADF pipeline, imports a module from another python…
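A common pattern is to put the shared module on DBFS and extend sys.path with its FUSE-mounted directory (e.g. a hypothetical /dbfs/FileStore/code) before importing. A sketch:

```python
import importlib
import sys

def import_from_dir(directory: str, module_name: str):
    """Make modules in `directory` importable, then import one by name.

    On Databricks the directory would typically be a DBFS path seen
    through the driver's local FUSE mount, e.g. "/dbfs/FileStore/code"
    (hypothetical path).
    """
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return importlib.import_module(module_name)
```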
10
votes
4 answers

How to add a new column to a Delta Lake table?

I'm trying to add a new column to data stored as a Delta Table in Azure Blob Storage. Most of the actions being done on the data are upserts, with many updates and few new inserts. My code to write data currently looks like…
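Two usual routes are an explicit ALTER TABLE ... ADD COLUMNS statement, or letting an append evolve the schema via the mergeSchema writer option. A sketch that builds the DDL string, with the writer variant shown as a comment (table and path names are placeholders):

```python
def add_columns_ddl(table: str, columns: dict) -> str:
    """Build an ALTER TABLE ... ADD COLUMNS statement for a Delta table."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({cols})"

# On a cluster (sketch):
# spark.sql(add_columns_ddl("events", {"country": "STRING"}))
# ...or let an append evolve the schema instead:
# df.write.format("delta").mode("append") \
#   .option("mergeSchema", "true").save("/mnt/delta/events")
```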
10
votes
2 answers

A schema mismatch detected when writing to the Delta table - Azure Databricks

I am trying to load "small_radio_json.json" into a Delta Lake table. After this code I would create the table, but I get the error "A schema mismatch detected when writing to the Delta table." It may be related to the partition of the …
Kenny_I
  • 2,001
  • 5
  • 40
  • 94
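This error is usually resolved with one of two Delta writer options: mergeSchema to add the new columns to the existing schema, or overwriteSchema (only meaningful with overwrite mode) to replace the schema entirely. A small sketch, with placeholder paths in the comments:

```python
def schema_evolution_option(replace_schema: bool) -> dict:
    """Writer option that lets a Delta write proceed past a schema mismatch.

    mergeSchema appends new columns to the existing schema; overwriteSchema
    replaces the schema and is only honored together with mode("overwrite").
    """
    return {"overwriteSchema": "true"} if replace_schema else {"mergeSchema": "true"}

# On a cluster (sketch, placeholder path):
# df.write.format("delta").mode("append") \
#   .options(**schema_evolution_option(replace_schema=False)) \
#   .save("/mnt/delta/small_radio")
```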
10
votes
0 answers

Databricks 6.1 no database named global_temp error when initializing metastore connection

When initializing the Hive metastore connection (saving a data frame as a table for the first time) on cluster 6.1 (includes Apache Spark 2.4.4, Scala 2.11) (Azure), I can see the health check for database global_temp failing with the error: 20/02/18…
Marcin
  • 284
  • 1
  • 2
  • 14
10
votes
1 answer

Apache Spark: impact of repartitioning, sorting and caching on a join

I am exploring Spark's behavior when joining a table to itself. I am using Databricks. My dummy scenario is: Read an external table as dataframe A (underlying files are in delta format) Define dataframe B as dataframe A with only certain columns…
Dawid
  • 652
  • 1
  • 11
  • 24
10
votes
3 answers

List All Files in a Folder Sitting in a Data Lake

I'm trying to get an inventory of all files in a folder, which has a few sub-folders, all of which sit in a data lake. Here is the code that I'm testing. import sys, os import pandas as pd mylist = [] root = "/mnt/rawdata/parent/" path =…
ASH
  • 20,759
  • 19
  • 87
  • 200
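When the lake folder is mounted, plain Python can walk it through the driver's /dbfs FUSE mount (e.g. /dbfs/mnt/rawdata/parent/ for the question's path), which avoids hand-rolling recursion over dbutils.fs.ls. A sketch:

```python
import os

def list_files(root: str) -> list:
    """Recursively collect every file path under `root`, sorted.

    On Databricks, a mounted path such as /mnt/rawdata/parent/ is visible
    to plain Python as /dbfs/mnt/rawdata/parent/ via the FUSE mount.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```

The resulting list of paths can then be loaded into a pandas DataFrame for the inventory, as the question's code begins to do.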
10
votes
2 answers

Pyspark User-Defined_functions inside of a class

I am trying to create a Spark UDF inside of a Python class, meaning one of the methods in the class is the UDF. I am getting the error "PicklingError: Could not serialize object: TypeError: can't pickle _MovedItems objects". Environment:…
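The usual cause is that the UDF's function references self, so Spark tries to pickle the entire instance (and everything it holds). Keeping the row-level logic in a staticmethod sidesteps that; a sketch with illustrative names, the Spark-side wiring commented out:

```python
class Normalizer:
    @staticmethod
    def clean(value: str) -> str:
        """Row-level logic kept free of `self`, so Spark only has to
        serialize this plain function, not the whole instance."""
        return value.strip().lower()

# On a cluster (sketch):
# from pyspark.sql.functions import udf
# clean_udf = udf(Normalizer.clean)
# df = df.withColumn("name_clean", clean_udf(df["name"]))
```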
10
votes
4 answers

Trouble when writing the data to Delta Lake in Azure databricks (Incompatible format detected)

I need to read a dataset into a DataFrame, then write the data to Delta Lake. But I get the following exception: AnalysisException: 'Incompatible format detected.\n\nYou are trying to write to…
Themis
  • 139
  • 1
  • 1
  • 8
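Delta treats a directory as a Delta table when it contains a _delta_log subdirectory; writing with format("delta") over plain Parquet files (or the reverse) is what triggers this error. A small diagnostic helper sketch, with hypothetical cluster-side usage in the comments:

```python
def looks_like_delta(entries: list) -> bool:
    """True if a directory listing contains the _delta_log subdirectory
    that marks the location as a Delta table."""
    return any(e.rstrip("/").endswith("_delta_log") for e in entries)

# On a cluster (sketch, placeholder path). The fix is usually either to
# write consistently with format("delta"), or to clear/convert the old
# non-Delta files at the target first:
# entries = [f.name for f in dbutils.fs.ls("/mnt/delta/target")]
# if not looks_like_delta(entries):
#     pass  # clean up the old Parquet files, or CONVERT TO DELTA
# df.write.format("delta").mode("append").save("/mnt/delta/target")
```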
10
votes
7 answers

How to check a file/folder is present using pyspark without getting exception

I am trying to check whether a file is present before reading it from PySpark in Databricks, to avoid exceptions. I tried the code snippets below, but I get an exception when the file is not present. from pyspark.sql import * from…
Amareshwar Reddy
  • 103
  • 1
  • 1
  • 4
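One exception-free approach is to check through the driver's /dbfs FUSE mount with os.path.exists; the try/except variant around dbutils.fs.ls is sketched in the comments:

```python
import os

def path_exists(path: str) -> bool:
    """Exception-free existence check via the driver-local filesystem.

    A DBFS path like dbfs:/mnt/data/file.csv is visible to plain Python
    as /dbfs/mnt/data/file.csv, so os.path.exists never raises.
    """
    if path.startswith("dbfs:"):
        path = "/dbfs" + path[len("dbfs:"):]
    return os.path.exists(path)

# Cluster-side alternative (sketch), wrapping dbutils in try/except:
# def dbfs_exists(path):
#     try:
#         dbutils.fs.ls(path)
#         return True
#     except Exception:
#         return False
```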