Questions tagged [azure-databricks]

For questions about the usage of Databricks Lakehouse Platform on Microsoft Azure

Overview

Azure Databricks is the Azure-based implementation of Databricks, which is a high-level platform for working with Apache Spark and includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and natively integrates with other Azure services such as Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, Synapse Analytics, etc.

4095 questions
12
votes
1 answer

Databricks - Download a dbfs:/FileStore file to my Local Machine

Normally I use the URL below to download a file from the Databricks DBFS FileStore to my local computer. *https:///fileStore/?o=* However, this time the file is not downloaded and the URL leads me to…
PJT
  • 185
  • 1
  • 1
  • 9
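Files written under dbfs:/FileStore are typically served by the workspace itself at a /files/ URL. A minimal sketch of building that download link, assuming a placeholder workspace host and ?o= workspace ID:

```python
def filestore_download_url(instance: str, path: str, workspace_id: str) -> str:
    """Build the browser download URL for a file under dbfs:/FileStore.

    Files saved to dbfs:/FileStore/<path> are served by the workspace at
    /files/<path>. `instance` and `workspace_id` are placeholders for your
    own workspace host and ?o= identifier.
    """
    if path.startswith("dbfs:/FileStore/"):
        path = path[len("dbfs:/FileStore/"):]
    return f"https://{instance}/files/{path.lstrip('/')}?o={workspace_id}"
```

If the link still does not download, the usual suspects are a missing or wrong ?o= workspace ID, or the file not actually living under dbfs:/FileStore.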
12
votes
1 answer

How to create an empty folder in Azure Blob from Azure Databricks

I have a scenario where I want to list all the folders inside a directory in Azure Blob. If no folders are present, create a new folder with a certain name. I am trying to list the folders using dbutils.fs.ls(path). But the problem with the above command is…
Saikat
  • 403
  • 1
  • 7
  • 19
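"Folders" on blob storage are virtual, so the usual approach is to list the directory and create the folder only when it is missing. A sketch that keeps the decision in a plain, testable function, with the dbutils calls (hypothetical paths) shown as comments:

```python
def folder_to_create(existing: list, wanted: str):
    """Return `wanted` if it is not already among `existing`, else None."""
    names = {e.rstrip("/") for e in existing}
    return None if wanted.rstrip("/") in names else wanted

# On a cluster (hypothetical container and folder names):
# folders = [f.name for f in dbutils.fs.ls("/mnt/container/")]
# if folder_to_create(folders, "staging/"):
#     dbutils.fs.mkdirs("/mnt/container/staging/")
```

Note that because blob directories are virtual, an empty folder created this way may not persist until at least one file is written into it.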
12
votes
4 answers

Databricks: check if the mount point is already mounted

How can I check whether the mount point is already mounted before calling dbutils.fs.mount in Databricks Python? Thanks
mytabi
  • 639
  • 2
  • 12
  • 28
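dbutils.fs.mounts() returns the cluster's current mounts, so the check reduces to a membership test. A sketch with the cluster-side calls commented out:

```python
def is_mounted(mount_points: list, target: str) -> bool:
    """True if `target` already appears among the given mount points."""
    return target.rstrip("/") in {m.rstrip("/") for m in mount_points}

# On a cluster:
# mounts = [m.mountPoint for m in dbutils.fs.mounts()]
# if not is_mounted(mounts, "/mnt/data"):
#     dbutils.fs.mount(source="wasbs://...",   # placeholder source
#                      mount_point="/mnt/data",
#                      extra_configs={...})
```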
11
votes
2 answers

Creating a Secret Scope in Databricks backed by Azure Key Vault fails

You can create scopes in Databricks backed by Azure Key Vault instead of using the Databricks CLI. However, when you try to create a scope, an obscure error message (with a spelling mistake!) is shown. It appears that not many people encounter this…
Rodney
  • 5,417
  • 7
  • 54
  • 98
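One hedged alternative to the UI: Key Vault-backed scopes can also be created through the Secrets REST API (POST /api/2.0/secrets/scopes/create). The sketch below only builds the request body; the field names follow that API, the resource ID and DNS name are placeholders, and note that this call reportedly requires a Microsoft Entra ID (Azure AD) token rather than a Databricks personal access token.

```python
def keyvault_scope_payload(scope: str, resource_id: str, dns_name: str) -> dict:
    """Request body for POST /api/2.0/secrets/scopes/create with a
    Key Vault backing (field names per the Databricks Secrets REST API)."""
    return {
        "scope": scope,
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": resource_id,  # full ARM resource ID of the vault
            "dns_name": dns_name,        # e.g. the vault's https://... URI
        },
    }
```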
11
votes
1 answer

Error running Spark on Databricks: constructor public XXX is not whitelisted

I am using Azure Databricks and trying to run some example Python code from this page, but I get this exception: py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml.classification.LogisticRegression(java.lang.String) is not…
lidong
  • 556
  • 1
  • 4
  • 20
10
votes
2 answers

NoSuchMethodError on com.fasterxml.jackson.dataformat.xml.XmlMapper.coercionConfigDefaults()

I'm parsing an XML string to convert it to a JsonNode in Scala using an XmlMapper from the Jackson library. I code on a Databricks notebook, so compilation is done on a cloud cluster. When compiling my code I got this error…
Karzyfox
  • 319
  • 1
  • 2
  • 15
10
votes
4 answers

Import python module to python script in databricks

I am working on a project in Azure Data Factory, and I have a pipeline that runs a Databricks Python script. This particular script, which is located in the Databricks file system and is run by the ADF pipeline, imports a module from another python…
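A common pattern is to put the shared module on DBFS and extend sys.path with its FUSE-mounted directory (e.g. a hypothetical /dbfs/FileStore/code) before importing. A sketch:

```python
import importlib
import sys

def import_from_dir(directory: str, module_name: str):
    """Make modules in `directory` importable, then import one by name.

    On Databricks the directory would typically be a DBFS path seen
    through the driver's local FUSE mount, e.g. "/dbfs/FileStore/code"
    (hypothetical path).
    """
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return importlib.import_module(module_name)
```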
10
votes
4 answers

How to add a new column to a Delta Lake table?

I'm trying to add a new column to data stored as a Delta Table in Azure Blob Storage. Most of the actions being done on the data are upserts, with many updates and few new inserts. My code to write data currently looks like…
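Two usual routes are an explicit ALTER TABLE ... ADD COLUMNS statement, or letting an append evolve the schema via the mergeSchema writer option. A sketch that builds the DDL string, with the writer variant shown as a comment (table and path names are placeholders):

```python
def add_columns_ddl(table: str, columns: dict) -> str:
    """Build an ALTER TABLE ... ADD COLUMNS statement for a Delta table."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({cols})"

# On a cluster (sketch):
# spark.sql(add_columns_ddl("events", {"country": "STRING"}))
# ...or let an append evolve the schema instead:
# df.write.format("delta").mode("append") \
#   .option("mergeSchema", "true").save("/mnt/delta/events")
```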
10
votes
2 answers

A schema mismatch detected when writing to the Delta table - Azure Databricks

I am trying to load "small_radio_json.json" into a Delta Lake table. After this code I would create the table, but I get the error "A schema mismatch detected when writing to the Delta table." It may be related to the partition of the …
Kenny_I
  • 2,001
  • 5
  • 40
  • 94
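This error is usually resolved with one of two Delta writer options: mergeSchema to add the new columns to the existing schema, or overwriteSchema (only meaningful with overwrite mode) to replace the schema entirely. A small sketch, with placeholder paths in the comments:

```python
def schema_evolution_option(replace_schema: bool) -> dict:
    """Writer option that lets a Delta write proceed past a schema mismatch.

    mergeSchema appends new columns to the existing schema; overwriteSchema
    replaces the schema and is only honored together with mode("overwrite").
    """
    return {"overwriteSchema": "true"} if replace_schema else {"mergeSchema": "true"}

# On a cluster (sketch, placeholder path):
# df.write.format("delta").mode("append") \
#   .options(**schema_evolution_option(replace_schema=False)) \
#   .save("/mnt/delta/small_radio")
```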
10
votes
0 answers

Databricks 6.1 no database named global_temp error when initializing metastore connection

When initializing the Hive metastore connection (saving a data frame as a table for the first time) on cluster 6.1 (includes Apache Spark 2.4.4, Scala 2.11) (Azure), I can see the health check for database global_temp failing with the error: 20/02/18…
Marcin
  • 284
  • 1
  • 2
  • 14
10
votes
1 answer

Apache Spark: impact of repartitioning, sorting and caching on a join

I am exploring Spark's behavior when joining a table to itself. I am using Databricks. My dummy scenario is: Read an external table as dataframe A (underlying files are in delta format) Define dataframe B as dataframe A with only certain columns…
Dawid
  • 652
  • 1
  • 11
  • 24
10
votes
3 answers

List All Files in a Folder Sitting in a Data Lake

I'm trying to get an inventory of all files in a folder, which has a few sub-folders, all of which sit in a data lake. Here is the code that I'm testing. import sys, os import pandas as pd mylist = [] root = "/mnt/rawdata/parent/" path =…
ASH
  • 20,759
  • 19
  • 87
  • 200
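When the lake folder is mounted, plain Python can walk it through the driver's /dbfs FUSE mount (e.g. /dbfs/mnt/rawdata/parent/ for the question's path), which avoids hand-rolling recursion over dbutils.fs.ls. A sketch:

```python
import os

def list_files(root: str) -> list:
    """Recursively collect every file path under `root`, sorted.

    On Databricks, a mounted path such as /mnt/rawdata/parent/ is visible
    to plain Python as /dbfs/mnt/rawdata/parent/ via the FUSE mount.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```

The resulting list of paths can then be loaded into a pandas DataFrame for the inventory, as the question's code begins to do.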
10
votes
2 answers

Pyspark User-Defined_functions inside of a class

I am trying to create a Spark UDF inside of a Python class, meaning one of the methods in the class is the UDF. I am getting the error "PicklingError: Could not serialize object: TypeError: can't pickle _MovedItems objects". Environment:…
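The usual cause is that the UDF's function references self, so Spark tries to pickle the entire instance (and everything it holds). Keeping the row-level logic in a staticmethod sidesteps that; a sketch with illustrative names, the Spark-side wiring commented out:

```python
class Normalizer:
    @staticmethod
    def clean(value: str) -> str:
        """Row-level logic kept free of `self`, so Spark only has to
        serialize this plain function, not the whole instance."""
        return value.strip().lower()

# On a cluster (sketch):
# from pyspark.sql.functions import udf
# clean_udf = udf(Normalizer.clean)
# df = df.withColumn("name_clean", clean_udf(df["name"]))
```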
10
votes
4 answers

Trouble when writing the data to Delta Lake in Azure databricks (Incompatible format detected)

I need to read a dataset into a DataFrame, then write the data to Delta Lake. But I get the following exception: AnalysisException: 'Incompatible format detected.\n\nYou are trying to write to…
Themis
  • 139
  • 1
  • 1
  • 8
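Delta treats a directory as a Delta table when it contains a _delta_log subdirectory; writing with format("delta") over plain Parquet files (or the reverse) is what triggers this error. A small diagnostic helper sketch, with hypothetical cluster-side usage in the comments:

```python
def looks_like_delta(entries: list) -> bool:
    """True if a directory listing contains the _delta_log subdirectory
    that marks the location as a Delta table."""
    return any(e.rstrip("/").endswith("_delta_log") for e in entries)

# On a cluster (sketch, placeholder path). The fix is usually either to
# write consistently with format("delta"), or to clear/convert the old
# non-Delta files at the target first:
# entries = [f.name for f in dbutils.fs.ls("/mnt/delta/target")]
# if not looks_like_delta(entries):
#     pass  # clean up the old Parquet files, or CONVERT TO DELTA
# df.write.format("delta").mode("append").save("/mnt/delta/target")
```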
10
votes
7 answers

How to check a file/folder is present using pyspark without getting exception

I am trying to check whether a file is present before reading it from PySpark in Databricks, to avoid exceptions. I tried the code snippets below, but I get an exception when the file is not present. from pyspark.sql import * from…
Amareshwar Reddy
  • 103
  • 1
  • 1
  • 4
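One exception-free approach is to check through the driver's /dbfs FUSE mount with os.path.exists; the try/except variant around dbutils.fs.ls is sketched in the comments:

```python
import os

def path_exists(path: str) -> bool:
    """Exception-free existence check via the driver-local filesystem.

    A DBFS path like dbfs:/mnt/data/file.csv is visible to plain Python
    as /dbfs/mnt/data/file.csv, so os.path.exists never raises.
    """
    if path.startswith("dbfs:"):
        path = "/dbfs" + path[len("dbfs:"):]
    return os.path.exists(path)

# Cluster-side alternative (sketch), wrapping dbutils in try/except:
# def dbfs_exists(path):
#     try:
#         dbutils.fs.ls(path)
#         return True
#     except Exception:
#         return False
```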