Questions tagged [databricks]

Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.

Use this tag for questions specific to the Databricks Lakehouse Platform, including, but not limited to, the Databricks File System (DBFS), REST APIs, Databricks Spark SQL extensions, and orchestration tools.

Don't use this tag for generic questions about Apache Spark or for public Spark packages maintained by Databricks.


7135 questions
10
votes
1 answer

How to print/log outputs within foreachBatch function?

Using Delta table streaming, I am trying to write a stream using foreachBatch: df.writeStream .format("delta") .foreachBatch(WriteStreamToDelta) ... WriteStreamToDelta looks like: def WriteStreamToDelta(microDF, batch_id): microDFWrangled =…
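A minimal sketch of the usual answer pattern (function and logger names are illustrative): `foreachBatch` runs the handler on the driver, so `print()` and `logging` output typically lands in the driver logs (Cluster UI → Driver Logs) rather than the notebook cell.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("stream")

def write_stream_to_delta(micro_df, batch_id):
    # Hypothetical batch handler: count the micro-batch and log it before
    # writing. On Databricks this output appears in the driver logs, not
    # necessarily in the notebook cell output.
    n = micro_df.count()
    log.info("batch %s: %s rows", batch_id, n)
    print(f"batch {batch_id}: {n} rows")  # also visible in driver stdout
    return n

# In a real streaming job you would attach it with something like:
# df.writeStream.format("delta").foreachBatch(write_stream_to_delta).start()
```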
10
votes
1 answer

Casting from timestamp[us, tz=Etc/UTC] to timestamp[ns] would result in out of bounds timestamp

I have a feature which lets me query a Databricks Delta table from a client app. This is the code I use for that purpose: df = spark.sql('SELECT * FROM EmployeeTerritories LIMIT 100') dataframe = df.toPandas() dataframe_json =…
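This error usually comes from the Arrow conversion in `toPandas()`: pandas (pre-2.0) stores timestamps as int64 nanoseconds, so only roughly 1677-09-21 through 2262-04-11 is representable, and sentinel dates like 9999-12-31 overflow. A hedged sketch of a pre-flight check (helper name is illustrative; a common workaround is to cast or clamp the offending timestamp columns in Spark SQL before converting):

```python
from datetime import datetime, timezone

# int64 nanoseconds since the Unix epoch bound the datetime64[ns] range.
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_NS_MAX = 2**63 - 1
_NS_MIN = -(2**63)

def fits_ns_timestamp(dt):
    """Return True if dt is representable as a pandas datetime64[ns]."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    ns = (dt - _EPOCH).total_seconds() * 1e9
    return _NS_MIN <= ns <= _NS_MAX
```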
anthino12
10
votes
2 answers

NoSuchMethodError on com.fasterxml.jackson.dataformat.xml.XmlMapper.coercionConfigDefaults()

I'm parsing an XML string to convert it to a JsonNode in Scala using an XmlMapper from the Jackson library. I code on a Databricks notebook, so compilation is done on a cloud cluster. When compiling my code I got this error…
Karzyfox
10
votes
4 answers

How to pass the script path to %run magic command as a variable in databricks notebook?

I want to run a notebook in databricks from another notebook using %run. Also I want to be able to send the path of the notebook that I'm running to the main notebook as a parameter. The reason for not using dbutils.notebook.run is that I'm storing…
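The usual answer here is that `%run` only accepts a literal path (magics are preprocessed before Python runs), so it cannot be parameterized directly; `dbutils.notebook.run` accepts a variable path but runs the child in a separate context. A small sketch (helper name and paths are illustrative):

```python
import posixpath

def build_notebook_path(base_dir, name):
    # Hypothetical helper to assemble a workspace path for a child notebook.
    return posixpath.join(base_dir, name)

# %run requires a literal path and so cannot take a variable.
# dbutils.notebook.run can, but the child notebook runs in its own context
# and communicates via arguments and a return value instead of shared globals:
# result = dbutils.notebook.run(
#     build_notebook_path("/Shared/etl", "child"), 600, {"env": "dev"})
```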
ARCrow
10
votes
4 answers

Import python module to python script in databricks

I am working on a project in Azure DataFactory, and I have a pipeline that runs a Databricks python script. This particular script, which is located in the Databricks file system and is run by the ADF pipeline, imports a module from another python…
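A common answer pattern for this: make the directory containing the module importable by adding it to `sys.path` before importing. A minimal sketch (the DBFS path shown is illustrative; on Databricks the directory is typically reached through the `/dbfs` fuse mount):

```python
import importlib
import sys

def import_from_dir(directory, module_name):
    """Make modules in `directory` importable, then import one by name.

    On Databricks, `directory` would typically be a fuse path such as
    "/dbfs/scripts" (illustrative).
    """
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return importlib.import_module(module_name)
```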
10
votes
2 answers

check if delta table exists on a path or not in databricks

I need to delete certain data from a delta-lake table before I load it. I am able to delete the data from delta table if it exists but it fails when the table does not exist. Databricks scala code below // create delete statement val del_ID =…
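One answer pattern, sketched in Python with illustrative names: a Delta table directory always contains a `_delta_log` subdirectory, so its presence can serve as an existence check on a plain filesystem path. (With the delta-spark package installed, `DeltaTable.isDeltaTable(spark, path)` is the API-level equivalent.)

```python
import os

def looks_like_delta_table(path):
    # A Delta table directory contains a _delta_log subdirectory with the
    # transaction log; checking for it avoids failing when the table is absent.
    return os.path.isdir(os.path.join(path, "_delta_log"))
```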
VNK
10
votes
4 answers

How to add a new column to a Delta Lake table?

I'm trying to add a new column to data stored as a Delta Table in Azure Blob Storage. Most of the actions being done on the data are upserts, with many updates and few new inserts. My code to write data currently looks like…
10
votes
2 answers

Do you know how to install the 'ODBC Driver 17 for SQL Server' on a Databricks cluster?

I'm trying to connect from a Databricks notebook to an Azure SQL Datawarehouse using the pyodbc python library. When I execute the code I get this error: Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL…
user2364105
10
votes
1 answer

When should I use Azure ML Notebooks VS Azure Databricks? Both are competitor products in my opinion

Pretty self-explanatory question. When should I use Azure ML Notebooks vs Azure Databricks? I feel there's a great overlap between the two products and one is definitely better marketed than the other… I'm mainly looking for information concerning…
10
votes
0 answers

Databricks 6.1 no database named global_temp error when initializing metastore connection

When initializing hive metastore connection (saving data frame as a table for the first time ) on cluster 6.1 (includes Apache Spark 2.4.4, Scala 2.11) (Azure), I can see health check for database global_temp failing with the error: 20/02/18…
Marcin
10
votes
3 answers

List All Files in a Folder Sitting in a Data Lake

I'm trying to get an inventory of all files in a folder, which has a few sub-folders, all of which sit in a data lake. Here is the code that I'm testing. import sys, os import pandas as pd mylist = [] root = "/mnt/rawdata/parent/" path =…
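A minimal sketch of the usual recursive-listing approach (paths are illustrative): `os.walk` descends through sub-folders, which `dbutils.fs.ls` does not do by itself; on Databricks a mounted path like `/mnt/rawdata/parent/` is reachable from Python via the `/dbfs` fuse mount.

```python
import os

def list_files(root):
    """Recursively list file paths under root, e.g. a mounted data lake
    folder such as /dbfs/mnt/rawdata/parent/ (path shown is illustrative)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```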
ASH
10
votes
1 answer

Storage options in databricks

I am relatively new to databricks environment. My company has set up a databricks account for me where I am pulling data from a s3 bucket. I have background in traditional relational databases so it's a bit difficult for me to understand…
user11704694
10
votes
4 answers

Trouble when writing the data to Delta Lake in Azure databricks (Incompatible format detected)

I need to read dataset into a DataFrame, then write the data to Delta Lake. But I have the following exception : AnalysisException: 'Incompatible format detected.\n\nYou are trying to write to…
Themis
10
votes
2 answers

How to convert a spark dataframe into a databrick koalas dataframe?

I know that you can convert a Spark dataframe df into a pandas dataframe with df.toPandas() However, this is taking very long, so I found out about the Koalas package in Databricks that could enable me to use the data as a pandas dataframe (for…
Antonio López Ruiz
10
votes
4 answers

How to set jdbc/partitionColumn type to Date in spark 2.4.1

I am trying to retrieve data from Oracle using spark-sql-2.4.1 version. I tried to set the JdbcOptions as below: .option("lowerBound", "31-MAR-02"); .option("upperBound", "01-MAY-19"); .option("partitionColumn", "data_date"); …
BdEngineer