Questions tagged [databricks]

Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.

Use this tag for questions specific to the Databricks Lakehouse Platform, including, but not limited to, the Databricks File System (DBFS), REST APIs, Databricks Spark SQL extensions, and orchestration tools.

Don't use this tag for generic questions about Apache Spark or for public Spark packages maintained by Databricks.


7135 questions
10
votes
1 answer

How to print/log outputs within foreachBatch function?

Using Delta table streaming, I am trying to write a stream using foreachBatch: df.writeStream .format("delta") .foreachBatch(WriteStreamToDelta) ... WriteStreamToDelta looks like: def WriteStreamToDelta(microDF, batch_id): microDFWrangled =…
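A minimal sketch of the usual answer pattern (function and logger names are illustrative): `foreachBatch` runs the handler on the driver, so `print()` and `logging` output typically lands in the driver logs (Cluster UI → Driver Logs) rather than the notebook cell.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("stream")

def write_stream_to_delta(micro_df, batch_id):
    # Hypothetical batch handler: count the micro-batch and log it before
    # writing. On Databricks this output appears in the driver logs, not
    # necessarily in the notebook cell output.
    n = micro_df.count()
    log.info("batch %s: %s rows", batch_id, n)
    print(f"batch {batch_id}: {n} rows")  # also visible in driver stdout
    return n

# In a real streaming job you would attach it with something like:
# df.writeStream.format("delta").foreachBatch(write_stream_to_delta).start()
```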
10
votes
1 answer

Casting from timestamp[us, tz=Etc/UTC] to timestamp[ns] would result in out of bounds timestamp

I have a feature which lets me query a Databricks Delta table from a client app. This is the code I use for that purpose: df = spark.sql('SELECT * FROM EmployeeTerritories LIMIT 100') dataframe = df.toPandas() dataframe_json =…
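This error usually comes from the Arrow conversion in `toPandas()`: pandas (pre-2.0) stores timestamps as int64 nanoseconds, so only roughly 1677-09-21 through 2262-04-11 is representable, and sentinel dates like 9999-12-31 overflow. A hedged sketch of a pre-flight check (helper name is illustrative; a common workaround is to cast or clamp the offending timestamp columns in Spark SQL before converting):

```python
from datetime import datetime, timezone

# int64 nanoseconds since the Unix epoch bound the datetime64[ns] range.
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_NS_MAX = 2**63 - 1
_NS_MIN = -(2**63)

def fits_ns_timestamp(dt):
    """Return True if dt is representable as a pandas datetime64[ns]."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    ns = (dt - _EPOCH).total_seconds() * 1e9
    return _NS_MIN <= ns <= _NS_MAX
```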
anthino12
10
votes
2 answers

NoSuchMethodError on com.fasterxml.jackson.dataformat.xml.XmlMapper.coercionConfigDefaults()

I'm parsing an XML string to convert it to a JsonNode in Scala using an XmlMapper from the Jackson library. I code on a Databricks notebook, so compilation is done on a cloud cluster. When compiling my code I got this error…
Karzyfox
10
votes
4 answers

How to pass the script path to %run magic command as a variable in databricks notebook?

I want to run a notebook in databricks from another notebook using %run. Also I want to be able to send the path of the notebook that I'm running to the main notebook as a parameter. The reason for not using dbutils.notebook.run is that I'm storing…
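The usual answer here is that `%run` only accepts a literal path (magics are preprocessed before Python runs), so it cannot be parameterized directly; `dbutils.notebook.run` accepts a variable path but runs the child in a separate context. A small sketch (helper name and paths are illustrative):

```python
import posixpath

def build_notebook_path(base_dir, name):
    # Hypothetical helper to assemble a workspace path for a child notebook.
    return posixpath.join(base_dir, name)

# %run requires a literal path and so cannot take a variable.
# dbutils.notebook.run can, but the child notebook runs in its own context
# and communicates via arguments and a return value instead of shared globals:
# result = dbutils.notebook.run(
#     build_notebook_path("/Shared/etl", "child"), 600, {"env": "dev"})
```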
ARCrow
10
votes
4 answers

Import python module to python script in databricks

I am working on a project in Azure DataFactory, and I have a pipeline that runs a Databricks python script. This particular script, which is located in the Databricks file system and is run by the ADF pipeline, imports a module from another python…
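A common answer pattern for this: make the directory containing the module importable by adding it to `sys.path` before importing. A minimal sketch (the DBFS path shown is illustrative; on Databricks the directory is typically reached through the `/dbfs` fuse mount):

```python
import importlib
import sys

def import_from_dir(directory, module_name):
    """Make modules in `directory` importable, then import one by name.

    On Databricks, `directory` would typically be a fuse path such as
    "/dbfs/scripts" (illustrative).
    """
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return importlib.import_module(module_name)
```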
10
votes
2 answers

check if delta table exists on a path or not in databricks

I need to delete certain data from a delta-lake table before I load it. I am able to delete the data from delta table if it exists but it fails when the table does not exist. Databricks scala code below // create delete statement val del_ID =…
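One answer pattern, sketched in Python with illustrative names: a Delta table directory always contains a `_delta_log` subdirectory, so its presence can serve as an existence check on a plain filesystem path. (With the delta-spark package installed, `DeltaTable.isDeltaTable(spark, path)` is the API-level equivalent.)

```python
import os

def looks_like_delta_table(path):
    # A Delta table directory contains a _delta_log subdirectory with the
    # transaction log; checking for it avoids failing when the table is absent.
    return os.path.isdir(os.path.join(path, "_delta_log"))
```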
VNK
10
votes
4 answers

How to add a new column to a Delta Lake table?

I'm trying to add a new column to data stored as a Delta Table in Azure Blob Storage. Most of the actions being done on the data are upserts, with many updates and few new inserts. My code to write data currently looks like…
10
votes
2 answers

Do you know how to install the 'ODBC Driver 17 for SQL Server' on a Databricks cluster?

I'm trying to connect from a Databricks notebook to an Azure SQL Datawarehouse using the pyodbc python library. When I execute the code I get this error: Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL…
user2364105
10
votes
1 answer

When should I use Azure ML Notebooks VS Azure Databricks? Both are competitor products in my opinion

Pretty self-explanatory question. When should I use Azure ML Notebooks vs Azure Databricks? I feel there's a great overlap between the two products and one is definitely better marketed than the other… I'm mainly looking for information concerning…
10
votes
0 answers

Databricks 6.1 no database named global_temp error when initializing metastore connection

When initializing hive metastore connection (saving data frame as a table for the first time ) on cluster 6.1 (includes Apache Spark 2.4.4, Scala 2.11) (Azure), I can see health check for database global_temp failing with the error: 20/02/18…
Marcin
10
votes
3 answers

List All Files in a Folder Sitting in a Data Lake

I'm trying to get an inventory of all files in a folder, which has a few sub-folders, all of which sit in a data lake. Here is the code that I'm testing. import sys, os import pandas as pd mylist = [] root = "/mnt/rawdata/parent/" path =…
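A minimal sketch of the usual recursive-listing approach (paths are illustrative): `os.walk` descends through sub-folders, which `dbutils.fs.ls` does not do by itself; on Databricks a mounted path like `/mnt/rawdata/parent/` is reachable from Python via the `/dbfs` fuse mount.

```python
import os

def list_files(root):
    """Recursively list file paths under root, e.g. a mounted data lake
    folder such as /dbfs/mnt/rawdata/parent/ (path shown is illustrative)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```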
ASH
10
votes
1 answer

Storage options in databricks

I am relatively new to databricks environment. My company has set up a databricks account for me where I am pulling data from a s3 bucket. I have background in traditional relational databases so it's a bit difficult for me to understand…
user11704694
10
votes
4 answers

Trouble when writing the data to Delta Lake in Azure databricks (Incompatible format detected)

I need to read dataset into a DataFrame, then write the data to Delta Lake. But I have the following exception : AnalysisException: 'Incompatible format detected.\n\nYou are trying to write to…
Themis
10
votes
2 answers

How to convert a spark dataframe into a databrick koalas dataframe?

I know that you can convert a Spark dataframe df into a pandas dataframe with df.toPandas() However, this is taking very long, so I found out about the Koalas package in Databricks that could enable me to use the data as a pandas dataframe (for…
Antonio López Ruiz
10
votes
4 answers

How to set jdbc/partitionColumn type to Date in spark 2.4.1

I am trying to retrieve data from Oracle using spark-sql-2.4.1 version. I tried to set the JdbcOptions as below: .option("lowerBound", "31-MAR-02"); .option("upperBound", "01-MAY-19"); .option("partitionColumn", "data_date"); …
BdEngineer