Questions tagged [azure-databricks]

For questions about using the Databricks Lakehouse Platform on Microsoft Azure

Overview

Azure Databricks is the Azure-based implementation of Databricks, a high-level platform for working with Apache Spark that includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and natively integrates with other Azure services such as Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, Synapse Analytics, etc.

4095 questions
1 vote · 1 answer

Read files from multiple folders from ADLS gen2 storage via databricks and create single target file

I'm using the Databricks service for analysis. I have built a connection to ADLS Gen2 storage and created a mount point; the container holds folders for years and months, with Parquet files for each month inside the month folders. I…
Arun • 57 • 7
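Spark's Parquet reader accepts glob patterns over a mount point, so a year/month layout like the one described can usually be read in a single call (the mount name and folder names below are assumptions, not from the question). The wildcard semantics can be sketched with plain Python:

```python
# Sketch only: with Spark the whole tree would be read in one call, e.g.
#   df = spark.read.parquet("/mnt/mydata/*/*/*.parquet")   # mount name hypothetical
# and df.write.parquet(target) would produce the single target dataset.
# The glob matching itself, illustrated with Python's stdlib:
import glob
import os
import tempfile

root = tempfile.mkdtemp()
# Fake the layout: <root>/<year>/<month>/part-0.parquet
for year in ("2021", "2022"):
    for month in ("01", "02"):
        d = os.path.join(root, year, month)
        os.makedirs(d)
        open(os.path.join(d, "part-0.parquet"), "w").close()

# One pattern picks up every month folder across every year
matches = sorted(glob.glob(os.path.join(root, "*", "*", "*.parquet")))
print(len(matches))  # 4 files: 2 years x 2 months
```

If the folders follow the `year=2022/month=01` naming convention, pointing the reader at the root directory also works, and Spark exposes `year`/`month` as partition columns.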
1 vote · 1 answer

Azure Databricks API

Trying to use the Databricks API to work with resources programmatically. I am using this Microsoft document to authenticate with a service…
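With a service principal, the usual flow is to obtain an Azure AD access token (client-credentials flow, as in the linked document) and send it as a Bearer header to the workspace's REST API. A minimal sketch, in which the workspace URL and token value are placeholders:

```python
# Sketch, not a definitive client: build a Databricks REST call with an
# AAD bearer token obtained for a service principal.
import urllib.request

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace URL
token = "<aad-access-token>"  # placeholder from the client-credentials flow

req = urllib.request.Request(
    url=f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
print(req.get_header("Authorization"))
# urllib.request.urlopen(req) would perform the actual call; skipped here.
```

The same header works for any endpoint under `/api/2.0/` or `/api/2.1/`; only the path and payload change per resource.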
1 vote · 1 answer

Databricks Notebook as Substitute for livy sessions endpoint

I want to execute a Databricks notebook's code via the Databricks API and get the output of the notebook's code as the response. Is this possible, or is there any workaround for it? Is the same possible with the Databricks SQL API?
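The Jobs API covers this pattern: submit a one-off notebook run, then fetch whatever the notebook returned via `dbutils.notebook.exit(...)`. The endpoints named in the comments are the documented Jobs API 2.1 ones; the notebook path and cluster id below are hypothetical:

```python
# Hedged sketch of the request payload (values are placeholders). Inside the
# notebook, return a value with: dbutils.notebook.exit(json.dumps(result))
import json

submit_payload = {
    "run_name": "api-triggered-run",
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Users/me@example.com/my_notebook"},
            "existing_cluster_id": "1234-567890-abcde123",
        }
    ],
}
# POST this to  /api/2.1/jobs/runs/submit          -> returns {"run_id": ...}
# then poll     /api/2.1/jobs/runs/get-output?run_id=<task run id>
# the value passed to dbutils.notebook.exit shows up as notebook_output.result.
body = json.dumps(submit_payload)
print(json.loads(body)["tasks"][0]["task_key"])  # "main"
```

This is closer to the Livy batch/session pattern than the Databricks SQL API, which executes SQL statements rather than notebook code.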
1 vote · 0 answers

Databricks is "Updating the Delta table's state"

I'm reading and joining multiple Delta tables from a data lake and storing the result back to another Delta Lake location. While doing so, Databricks shows me: Depending on how many Delta tables I join with each other, this can take up to very…
elyptikus • 936 • 8 • 24
1 vote · 1 answer

DataType issue from Synapse to Delta table in Databricks?

Copying data from Synapse to a managed Delta table. We enabled staging and copied the data from Synapse to the managed Delta table. Some of the date columns exist in Synapse, and the same schema is defined in the Delta table. We have designed the…
1 vote · 2 answers

Databricks - readstream from delta table writestream to orc file only with changes

A pipeline runs every 20 minutes, pushing data to ADLS Gen2 storage in ORC format. I have an Azure Databricks notebook job which runs every hour. This job reads the ORC file from ADLS as a structured stream (the ORC file created by the pipeline mentioned…
Tim • 1,321 • 1 • 22 • 47
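One way to emit only changed rows between runs is Delta's Change Data Feed, assuming the source is a Delta table with `delta.enableChangeDataFeed = true` set (an assumption here, not stated in the question). The core "only the deltas" idea can be shown with plain dicts:

```python
# The Spark side (not executed here) would read row-level changes via CDF:
#   (spark.readStream.format("delta")
#         .option("readChangeFeed", "true")
#         .option("startingVersion", 0)
#         .table("source_table"))          # table name hypothetical
# The idea of emitting only what changed, sketched with dicts keyed by id:
old = {"k1": "a", "k2": "b", "k3": "c"}   # previous snapshot
new = {"k1": "a", "k2": "B", "k4": "d"}   # current snapshot

changes = {k: v for k, v in new.items() if old.get(k) != v}
print(changes)  # {'k2': 'B', 'k4': 'd'}  -> only updated/new rows flow downstream
```

The filtered stream can then be written out with `writeStream.format("orc")` so the hourly job lands only changes rather than full re-reads.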
1 vote · 0 answers

Create a function in Databricks notebook for spark.sql

I am trying to create a function in Scala that writes a log message to a Delta table, for example: def logMessage(message: String): Unit = { spark.sql(s"INSERT INTO tablename VALUES ('${message}')") } The problem is that spark.sql will return…
Kylo • 109 • 8
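Two pitfalls lurk in a helper like this: in Scala, `${message}` is only interpolated when the string literal carries the `s` prefix (`s"INSERT ..."`), and splicing raw text into SQL breaks as soon as the message contains a quote. A hypothetical Python equivalent (table name is an assumption) that escapes quotes before building the statement:

```python
# Hypothetical Python stand-in for the Scala helper; the table name and the
# escaping rule (double the single quotes) follow standard SQL conventions.
def log_message_sql(message: str, table: str = "log_table") -> str:
    # Escape single quotes so the INSERT doesn't break mid-string
    safe = message.replace("'", "''")
    return f"INSERT INTO {table} VALUES ('{safe}')"

stmt = log_message_sql("job 'daily' finished")
print(stmt)  # INSERT INTO log_table VALUES ('job ''daily'' finished')
# In a notebook: spark.sql(stmt) -- it returns a DataFrame, which a
# Unit-style logging helper can simply ignore.
```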
1 vote · 3 answers

How to read .shp files in Databricks notebook

I am working on a problem where I need to plot output on a map. In the past I was able to do this using geopandas; however, it does not work in a Databricks notebook. I tried to look for an alternative but couldn't find any on the web. Pages I looked in…
Ajay Verma • 610 • 2 • 12
1 vote · 1 answer

Faster write to MySQL using databricks write

I am currently working on an Azure Databricks notebook that reads files from a storage container into a DataFrame and then writes all the records to a table in MySQL. The file can have anywhere from 2 million to 10 million rows. I have the…
JS noob • 429 • 5 • 14
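For multi-million-row JDBC writes, throughput usually comes from three knobs: the Spark JDBC writer options `batchsize` and `numPartitions`, and the MySQL driver flag `rewriteBatchedStatements=true` in the URL, which coalesces each batch into multi-row INSERTs. A sketch of the configuration (host, database, and credentials are placeholders):

```python
# Sketch: assemble the JDBC write configuration. The actual write would be:
#   (df.repartition(8).write.format("jdbc").options(**jdbc_options)
#      .mode("append").save())
jdbc_options = {
    # rewriteBatchedStatements is usually the single biggest win with MySQL
    "url": "jdbc:mysql://myhost:3306/mydb?rewriteBatchedStatements=true",
    "dbtable": "target_table",   # placeholder
    "user": "app_user",          # placeholder
    "password": "<secret>",      # placeholder; prefer a Databricks secret scope
    "batchsize": "10000",        # rows per JDBC batch
    "numPartitions": "8",        # parallel connections into MySQL
}
print("rewriteBatchedStatements=true" in jdbc_options["url"])  # True
```

Matching `numPartitions` to what the MySQL instance can absorb matters; too many parallel writers can make the database the bottleneck.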
1 vote · 1 answer

Pass Typesafe config file to the Spark Submit Job in Azure Databricks

I am trying to pass a Typesafe config file to the spark-submit task and print the details from the config file. import org.slf4j.{Logger, LoggerFactory} import com.typesafe.config.{Config, ConfigFactory} import org.apache.spark.sql.SparkSession …
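The common mechanism is to ship the file alongside the job with `--files` and point Typesafe's `ConfigFactory` at it via the JVM flag `-Dconfig.file=...` in the driver's extra Java options. A sketch of the spark-submit parameter list (the config path and jar path are hypothetical):

```python
# Sketch of the usual spark-submit wiring for a Typesafe config file.
# ConfigFactory.load() picks up -Dconfig.file automatically on the driver.
spark_submit_params = [
    "--files", "dbfs:/configs/application.conf",          # hypothetical path
    "--conf", "spark.driver.extraJavaOptions=-Dconfig.file=application.conf",
    "--class", "com.example.Main",                        # hypothetical class
    "dbfs:/jars/my-app.jar",                              # hypothetical jar
]
print(any(p.endswith("-Dconfig.file=application.conf") for p in spark_submit_params))
```

In an Azure Databricks spark-submit task, this same list goes into the task's parameters; if executors also read the config, the analogous `spark.executor.extraJavaOptions` entry is needed too.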
1 vote · 0 answers

How to insert data into delta table with changing schema

In Databricks Scala, I'm exploding a Map column and loading it into a Delta table. I have a predefined schema for the Delta table. Let's say the schema has 4 columns: A, B, C, D. So, on day 1 I'm loading my dataframe with 4 columns into the Delta table…
SanjanaSanju • 261 • 2 • 18
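Delta supports automatic schema evolution on append via `.option("mergeSchema", "true")`: the table's schema becomes the union of the old and new columns, with missing values filled as NULL. A sketch of that union rule with plain lists:

```python
# The Spark call (not executed here) would be:
#   (df.write.format("delta").mode("append")
#      .option("mergeSchema", "true").save(path))
# Schema evolution keeps the union of old and new columns:
day1_schema = ["A", "B", "C", "D"]
day2_schema = ["A", "B", "D", "E"]     # C missing today, E is new

merged = day1_schema + [c for c in day2_schema if c not in day1_schema]
print(merged)  # ['A', 'B', 'C', 'D', 'E']
# Day-2 rows get NULL for C; historical rows get NULL for E.
```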
1 vote · 1 answer

Spark Dataframe lambda on dataframe directly

I see so many examples which need to use a lambda over an rdd.map. Just wondering if we can do something like the following: df.withColumn('newcol', (lambda x: x['col1'] + x['col2'])).show()
mytabi • 639 • 2 • 12 • 28
1 vote · 0 answers

Read JSON file in pyspark to create schema struct type in python

This is in a Microsoft Azure data lake running on Azure Databricks. I'm trying to read a JSON file (that I did not create) which holds the schema, i.e. the name and type information, for CSVs that I can read but that have no header row. df1 =…
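The JSON layout below is hypothetical (the real file's shape is unknown), but the overall pattern is the same: parse the name/type entries with `json`, then map each one to a `StructField` inside a `StructType` and pass that as the `schema` to the CSV reader. A sketch of the parsing half:

```python
# Hedged sketch with an invented JSON layout; only the parse step is executed.
# With pyspark the pairs would become:
#   StructType([StructField(name, <mapped type>, True) for name, t in fields])
#   spark.read.csv(path, schema=struct, header=False)
import json

schema_json = """
{"columns": [{"name": "id",     "type": "integer"},
             {"name": "name",   "type": "string"},
             {"name": "amount", "type": "double"}]}
"""

fields = [(c["name"], c["type"]) for c in json.loads(schema_json)["columns"]]
print(fields)  # [('id', 'integer'), ('name', 'string'), ('amount', 'double')]
```

The remaining work is a small lookup table from the file's type strings to `pyspark.sql.types` classes (`IntegerType`, `StringType`, `DoubleType`, ...).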
1 vote · 1 answer

How to create a PySpark pandas-on-Spark DataFrame from Snowflake SQL query?

NOTE: I need to use distributed processing, which is why I am utilizing the Pandas API on Spark. To create the pandas-on-Spark DataFrame, I attempted 2 different methods (outlined below as "OPTION 1" and "OPTION 2"). Are either of these options feasible? If…
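One distributed-friendly route, sketched under the assumption that the Snowflake Spark connector is attached to the cluster: read the query into a regular Spark DataFrame, then convert it with `.pandas_api()` (Spark 3.2+), which yields a pandas-on-Spark frame rather than collecting to local pandas. Connection values are placeholders:

```python
# Sketch (all connection values are placeholders). The Spark side would be:
#   sdf = (spark.read.format("snowflake").options(**sf_options)
#                .option("query", "SELECT * FROM my_table").load())
#   psdf = sdf.pandas_api()   # pandas-on-Spark: stays distributed
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",  # placeholder
    "sfUser": "app_user",                         # placeholder
    "sfPassword": "<secret>",                     # placeholder; use a secret scope
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}
print(len(sf_options))  # 6 connector options
```

The key point for the distributed-processing requirement: avoid `to_pandas()`/`toPandas()`, which pulls everything to the driver; `pandas_api()` keeps the data partitioned across the cluster.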
1 vote · 1 answer

(Databricks SQL) Date stored as STRING (yyyyMMDD) cannot convert it into DATE (yyyy-MM-DD)

I've spent the last 3 hours googling the issue, but nothing seems to work in the Databricks SQL version specifically. I have to use a database where someone decided it's best to store dates as a STRING, and there's no way around it short-term. Current date…
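In Databricks SQL (which follows Spark's datetime patterns), `to_date(date_str, 'yyyyMMdd')` parses such strings into a proper DATE. The Spark pattern `yyyyMMdd` corresponds to Python's `%Y%m%d`, which lets the parse be sketched and checked locally (the sample value is invented):

```python
# Databricks SQL fix:   SELECT to_date(date_str, 'yyyyMMdd') AS real_date ...
# Equivalent format logic in Python, with a hypothetical sample value:
from datetime import date, datetime

raw = "20230415"  # example of the stored STRING form
parsed = datetime.strptime(raw, "%Y%m%d").date()
print(parsed)  # 2023-04-15
```

Note the lowercase `dd` in the Spark pattern: `DD` means day-of-year, a common source of "pattern not working" errors when converting `yyyyMMdd` strings.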