Questions tagged [azure-databricks]

For questions about using the Databricks Lakehouse Platform on Microsoft Azure

Overview

Azure Databricks is the Azure-based implementation of Databricks, a high-level platform for working with Apache Spark that includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and natively integrates with other Azure services such as Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, Synapse Analytics, etc.

4095 questions
1 vote · 1 answer

Read files from multiple folders from ADLS gen2 storage via databricks and create single target file

I'm using the Databricks service for analysis. I have built a connection to ADLS Gen2 storage and created a mount point; the container holds folders for years and months, with Parquet files for each month inside the month folders. I…
Arun • 57 • 7
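Spark's Parquet reader accepts glob patterns over a mount point, so a year/month layout like the one described can usually be read in a single call (the mount name and folder names below are assumptions, not from the question). The wildcard semantics can be sketched with plain Python:

```python
# Sketch only: with Spark the whole tree would be read in one call, e.g.
#   df = spark.read.parquet("/mnt/mydata/*/*/*.parquet")   # mount name hypothetical
# and df.write.parquet(target) would produce the single target dataset.
# The glob matching itself, illustrated with Python's stdlib:
import glob
import os
import tempfile

root = tempfile.mkdtemp()
# Fake the layout: <root>/<year>/<month>/part-0.parquet
for year in ("2021", "2022"):
    for month in ("01", "02"):
        d = os.path.join(root, year, month)
        os.makedirs(d)
        open(os.path.join(d, "part-0.parquet"), "w").close()

# One pattern picks up every month folder across every year
matches = sorted(glob.glob(os.path.join(root, "*", "*", "*.parquet")))
print(len(matches))  # 4 files: 2 years x 2 months
```

If the folders follow the `year=2022/month=01` naming convention, pointing the reader at the root directory also works, and Spark exposes `year`/`month` as partition columns.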
1 vote · 1 answer

Azure Databricks API

Trying to use the Databricks API to work with resources programmatically. I am using this Microsoft document to authenticate with a service…
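With a service principal, the usual flow is to obtain an Azure AD access token (client-credentials flow, as in the linked document) and send it as a Bearer header to the workspace's REST API. A minimal sketch, in which the workspace URL and token value are placeholders:

```python
# Sketch, not a definitive client: build a Databricks REST call with an
# AAD bearer token obtained for a service principal.
import urllib.request

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace URL
token = "<aad-access-token>"  # placeholder from the client-credentials flow

req = urllib.request.Request(
    url=f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
print(req.get_header("Authorization"))
# urllib.request.urlopen(req) would perform the actual call; skipped here.
```

The same header works for any endpoint under `/api/2.0/` or `/api/2.1/`; only the path and payload change per resource.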
1 vote · 1 answer

Databricks Notebook as Substitute for livy sessions endpoint

I want to execute a Databricks notebook's code via the Databricks API and get the output of the notebook's code as the response. Is this possible, or is there any workaround for it? Is the same possible with the Databricks SQL API?
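The Jobs API covers this pattern: submit a one-off notebook run, then fetch whatever the notebook returned via `dbutils.notebook.exit(...)`. The endpoints named in the comments are the documented Jobs API 2.1 ones; the notebook path and cluster id below are hypothetical:

```python
# Hedged sketch of the request payload (values are placeholders). Inside the
# notebook, return a value with: dbutils.notebook.exit(json.dumps(result))
import json

submit_payload = {
    "run_name": "api-triggered-run",
    "tasks": [
        {
            "task_key": "main",
            "notebook_task": {"notebook_path": "/Users/me@example.com/my_notebook"},
            "existing_cluster_id": "1234-567890-abcde123",
        }
    ],
}
# POST this to  /api/2.1/jobs/runs/submit          -> returns {"run_id": ...}
# then poll     /api/2.1/jobs/runs/get-output?run_id=<task run id>
# the value passed to dbutils.notebook.exit shows up as notebook_output.result.
body = json.dumps(submit_payload)
print(json.loads(body)["tasks"][0]["task_key"])  # "main"
```

This is closer to the Livy batch/session pattern than the Databricks SQL API, which executes SQL statements rather than notebook code.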
1 vote · 0 answers

Databricks is "Updating the Delta table's state"

I'm reading and joining multiple Delta tables from a data lake and storing the result back to another Delta Lake location. While doing so, Databricks shows me: Depending on how many Delta tables I join with each other, this can take up to very…
elyptikus • 936 • 8 • 24
1 vote · 1 answer

DataType issue from Synapse to Delta table in Databricks?

Copying data from Synapse to a managed Delta table. We enabled staging and copied the data from Synapse to the managed Delta table. Some of the date columns exist in Synapse, and the same schema is defined in the Delta table. We have designed the…
1 vote · 2 answers

Databricks - readstream from delta table writestream to orc file only with changes

A pipeline runs every 20 minutes, pushing data to ADLS Gen2 storage in ORC format. I have an Azure Databricks notebook job which runs every hour. This job reads the ORC file from ADLS as a structured stream (the ORC file created by the pipeline mentioned…
Tim • 1,321 • 1 • 22 • 47
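One way to emit only changed rows between runs is Delta's Change Data Feed, assuming the source is a Delta table with `delta.enableChangeDataFeed = true` set (an assumption here, not stated in the question). The core "only the deltas" idea can be shown with plain dicts:

```python
# The Spark side (not executed here) would read row-level changes via CDF:
#   (spark.readStream.format("delta")
#         .option("readChangeFeed", "true")
#         .option("startingVersion", 0)
#         .table("source_table"))          # table name hypothetical
# The idea of emitting only what changed, sketched with dicts keyed by id:
old = {"k1": "a", "k2": "b", "k3": "c"}   # previous snapshot
new = {"k1": "a", "k2": "B", "k4": "d"}   # current snapshot

changes = {k: v for k, v in new.items() if old.get(k) != v}
print(changes)  # {'k2': 'B', 'k4': 'd'}  -> only updated/new rows flow downstream
```

The filtered stream can then be written out with `writeStream.format("orc")` so the hourly job lands only changes rather than full re-reads.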
1 vote · 0 answers

Create a function in Databricks notebook for spark.sql

I am trying to create a function in Scala that writes a log message to a Delta table, for example: def logMessage(message: String): Unit = { spark.sql(s"INSERT INTO tablename VALUES ('${message}')") } The problem is that spark.sql will return…
Kylo • 109 • 8
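Two pitfalls lurk in a helper like this: in Scala, `${message}` is only interpolated when the string literal carries the `s` prefix (`s"INSERT ..."`), and splicing raw text into SQL breaks as soon as the message contains a quote. A hypothetical Python equivalent (table name is an assumption) that escapes quotes before building the statement:

```python
# Hypothetical Python stand-in for the Scala helper; the table name and the
# escaping rule (double the single quotes) follow standard SQL conventions.
def log_message_sql(message: str, table: str = "log_table") -> str:
    # Escape single quotes so the INSERT doesn't break mid-string
    safe = message.replace("'", "''")
    return f"INSERT INTO {table} VALUES ('{safe}')"

stmt = log_message_sql("job 'daily' finished")
print(stmt)  # INSERT INTO log_table VALUES ('job ''daily'' finished')
# In a notebook: spark.sql(stmt) -- it returns a DataFrame, which a
# Unit-style logging helper can simply ignore.
```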
1 vote · 3 answers

How to read .shp files in Databricks notebook

I am working on a problem where I need to plot output on a map. In the past I was able to do this using geopandas; however, it does not work in a Databricks notebook. I tried to look for an alternative but couldn't find any on the web. Pages I looked in…
Ajay Verma • 610 • 2 • 12
1 vote · 1 answer

Faster write to MySQL using databricks write

I am currently working on an Azure Databricks notebook that reads files from a storage container into a DataFrame and then writes all the records to a table in MySQL. The file can have anywhere from 2 million to 10 million rows. I have the…
JS noob • 429 • 5 • 14
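For multi-million-row JDBC writes, throughput usually comes from three knobs: the Spark JDBC writer options `batchsize` and `numPartitions`, and the MySQL driver flag `rewriteBatchedStatements=true` in the URL, which coalesces each batch into multi-row INSERTs. A sketch of the configuration (host, database, and credentials are placeholders):

```python
# Sketch: assemble the JDBC write configuration. The actual write would be:
#   (df.repartition(8).write.format("jdbc").options(**jdbc_options)
#      .mode("append").save())
jdbc_options = {
    # rewriteBatchedStatements is usually the single biggest win with MySQL
    "url": "jdbc:mysql://myhost:3306/mydb?rewriteBatchedStatements=true",
    "dbtable": "target_table",   # placeholder
    "user": "app_user",          # placeholder
    "password": "<secret>",      # placeholder; prefer a Databricks secret scope
    "batchsize": "10000",        # rows per JDBC batch
    "numPartitions": "8",        # parallel connections into MySQL
}
print("rewriteBatchedStatements=true" in jdbc_options["url"])  # True
```

Matching `numPartitions` to what the MySQL instance can absorb matters; too many parallel writers can make the database the bottleneck.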
1 vote · 1 answer

Pass Typesafe config file to the Spark Submit Job in Azure Databricks

I am trying to pass a Typesafe config file to the spark-submit task and print the details from the config file. import org.slf4j.{Logger, LoggerFactory} import com.typesafe.config.{Config, ConfigFactory} import org.apache.spark.sql.SparkSession …
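The common mechanism is to ship the file alongside the job with `--files` and point Typesafe's `ConfigFactory` at it via the JVM flag `-Dconfig.file=...` in the driver's extra Java options. A sketch of the spark-submit parameter list (the config path and jar path are hypothetical):

```python
# Sketch of the usual spark-submit wiring for a Typesafe config file.
# ConfigFactory.load() picks up -Dconfig.file automatically on the driver.
spark_submit_params = [
    "--files", "dbfs:/configs/application.conf",          # hypothetical path
    "--conf", "spark.driver.extraJavaOptions=-Dconfig.file=application.conf",
    "--class", "com.example.Main",                        # hypothetical class
    "dbfs:/jars/my-app.jar",                              # hypothetical jar
]
print(any(p.endswith("-Dconfig.file=application.conf") for p in spark_submit_params))
```

In an Azure Databricks spark-submit task, this same list goes into the task's parameters; if executors also read the config, the analogous `spark.executor.extraJavaOptions` entry is needed too.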
1 vote · 0 answers

How to insert data into delta table with changing schema

In Databricks Scala, I'm exploding a Map column and loading it into a Delta table. I have a predefined schema for the Delta table. Let's say the schema has 4 columns: A, B, C, D. So, on day 1 I'm loading my dataframe with 4 columns into the Delta table…
SanjanaSanju • 261 • 2 • 18
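Delta supports automatic schema evolution on append via `.option("mergeSchema", "true")`: the table's schema becomes the union of the old and new columns, with missing values filled as NULL. A sketch of that union rule with plain lists:

```python
# The Spark call (not executed here) would be:
#   (df.write.format("delta").mode("append")
#      .option("mergeSchema", "true").save(path))
# Schema evolution keeps the union of old and new columns:
day1_schema = ["A", "B", "C", "D"]
day2_schema = ["A", "B", "D", "E"]     # C missing today, E is new

merged = day1_schema + [c for c in day2_schema if c not in day1_schema]
print(merged)  # ['A', 'B', 'C', 'D', 'E']
# Day-2 rows get NULL for C; historical rows get NULL for E.
```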
1 vote · 1 answer

Spark Dataframe lambda on dataframe directly

I see so many examples which need to use a lambda over an rdd.map. Just wondering if we can do something like the following: df.withColumn('newcol', (lambda x: x['col1'] + x['col2'])).show()
mytabi • 639 • 2 • 12 • 28
1 vote · 0 answers

Read JSON file in pyspark to create schema struct type in python

This is in a Microsoft Azure data lake running on Azure Databricks. I'm trying to read a JSON file (that I did not create) which holds the schema, i.e. the name and type information, for CSVs that I can read but that have no header row. df1 =…
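The JSON layout below is hypothetical (the real file's shape is unknown), but the overall pattern is the same: parse the name/type entries with `json`, then map each one to a `StructField` inside a `StructType` and pass that as the `schema` to the CSV reader. A sketch of the parsing half:

```python
# Hedged sketch with an invented JSON layout; only the parse step is executed.
# With pyspark the pairs would become:
#   StructType([StructField(name, <mapped type>, True) for name, t in fields])
#   spark.read.csv(path, schema=struct, header=False)
import json

schema_json = """
{"columns": [{"name": "id",     "type": "integer"},
             {"name": "name",   "type": "string"},
             {"name": "amount", "type": "double"}]}
"""

fields = [(c["name"], c["type"]) for c in json.loads(schema_json)["columns"]]
print(fields)  # [('id', 'integer'), ('name', 'string'), ('amount', 'double')]
```

The remaining work is a small lookup table from the file's type strings to `pyspark.sql.types` classes (`IntegerType`, `StringType`, `DoubleType`, ...).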
1 vote · 1 answer

How to create a PySpark pandas-on-Spark DataFrame from Snowflake SQL query?

NOTE: I need to use distributed processing, which is why I am utilizing the Pandas API on Spark. To create the pandas-on-Spark DataFrame, I attempted 2 different methods (outlined below as "OPTION 1" and "OPTION 2"). Are either of these options feasible? If…
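One distributed-friendly route, sketched under the assumption that the Snowflake Spark connector is attached to the cluster: read the query into a regular Spark DataFrame, then convert it with `.pandas_api()` (Spark 3.2+), which yields a pandas-on-Spark frame rather than collecting to local pandas. Connection values are placeholders:

```python
# Sketch (all connection values are placeholders). The Spark side would be:
#   sdf = (spark.read.format("snowflake").options(**sf_options)
#                .option("query", "SELECT * FROM my_table").load())
#   psdf = sdf.pandas_api()   # pandas-on-Spark: stays distributed
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",  # placeholder
    "sfUser": "app_user",                         # placeholder
    "sfPassword": "<secret>",                     # placeholder; use a secret scope
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}
print(len(sf_options))  # 6 connector options
```

The key point for the distributed-processing requirement: avoid `to_pandas()`/`toPandas()`, which pulls everything to the driver; `pandas_api()` keeps the data partitioned across the cluster.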
1 vote · 1 answer

(Databricks SQL) Date stored as STRING (yyyyMMDD) cannot convert it into DATE (yyyy-MM-DD)

I've spent the last 3 hours googling the issue, but nothing seems to work in the Databricks SQL version specifically. I have to use a database where someone decided it's best to store dates as a STRING, and there's no way around it short-term. Current date…
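In Databricks SQL (which follows Spark's datetime patterns), `to_date(date_str, 'yyyyMMdd')` parses such strings into a proper DATE. The Spark pattern `yyyyMMdd` corresponds to Python's `%Y%m%d`, which lets the parse be sketched and checked locally (the sample value is invented):

```python
# Databricks SQL fix:   SELECT to_date(date_str, 'yyyyMMdd') AS real_date ...
# Equivalent format logic in Python, with a hypothetical sample value:
from datetime import date, datetime

raw = "20230415"  # example of the stored STRING form
parsed = datetime.strptime(raw, "%Y%m%d").date()
print(parsed)  # 2023-04-15
```

Note the lowercase `dd` in the Spark pattern: `DD` means day-of-year, a common source of "pattern not working" errors when converting `yyyyMMdd` strings.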