Questions tagged [azure-databricks]

For questions about using the Databricks Lakehouse Platform on Microsoft Azure

Overview

Azure Databricks is the Azure-based implementation of Databricks, which is a high-level platform for working with Apache Spark and includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and natively integrates with other Azure services such as Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, Synapse Analytics, etc.


4095 questions
1 vote, 1 answer

Overwrite/remove a Delta table in Azure Databricks after an error writing a null column without a type cast

I am using PySpark in Azure Databricks. I attempted to write a Delta table with a null column created as follows: df = df.withColumn('val2', funcs.lit(None)), using the following function: def write_to_delta_table(df, fnm, tnm, path): …
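A likely cause, sketched below rather than taken from the question's answers: funcs.lit(None) produces a NullType (void) column, which Delta cannot store. Casting the literal to a concrete type before overwriting the broken table avoids the error; the table path and data here are hypothetical.

    from pyspark.sql import SparkSession, functions as funcs
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,), (2,)], ["val1"])

    # lit(None) alone yields a void column; cast it to a concrete type.
    df = df.withColumn("val2", funcs.lit(None).cast(StringType()))

    (df.write
       .format("delta")
       .mode("overwrite")
       .option("overwriteSchema", "true")  # replace the schema left by the failed write
       .save("/mnt/delta/my_table"))       # hypothetical path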
1 vote, 4 answers

How to get all parameters related to a Databricks job run into Python?

I am trying to get all parameters related to a Databricks job and import them into Python. These parameters should include the date, start time, duration, status of the job (successful or failed), and all other parameters related to it. I want to use…
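A minimal sketch of pulling this run metadata with the Jobs REST API (2.1) via requests; the workspace URL, token, and job_id are hypothetical placeholders. Times come back as epoch milliseconds, and the success/failure outcome sits under state.result_state.

    import requests

    host = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical
    token = "dapiXXXXXXXX"                                        # hypothetical PAT
    job_id = 123                                                  # hypothetical

    resp = requests.get(
        f"{host}/api/2.1/jobs/runs/list",
        headers={"Authorization": f"Bearer {token}"},
        params={"job_id": job_id, "limit": 25},
    )
    resp.raise_for_status()

    for run in resp.json().get("runs", []):
        print(
            run["run_id"],
            run.get("start_time"),                     # epoch ms
            run.get("execution_duration"),             # ms
            run.get("state", {}).get("result_state"),  # e.g. SUCCESS, FAILED
        )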
1 vote, 1 answer

How do I add a NULL column to a new table based on an existing Delta table using Databricks SQL?

I tried to make a new table from a Delta table, adding a new NULL column, using Databricks SQL. Databricks is not able to make a NULL column; if I fill the newly made column it works fine. How do I add a NULL column to a new table based…
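This usually fails because a bare NULL in a CTAS gets the void type, which the new table cannot hold; casting it to a concrete type works. A minimal sketch with hypothetical table names, expressed through spark.sql:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # CAST(NULL AS STRING) gives the new column a real type instead of void.
    spark.sql("""
        CREATE OR REPLACE TABLE new_table AS
        SELECT *, CAST(NULL AS STRING) AS new_col
        FROM existing_delta_table
    """)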
1 vote, 1 answer

How do I efficiently migrate MongoDB to Azure Cosmos DB with the help of Azure Databricks?

While searching for a service to migrate our on-premises MongoDB to Azure Cosmos DB with the Mongo API, we came across a service called Azure Databricks. We have a total of 186 GB of data, which we need to migrate to Cosmos DB with as little downtime as…
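One common pattern, sketched under assumptions: read with the MongoDB Spark connector (10.x option names shown; older versions use format "mongo" and different option keys) and write to Cosmos DB through its Mongo API endpoint with the same connector. Both connection strings are hypothetical, and the connector JAR must be installed on the cluster.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    source_uri = "mongodb://onprem-host:27017"  # hypothetical
    target_uri = ("mongodb://account:key@account.mongo.cosmos.azure.com:10255/"
                  "?ssl=true")                  # hypothetical Cosmos Mongo API endpoint

    df = (spark.read
          .format("mongodb")
          .option("connection.uri", source_uri)
          .option("database", "mydb")            # hypothetical
          .option("collection", "mycollection")  # hypothetical
          .load())

    (df.write
       .format("mongodb")
       .option("connection.uri", target_uri)
       .option("database", "mydb")
       .option("collection", "mycollection")
       .mode("append")
       .save())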
1 vote, 1 answer

How to ingest data from Event Hubs to ADLS using a Databricks cluster (Scala)

I want to ingest streaming data from Event Hubs to ADLS Gen2 in a specified format. I have done batch data ingestion, from a DB to ADLS and from container to container, but now I want to try streaming data ingestion. Can you please guide me on where to…
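A minimal PySpark sketch of the shape of such a job (the question asks for Scala, but the structure is the same), assuming the azure-eventhubs-spark connector is installed on the cluster; the connection string and ABFSS paths are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    conn = "Endpoint=sb://mynamespace.servicebus.windows.net/;...;EntityPath=myhub"  # hypothetical
    eh_conf = {
        # The connector expects the connection string encrypted via its helper.
        "eventhubs.connectionString":
            spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(conn),
    }

    stream = (spark.readStream
              .format("eventhubs")
              .options(**eh_conf)
              .load())

    # The payload arrives as binary; cast it before persisting.
    out = stream.selectExpr("CAST(body AS STRING) AS body", "enqueuedTime")

    (out.writeStream
        .format("parquet")  # or "delta"/"json", depending on the target format
        .option("checkpointLocation",
                "abfss://container@account.dfs.core.windows.net/checkpoints/eh")  # hypothetical
        .option("path", "abfss://container@account.dfs.core.windows.net/raw/eh")  # hypothetical
        .start())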
1 vote, 1 answer

Event Hubs stream not catching schema mismatch

We are trying to implement badRecordsPath when reading in events from an Event Hub. As an example, to try to get it working, I have put in a schema that should fail the event: eventStreamDF = (spark.readStream .format("eventhubs") …
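For context: badRecordsPath applies to file-based sources, and the eventhubs source always returns its own fixed schema (binary body plus metadata), so a payload schema mismatch never reaches it. A common alternative, sketched here with a hypothetical payload schema against the question's eventStreamDF, is parsing the body with from_json, which yields NULL for malformed rows that can then be routed separately:

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, IntegerType, StringType

    payload_schema = StructType([          # hypothetical payload schema
        StructField("id", IntegerType()),
        StructField("name", StringType()),
    ])

    parsed = eventStreamDF.withColumn(
        "json", F.from_json(F.col("body").cast("string"), payload_schema))

    good = parsed.filter(F.col("json").isNotNull()).select("json.*")
    bad = parsed.filter(F.col("json").isNull())  # send these to your own bad-records sink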
1 vote, 2 answers

Call the Databricks API from a DevOps pipeline using a service principal

I want to be able to call the Databricks API from a DevOps pipeline. I can do this using a personal access token for my account; however, I want to make the API calls user-independent, so I wanted to use a service principal (app registration). I followed this…
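A minimal sketch of the client-credentials flow: request an AAD token for the AzureDatabricks first-party application (well-known resource ID 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d), then call the workspace API with it. Tenant, client, and workspace values are hypothetical, and the service principal must already have been added to the workspace.

    import requests

    tenant_id = "<tenant-id>"            # hypothetical
    client_id = "<sp-client-id>"         # hypothetical
    client_secret = "<sp-client-secret>" # hypothetical

    token_resp = requests.post(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
        },
    )
    token_resp.raise_for_status()
    aad_token = token_resp.json()["access_token"]

    resp = requests.get(
        "https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/clusters/list",  # hypothetical
        headers={"Authorization": f"Bearer {aad_token}"},
    )
    print(resp.status_code, resp.json())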
1 vote, 0 answers

Linked service from Azure Data Factory to Databricks: how to parametrize?

I am using the new job cluster option while creating a linked service from ADF (Data Factory) to Databricks with Spark configs. I want to parametrize the Spark config values as well as the keys. I know it's quite easy to parametrize values by referring to this…
1 vote, 1 answer

Ingest CSV data with Auto Loader with a specific delimiter/separator

I'm trying to load several CSV files with a complex separator ("~|~"). The current code loads the CSV files but does not identify the correct columns, because it is using the separator (","). I'm reading the documentation here…
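Auto Loader passes format-specific reader options straight through, so the CSV separator can be set with "sep"; multi-character separators like "~|~" need Spark 3.x. A minimal sketch with hypothetical paths:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    stream = (spark.readStream
              .format("cloudFiles")
              .option("cloudFiles.format", "csv")
              .option("cloudFiles.schemaLocation", "/mnt/schemas/mycsv")  # hypothetical
              .option("sep", "~|~")        # the custom separator
              .option("header", "true")
              .load("/mnt/landing/csv/"))  # hypothetical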
1 vote, 0 answers

Access a function from another script in the Shared folder in Azure Databricks

I am new to Azure Databricks and have run into a situation. I have a dev_tools Python script at the workspace/Shared/dev_tools location. The dev_tools script contains the following code (this is an example and not the actual code): def add (first_num,…
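On recent runtimes, workspace files are visible under /Workspace, so the Shared folder can be put on sys.path and imported directly; a minimal sketch, reusing the add function named in the question:

    import sys

    sys.path.append("/Workspace/Shared")  # make the Shared folder importable

    from dev_tools import add
    print(add(1, 2))

    # On older runtimes without workspace-file support, the usual fallback is
    # the notebook magic: %run /Shared/dev_tools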
1 vote, 1 answer

Snowflake table from Databricks using Python/Scala

I want to create a table in Snowflake and load data into it from Databricks using Python/Scala. Below is my code snippet, and I'm getting the below error. How can I first create the table, if it does not exist, from a Databricks notebook using Python or Scala, and…
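For reference, the Snowflake Spark connector creates the target table automatically when writing with mode("overwrite") or mode("append"), so no explicit CREATE TABLE is needed; a minimal sketch with hypothetical connection values:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

    sf_options = {  # all values hypothetical
        "sfUrl": "myaccount.snowflakecomputing.com",
        "sfUser": "user",
        "sfPassword": "****",
        "sfDatabase": "MY_DB",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "MY_WH",
    }

    (df.write
       .format("snowflake")   # "net.snowflake.spark.snowflake" also works
       .options(**sf_options)
       .option("dbtable", "MY_TABLE")
       .mode("overwrite")     # creates the table if it does not exist
       .save())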
1 vote, 1 answer

Using the Confluent kafka-schema-registry-client with basic auth against a managed Confluent Schema Registry in Databricks

In my Spark application I have the following Scala code: val restService = new RestService(schemaRegistryUrl) val props = Map( "basic.auth.credentials.source" -> "USER_INFO", "basic.auth.user.info" -> "%s:%s".format(key, secret) ).asJava …
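For comparison, the equivalent basic-auth setup in the Python confluent-kafka client (URL, key, and subject are hypothetical); unlike the Java/Scala client, the Python client takes only url and basic.auth.user.info, with no separate credentials-source setting:

    from confluent_kafka.schema_registry import SchemaRegistryClient

    sr = SchemaRegistryClient({
        "url": "https://psrc-xxxxx.westeurope.azure.confluent.cloud",  # hypothetical
        "basic.auth.user.info": "KEY:SECRET",                          # api-key:api-secret
    })

    registered = sr.get_latest_version("my-topic-value")  # hypothetical subject
    print(registered.schema.schema_str)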
1 vote, 0 answers

Databricks fails to create a StructType/schema from a case class installed as a JAR file

I am using ScalaReflection to create a schema from case classes. I have installed the JAR containing the case classes on the Databricks cluster, and when I invoke the following…
1 vote, 1 answer

Databricks: Azure Queue Storage structured streaming key not found error

I am trying to write an ETL pipeline for AQS (Azure Queue Storage) streaming data. Here is my code: CONN_STR = dbutils.secrets.get(scope="kvscope", key = "AZURE-STORAGE-CONN-STR") schema = StructType([ StructField("id", IntegerType()), StructField("parkingId",…
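For reference, a minimal well-formed ABS-AQS read in a Databricks notebook (where spark is predefined), with option names as given in the Databricks connector documentation (verify against your runtime's docs) and a hypothetical queue name; CONN_STR and schema are the values from the question:

    stream = (spark.readStream
              .format("abs-aqs")
              .option("fileFormat", "json")
              .option("queueName", "my-queue")       # hypothetical
              .option("connectionString", CONN_STR)  # from the secret above
              .schema(schema)                        # the StructType from the question
              .load())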
1 vote, 3 answers

Installing the janitor library in Azure Databricks

I have Python 3.7 installed and am trying to install the janitor library in Azure Databricks. It works properly on my local machine, but it is difficult to install in Azure Databricks. I ran dbutils.library.installPyPI('janitor') but got the below…
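A likely fix, sketched under the assumption that the failure is the package name: the pandas-cleaning library is published on PyPI as pyjanitor, while the import name is janitor, so installPyPI should target pyjanitor.

    # The PyPI name is "pyjanitor"; the import name is "janitor".
    dbutils.library.installPyPI("pyjanitor")
    dbutils.library.restartPython()

    import janitor  # noqa: F401 -- registers the DataFrame cleaning methods

    # On newer runtimes the notebook magic works too: %pip install pyjanitor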