Questions tagged [azure-databricks]

For questions about using the Databricks Lakehouse Platform on Microsoft Azure.

Overview

Azure Databricks is the Azure-based implementation of Databricks, which is a high-level platform for working with Apache Spark and includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and natively integrates with other Azure services such as Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, and Synapse Analytics.

4095 questions
1 vote, 1 answer

How to check if a query is pushed down from Databricks to Snowflake?

I'm trying to use query pushdown from Databricks to Snowflake. I'm reading data from Snowflake (the data source) into Databricks, creating DataFrames, and applying joins, filters, and aggregate functions. The code runs fine, but I am not able to find out whether the query…
1 vote, 1 answer

Query on SAP table from Azure Databricks

I want to query the SAP table from Databricks. I have installed the JDBC library for connecting to the SAP server. I am able to connect and fetch records using spark.read.jdbc(url = jdbcUrl, table = query, properties = connectionProperties). In the…
Aswad
1 vote, 1 answer

PEM Certificate in Data Factory

Can't find anything on this. I am hitting an API for On the market, which is a UK real-estate website. As part of the auth, it requires us to submit a certificate & key. Docs…
TJB
1 vote, 2 answers

Terraform Azure Databricks Provider Error

I need some assistance to understand the various forms of logging in to Databricks. I am using Terraform to provision Azure Databricks, and I would like to know the difference between the two code snippets below. When I use option 1, I get the error shown. Option 1: …
1 vote, 3 answers

Scheduling job every other day in Azure Databricks

I need to schedule a job that runs every other day (if the start is Monday, then Wednesday, Friday, Sunday, ...). But the Databricks job scheduler only offers options on a daily, weekly, monthly, or yearly basis.
Saswat Ray
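
One possible workaround for the question above, offered as a sketch rather than a built-in scheduler feature: schedule the job daily and let its first step decide whether today is a run day by counting days from a fixed start date. The names `ANCHOR_DATE` and `is_run_day` are hypothetical, and the start date is an arbitrary Monday chosen for illustration.

```python
from datetime import date

# Hypothetical anchor: the first day the job should run (a Monday).
ANCHOR_DATE = date(2024, 1, 1)


def is_run_day(today: date, anchor: date = ANCHOR_DATE) -> bool:
    """Return True when an even number of days separates today from the anchor."""
    return (today - anchor).days % 2 == 0
```

With a Monday anchor this yields Monday, Wednesday, Friday, Sunday, and then Tuesday of the following week, so the cadence does not reset at week or month boundaries. A daily job could call `is_run_day(date.today())` first and stop early when it returns False.
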
1 vote, 0 answers

Terraform throwing error while trying to create Azure Databricks Cluster in a Landing Zone setup

I am trying to use Terraform to deploy an Azure Databricks workspace and cluster. The workspace was created successfully along with the user group, and I am able to log in to Databricks. The problem I am having is with creating the cluster.…
1 vote, 1 answer

Import library not found in databricks notebook

Using an Azure DevOps pipeline task, I'm importing the azure.databricks.cicd.tools library and installing azure-identity and azure-keyvault-secrets. These libraries install fine onto a cluster when I add them using a bearer token and…
1 vote, 0 answers

How do I best manage SQL Server insertions from Databricks spark session and identity columns?

I have a batch-processing transaction data transformation/validation pipeline written in a Scala Databricks notebook. When the pipeline finishes, it dumps my validated data into SQL Server for later use. Ongoing requirements are beginning…
Blue
1 vote, 1 answer

Databricks + ADF + ADLS2 + Hive = Azure Synapse

I have no experience with Azure Synapse, but my understanding is that it is the same as Databricks, ADF, ADLS2, and Hive in SQL DWH, all together in one workspace with a different name. Am I wrong?
1 vote, 1 answer

Error reading Cassandra TTL and WRITETIME with Spark 3.0

Although the latest spark-cassandra-connector from DataStax states that it supports reading/writing TTL and WRITETIME, I am still receiving a SQL undefined function error. Using Databricks with library…
1 vote, 1 answer

Spark Job stuck writing dataframe to partitioned Delta table

I am running Databricks to read CSV files and then save them as a partitioned Delta table. The file contains 179619219 records in total. It is being split on COL A (8419 unique values) and Year (10 years) and…
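
The figures quoted in the excerpt allow a rough estimate of how finely the data would be split. The arithmetic below is only a sketch: it assumes every combination of COL A value and year actually occurs, so it gives an upper bound on the partition count, and the variable names are illustrative.

```python
total_records = 179_619_219   # records in the file, from the question
col_a_values = 8_419          # unique values of COL A, from the question
years = 10                    # distinct years, from the question

# Upper bound on partition directories if every (COL A, Year) pair exists.
max_partitions = col_a_values * years

# Average records per partition under that assumption.
avg_records = total_records / max_partitions

print(max_partitions)
print(round(avg_records))
```

Under that assumption there could be up to 84,190 partitions averaging roughly 2,100 records each, which may be worth checking as a possible source of many small files and a slow write.
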
1 vote, 1 answer

How to alter the column datatype of the managed delta table using pyspark?

How to alter the column datatype based on the input parameter using pyspark:
from pyspark.sql.types import IntegerType, BooleanType, DateType
from pyspark.sql.functions import col
Column_Name = "EFFECTIVE_DATE"
df=spark.sql(f"select * from…
1 vote, 1 answer

Read Gen2 secondary account with Databricks

I'm trying to read the secondary Gen2 account in Databricks, but I get the following error: java.io.FileNotFoundException: Operation failed: "The specified resource does not exist.", 404 Here is how I'm making the…
1 vote, 2 answers

List all notebooks and jobs in Databricks and load the result set into a DataFrame and a managed table

Is there a method to list all notebooks and jobs in one Databricks workspace and load them into a managed table within DBFS? I found a function in the Databricks knowledge base (https://kb.databricks.com/python/list-all-workspace-objects.html). However, this does…
Shruti
1 vote, 1 answer

Using Azure Data Factory transform multiple Excel data to a main file

I have two Excel files in my Azure Database Container, and I would like to transform that data and populate a single database or file in Azure Data Factory. For example, I would like to copy the data from the Excel file below: Product Units…
betbroke