Questions tagged [azure-databricks]

For questions about the usage of the Databricks Lakehouse Platform on Microsoft Azure

Overview

Azure Databricks is the Azure-based implementation of Databricks, a managed platform for working with Apache Spark that includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and integrates natively with other Azure services such as Azure Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, and Synapse Analytics.

Related Tags

4095 questions
1 vote · 0 answers

Is there a SparkR function equivalent to unique and orderBy from R, that brings all the columns?

I'm one month into the data world, and my goal is to refactor existing local R scripts to work with SparkR on Databricks. This is the R code: minmaxAcctDates <- intLoadFiles("Accounts_BalanceEOD", monthID) minmaxAcctDates$CUSTOMER_NUMBER <-…
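In SparkR, `distinct()` and `arrange()` are the usual equivalents of base R's `unique()` and ordering, and both return the DataFrame with all of its columns intact. Since a Spark session isn't available here, the same pattern is sketched below with pandas on toy data (the column names merely echo the question; they are not the asker's files):

```python
import pandas as pd

# Toy stand-in data: duplicate rows, unsorted.
df = pd.DataFrame({
    "CUSTOMER_NUMBER": [2, 1, 2, 1],
    "BALANCE_DATE": ["2021-02-01", "2021-01-01", "2021-02-01", "2021-01-01"],
})

# drop_duplicates() + sort_values() keep every column, just as
# SparkR's distinct() + arrange() (or PySpark's dropDuplicates() + orderBy()) do.
result = (df.drop_duplicates()
            .sort_values("CUSTOMER_NUMBER")
            .reset_index(drop=True))
```

The key point for the question: none of these calls project columns away, so no explicit re-selection is needed after deduplicating and sorting.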
1 vote · 1 answer

Azure Databricks notebook not deployed via YAML deploynotebooks@0 task

This is my YAML. The pipeline runs successfully, but the notebooks are not deployed. What am I doing wrong?
Blue Clouds
1 vote · 3 answers

Update existing records of parquet file in Azure

I am converting my table into Parquet file format using Azure Data Factory and querying the Parquet files with Databricks for reporting. I want to update only the existing records that were updated in the original SQL Server table. Since I am performing…
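Parquet files are immutable, so individual rows cannot be updated in place; the usual Databricks answer is to store the converted table in Delta Lake format, which supports MERGE for upserts. A sketch of the statement builder follows — every table and key name here is illustrative, not taken from the question:

```python
def merge_stmt(target: str, source: str, key: str) -> str:
    """Build a Delta Lake MERGE (upsert) statement for the given names."""
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS u "
        f"ON t.{key} = u.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )

# On Databricks the statement would run against Delta tables, e.g.:
#   spark.sql(merge_stmt("reporting.accounts", "staging.accounts_delta", "account_id"))
```

`UPDATE SET *` / `INSERT *` copy all columns by name, which matches the "update only changed records" requirement without listing sixty column assignments by hand.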
1 vote · 1 answer

Importing ipynb file from another ipynb notebook in azure databricks

I am trying to import an ipynb notebook from another notebook in Azure Databricks using from ipynb.fs.full.test_1 import *. While importing I get the following error: KeyError: 'package'. Here is my test code: class Test1: def t1(): …
1 vote · 2 answers

Records not showing until Azure Databricks cluster restarted

We have been using Azure Databricks / Delta lake for the last couple of months and recently have started to spot some strange behaviours with loaded records, in particular latest records not being returned unless the cluster is restarted or a…
1 vote · 1 answer

Databricks notebook %run relative path, not working for 3 level deep

I need to run a Databricks notebook three folder levels up using a relative path, but it is not working. Is it a limitation? It works if I specify the full path. This is what I have tested: %run ./folder/notebook - WORKS %run ../folder/notebook - WORKS %run…
baatchen
1 vote · 1 answer

ARM template for Azure Data Bricks Diagnostic settings

I am able to configure diagnostic settings for Azure Databricks in the portal. I need an ARM template to automate the creation of diagnostic settings for Azure Databricks. Let me know if any additional information is required from my side. Thanks in…
1 vote · 1 answer

How to increase Databricks performance?

I have a problem: writing to Synapse from Databricks takes a very long time (> 20 hours). What can I do to improve the Databricks job that writes to Synapse? My source table is a fact table (151 million rows) on Azure Synapse. I…
1 vote · 1 answer

Setting up PostgreSQL driver on Azure Databricks

How can I modify the code below to install a PostgreSQL JDBC driver instead of MS SQL? My goal is to use pyodbc to connect to a Redshift database from Azure Databricks. I thought that the PostgreSQL JDBC driver was already installed in my Databricks…
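The PostgreSQL JDBC driver ships as the Maven artifact `org.postgresql:postgresql` and can be attached to the cluster as a Maven library; the driver class is `org.postgresql.Driver`. The sketch below builds the JDBC URL (the testable part), with the Databricks-side read kept in comments since it needs a running cluster — host, table, and credential names are placeholders:

```python
def postgres_jdbc_url(host: str, database: str, port: int = 5432) -> str:
    """JDBC URL in the format understood by org.postgresql.Driver."""
    return f"jdbc:postgresql://{host}:{port}/{database}"

# With the driver attached as a Maven library (org.postgresql:postgresql),
# a Spark JDBC read would look like (placeholders throughout):
#
#   df = (spark.read.format("jdbc")
#         .option("url", postgres_jdbc_url("my-host.example.com", "mydb"))
#         .option("dbtable", "public.my_table")
#         .option("user", user)
#         .option("password", password)
#         .option("driver", "org.postgresql.Driver")
#         .load())
```

Note that Redshift listens on port 5439 by default, so the port argument would need to change for that target.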
1 vote · 1 answer

How to move an Excel file with a dynamic name (containing the current date)?

I tried to save an Excel file in Azure Databricks with a dynamic name: import pandas as pd #initialize the excel writer writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter') #store your dataframes in a dict, where the…
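A date-stamped file name can be built with `datetime` and an f-string before it is handed to `pd.ExcelWriter`. A minimal sketch — the DBFS move at the end is commented out and its paths are illustrative, assuming the default driver working directory:

```python
from datetime import date
from typing import Optional

def dated_filename(prefix: str, day: Optional[date] = None) -> str:
    """Build an Excel file name carrying the given (default: current) date."""
    day = day or date.today()
    return f"{prefix}_{day:%Y%m%d}.xlsx"

# The dynamic name can then be fed to the question's writer:
#   writer = pd.ExcelWriter(dated_filename("test"), engine="xlsxwriter")
# and, once saved locally on the driver, moved into DBFS (paths illustrative):
#   dbutils.fs.mv("file:/databricks/driver/" + dated_filename("test"),
#                 "dbfs:/mnt/output/" + dated_filename("test"))
```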
jos97
1 vote · 1 answer

How to get geospatial POINT using SparkSQL

I'm converting a process from PostgreSQL over to Databricks Apache Spark. The PostgreSQL process uses the following SQL function to get the point on a map from an X and Y value: ST_Transform(ST_SetSrid(ST_MakePoint(x, y),4326),3857) Does anyone know…
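Spark SQL has no built-in ST_ functions; a geospatial library such as Apache Sedona supplies them. For this specific case, though, the EPSG:4326 → EPSG:3857 (Web Mercator) conversion has a closed form that fits in a plain UDF. A hedged pure-Python sketch, not Sedona's API:

```python
import math

# Half the width of the Web Mercator world, in metres.
ORIGIN_SHIFT = 20037508.342789244

def lonlat_to_web_mercator(lon: float, lat: float) -> tuple:
    """EPSG:4326 lon/lat degrees -> EPSG:3857 Web Mercator metres.

    Pure-Python equivalent of the specific call
    ST_Transform(ST_SetSrid(ST_MakePoint(lon, lat), 4326), 3857).
    """
    x = lon * ORIGIN_SHIFT / 180.0
    y = math.log(math.tan((90.0 + lat) * math.pi / 360.0)) * ORIGIN_SHIFT / math.pi
    return (x, y)

# Registered as a UDF it becomes callable from Spark SQL (sketch only):
#   spark.udf.register("to_web_mercator", lonlat_to_web_mercator)
```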
1 vote · 3 answers

How to pass a dataframe as notebook parameter in databricks?

I have a requirement wherein I need to pass a PySpark dataframe as a notebook parameter to a child notebook. Essentially, the child notebook has a few functions with a dataframe argument type to perform certain tasks. Now the problem is I'm unable to…
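`dbutils.notebook.run` arguments and widgets carry strings only, so a DataFrame cannot be passed directly. A common workaround is to register the DataFrame as a global temp view and pass its *name*. In the sketch below the notebook and view names are made up, and the Databricks-only calls sit in comments since they need a running workspace:

```python
# Parent notebook: register the DataFrame, then pass its NAME as a string.
#
#   df.createOrReplaceGlobalTempView("shared_input")
#   dbutils.notebook.run("child_notebook", 600, {"view_name": "shared_input"})
#
# Child notebook: read the name back from the widget and load the view.
#
#   name = dbutils.widgets.get("view_name")
#   df = spark.table(qualified_view(name))

def qualified_view(name: str) -> str:
    """Global temp views live in the reserved `global_temp` database."""
    return f"global_temp.{name}"
```

Global temp views are visible across notebooks attached to the same cluster, which is what makes the handoff work; an ordinary temp view is session-scoped and would not be.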
user16714516
1 vote · 1 answer

Is there a shorthand for selecting only one dataframe's columns after a join?

I'm working in Scala with a dataframe, but the dataframe has ~60 columns. In a Databricks pipeline, we've split a few columns out along with an identity column to validate some data, resulting in a 'reference' dataframe. I'd like to join it back to…
Blue
1 vote · 0 answers

Azure Devops and Azure Databricks authentication tokens

Recently I've been developing a Python package, install_databricks_packages, which contacts the Databricks APIs (using requests, not the CLI) to install packages on Databricks clusters. This package is used in release pipelines, where one can…
luigi
1 vote · 1 answer

How to get total number of clusters, jobs, libraries installed etc. in existing azure databricks workspace?

Is there a programmatic way to get the total number of clusters, jobs, Spark & Scala runtime versions, libraries installed, and other artifacts in an existing Azure Databricks workspace?
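There is no single "inventory" endpoint, but the Databricks REST API exposes per-resource list calls whose responses can be counted client-side. The sketch below only builds the authenticated requests (the testable part); the workspace URL and token are placeholders, and the network call is shown in comments:

```python
import urllib.request

# Illustrative map of list endpoints from the Databricks REST API.
ENDPOINTS = {
    "clusters": "/api/2.0/clusters/list",
    "jobs": "/api/2.1/jobs/list",
    "cluster_libraries": "/api/2.0/libraries/all-cluster-statuses",
}

def list_request(workspace_url: str, resource: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for one inventory endpoint."""
    return urllib.request.Request(
        workspace_url + ENDPOINTS[resource],
        headers={"Authorization": f"Bearer {token}"},
    )

# Usage (network call omitted; placeholders throughout):
#   req = list_request("https://adb-123.4.azuredatabricks.net", "clusters", pat)
#   clusters = json.load(urllib.request.urlopen(req)).get("clusters", [])
#   print(len(clusters))
```

Each entry returned by the clusters endpoint also carries its `spark_version`, which covers the runtime-version part of the question.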
Learn2Code