Questions tagged [azure-databricks]

For questions about the usage of the Databricks Lakehouse Platform on Microsoft Azure

Overview

Azure Databricks is the Azure-based implementation of Databricks, a managed platform for working with Apache Spark that includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service and integrates natively with other Azure services such as Azure Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, and Synapse Analytics.

Related Tags

4095 questions
1 vote · 0 answers

Is there a SparkR function equivalent to unique and orderBy from R, that brings all the columns?

I'm one month into the data world, and my goal is to refactor existing local R scripts to work with SparkR on Databricks. This is the R code: minmaxAcctDates <- intLoadFiles("Accounts_BalanceEOD", monthID) minmaxAcctDates$CUSTOMER_NUMBER <-…
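In SparkR, `distinct()` and `arrange()` are the usual equivalents of base R's `unique()` and ordering, and both return the DataFrame with all of its columns intact. Since a Spark session isn't available here, the same pattern is sketched below with pandas on toy data (the column names merely echo the question; they are not the asker's files):

```python
import pandas as pd

# Toy stand-in data: duplicate rows, unsorted.
df = pd.DataFrame({
    "CUSTOMER_NUMBER": [2, 1, 2, 1],
    "BALANCE_DATE": ["2021-02-01", "2021-01-01", "2021-02-01", "2021-01-01"],
})

# drop_duplicates() + sort_values() keep every column, just as
# SparkR's distinct() + arrange() (or PySpark's dropDuplicates() + orderBy()) do.
result = (df.drop_duplicates()
            .sort_values("CUSTOMER_NUMBER")
            .reset_index(drop=True))
```

The key point for the question: none of these calls project columns away, so no explicit re-selection is needed after deduplicating and sorting.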
1 vote · 1 answer

Azure Databricks notebook not deployed via YAML deploynotebooks@0 task

This is my YAML. The pipeline runs successfully, but the notebooks are not deployed. What am I doing wrong?
Blue Clouds
1 vote · 3 answers

Update existing records of parquet file in Azure

I am converting my table into Parquet file format using Azure Data Factory and querying the Parquet files with Databricks for reporting. I want to update only the existing records that were updated in the original SQL Server table. Since I am performing…
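Parquet files are immutable, so individual rows cannot be updated in place; the usual Databricks answer is to store the converted table in Delta Lake format, which supports MERGE for upserts. A sketch of the statement builder follows — every table and key name here is illustrative, not taken from the question:

```python
def merge_stmt(target: str, source: str, key: str) -> str:
    """Build a Delta Lake MERGE (upsert) statement for the given names."""
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS u "
        f"ON t.{key} = u.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )

# On Databricks the statement would run against Delta tables, e.g.:
#   spark.sql(merge_stmt("reporting.accounts", "staging.accounts_delta", "account_id"))
```

`UPDATE SET *` / `INSERT *` copy all columns by name, which matches the "update only changed records" requirement without listing sixty column assignments by hand.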
1 vote · 1 answer

Importing ipynb file from another ipynb notebook in azure databricks

I am trying to import an ipynb notebook from another notebook in Azure Databricks using from ipynb.fs.full.test_1 import *. While importing I get the following error: KeyError: 'package'. Here is my test code: class Test1: def t1(): …
1 vote · 2 answers

Records not showing until Azure Databricks cluster restarted

We have been using Azure Databricks / Delta lake for the last couple of months and recently have started to spot some strange behaviours with loaded records, in particular latest records not being returned unless the cluster is restarted or a…
1 vote · 1 answer

Databricks notebook %run relative path, not working for 3 level deep

I need to run a Databricks notebook three folder levels up using a relative path, but it is not working. Is it a limitation? It works if I specify the full path. This is what I have tested: %run ./folder/notebook - WORKS %run ../folder/notebook - WORKS %run…
baatchen
1 vote · 1 answer

ARM template for Azure Data Bricks Diagnostic settings

I am able to configure diagnostic settings for Azure Databricks in the portal. I need an ARM template to automate the creation of diagnostic settings for Azure Databricks. Let me know if any additional information is required from my side. Thanks in…
1 vote · 1 answer

How to increase Databricks performance?

I have a problem: writing to Synapse from Databricks takes a very long time (> 20 hours). What can I do to improve the Databricks job that writes to Synapse? My source table is a fact table (151 million rows) on Azure Synapse. I…
1 vote · 1 answer

Setting up PostgreSQL driver on Azure Databricks

How can I modify the code below to install a PostgreSQL JDBC driver instead of MS SQL? My goal is to use pyodbc to connect to a Redshift database from Azure Databricks. I thought that the PostgreSQL JDBC driver was already installed in my Databricks…
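The PostgreSQL JDBC driver ships as the Maven artifact `org.postgresql:postgresql` and can be attached to the cluster as a Maven library; the driver class is `org.postgresql.Driver`. The sketch below builds the JDBC URL (the testable part), with the Databricks-side read kept in comments since it needs a running cluster — host, table, and credential names are placeholders:

```python
def postgres_jdbc_url(host: str, database: str, port: int = 5432) -> str:
    """JDBC URL in the format understood by org.postgresql.Driver."""
    return f"jdbc:postgresql://{host}:{port}/{database}"

# With the driver attached as a Maven library (org.postgresql:postgresql),
# a Spark JDBC read would look like (placeholders throughout):
#
#   df = (spark.read.format("jdbc")
#         .option("url", postgres_jdbc_url("my-host.example.com", "mydb"))
#         .option("dbtable", "public.my_table")
#         .option("user", user)
#         .option("password", password)
#         .option("driver", "org.postgresql.Driver")
#         .load())
```

Note that Redshift listens on port 5439 by default, so the port argument would need to change for that target.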
1 vote · 1 answer

How to move an Excel file with a dynamic name (containing the current date)?

I tried to save an Excel file in Azure Databricks with a dynamic name: import pandas as pd #initialize the excel writer writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter') #store your dataframes in a dict, where the…
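A date-stamped file name can be built with `datetime` and an f-string before it is handed to `pd.ExcelWriter`. A minimal sketch — the DBFS move at the end is commented out and its paths are illustrative, assuming the default driver working directory:

```python
from datetime import date
from typing import Optional

def dated_filename(prefix: str, day: Optional[date] = None) -> str:
    """Build an Excel file name carrying the given (default: current) date."""
    day = day or date.today()
    return f"{prefix}_{day:%Y%m%d}.xlsx"

# The dynamic name can then be fed to the question's writer:
#   writer = pd.ExcelWriter(dated_filename("test"), engine="xlsxwriter")
# and, once saved locally on the driver, moved into DBFS (paths illustrative):
#   dbutils.fs.mv("file:/databricks/driver/" + dated_filename("test"),
#                 "dbfs:/mnt/output/" + dated_filename("test"))
```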
jos97
1 vote · 1 answer

How to get geospatial POINT using SparkSQL

I'm converting a process from PostgreSQL over to Databricks Apache Spark. The PostgreSQL process uses the following SQL function to get the point on a map from an X and Y value: ST_Transform(ST_SetSrid(ST_MakePoint(x, y),4326),3857) Does anyone know…
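Spark SQL has no built-in ST_ functions; a geospatial library such as Apache Sedona supplies them. For this specific case, though, the EPSG:4326 → EPSG:3857 (Web Mercator) conversion has a closed form that fits in a plain UDF. A hedged pure-Python sketch, not Sedona's API:

```python
import math

# Half the width of the Web Mercator world, in metres.
ORIGIN_SHIFT = 20037508.342789244

def lonlat_to_web_mercator(lon: float, lat: float) -> tuple:
    """EPSG:4326 lon/lat degrees -> EPSG:3857 Web Mercator metres.

    Pure-Python equivalent of the specific call
    ST_Transform(ST_SetSrid(ST_MakePoint(lon, lat), 4326), 3857).
    """
    x = lon * ORIGIN_SHIFT / 180.0
    y = math.log(math.tan((90.0 + lat) * math.pi / 360.0)) * ORIGIN_SHIFT / math.pi
    return (x, y)

# Registered as a UDF it becomes callable from Spark SQL (sketch only):
#   spark.udf.register("to_web_mercator", lonlat_to_web_mercator)
```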
1 vote · 3 answers

How to pass a dataframe as notebook parameter in databricks?

I have a requirement wherein I need to pass a PySpark dataframe as a notebook parameter to a child notebook. Essentially, the child notebook has a few functions with a dataframe argument type to perform certain tasks. Now the problem is I'm unable to…
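`dbutils.notebook.run` arguments and widgets carry strings only, so a DataFrame cannot be passed directly. A common workaround is to register the DataFrame as a global temp view and pass its *name*. In the sketch below the notebook and view names are made up, and the Databricks-only calls sit in comments since they need a running workspace:

```python
# Parent notebook: register the DataFrame, then pass its NAME as a string.
#
#   df.createOrReplaceGlobalTempView("shared_input")
#   dbutils.notebook.run("child_notebook", 600, {"view_name": "shared_input"})
#
# Child notebook: read the name back from the widget and load the view.
#
#   name = dbutils.widgets.get("view_name")
#   df = spark.table(qualified_view(name))

def qualified_view(name: str) -> str:
    """Global temp views live in the reserved `global_temp` database."""
    return f"global_temp.{name}"
```

Global temp views are visible across notebooks attached to the same cluster, which is what makes the handoff work; an ordinary temp view is session-scoped and would not be.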
user16714516
1 vote · 1 answer

Is there a shorthand for selecting only one dataframe's columns after a join?

I'm working in Scala with a dataframe, but the dataframe has ~60 columns. In a Databricks pipeline, we've split a few columns out along with an identity column to validate some data, resulting in a 'reference' dataframe. I'd like to join it back to…
Blue
1 vote · 0 answers

Azure Devops and Azure Databricks authentication tokens

Recently I've been developing a Python package, install_databricks_packages, which contacts the Databricks APIs (using requests, not the CLI) to install packages on Databricks clusters. This package is used in release pipelines, where one can…
luigi
1 vote · 1 answer

How to get total number of clusters, jobs, libraries installed etc. in existing azure databricks workspace?

Is there a programmatic way to get the total number of clusters, jobs, Spark & Scala runtime versions, libraries installed, and other artifacts in an existing Azure Databricks workspace?
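There is no single "inventory" endpoint, but the Databricks REST API exposes per-resource list calls whose responses can be counted client-side. The sketch below only builds the authenticated requests (the testable part); the workspace URL and token are placeholders, and the network call is shown in comments:

```python
import urllib.request

# Illustrative map of list endpoints from the Databricks REST API.
ENDPOINTS = {
    "clusters": "/api/2.0/clusters/list",
    "jobs": "/api/2.1/jobs/list",
    "cluster_libraries": "/api/2.0/libraries/all-cluster-statuses",
}

def list_request(workspace_url: str, resource: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for one inventory endpoint."""
    return urllib.request.Request(
        workspace_url + ENDPOINTS[resource],
        headers={"Authorization": f"Bearer {token}"},
    )

# Usage (network call omitted; placeholders throughout):
#   req = list_request("https://adb-123.4.azuredatabricks.net", "clusters", pat)
#   clusters = json.load(urllib.request.urlopen(req)).get("clusters", [])
#   print(len(clusters))
```

Each entry returned by the clusters endpoint also carries its `spark_version`, which covers the runtime-version part of the question.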
Learn2Code