Questions tagged [databricks]

Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.

Use this tag for questions specific to the Databricks Lakehouse Platform, including, but not limited to, the Databricks File System (DBFS), REST APIs, Databricks Spark SQL extensions, and orchestration tools.

Don't use this tag for generic questions about Apache Spark or about public Spark packages maintained by Databricks.


7135 questions
2
votes
1 answer

How to connect to Databricks Delta tables from QlikView?

I need to create a QlikView dashboard using data in Databricks Delta Lake. Has anyone tried connecting to DBFS from a QlikView dashboard? I usually use a JDBC connection string to connect to DBFS from my Scala code. For that I use the Spark Simba…
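The excerpt mentions a JDBC connection string for the Simba Spark driver. As a rough sketch of how such a URL is typically assembled (the host, cluster ID, and token below are hypothetical placeholders, and the exact property names should be verified against the Simba driver documentation for your workspace):

```python
# Sketch: assemble a Databricks JDBC URL for the Simba Spark driver.
# All concrete values (host, cluster id, token) are made-up placeholders.
def databricks_jdbc_url(host: str, cluster_id: str, token: str, org_id: str = "0") -> str:
    # The HTTP path routes the JDBC session to a specific interactive cluster.
    http_path = f"sql/protocolv1/o/{org_id}/{cluster_id}"
    return (
        f"jdbc:spark://{host}:443/default"
        f";transportMode=http;ssl=1"
        f";httpPath={http_path}"
        f";AuthMech=3;UID=token;PWD={token}"   # AuthMech=3: user/password, user is literally "token"
    )

url = databricks_jdbc_url("adb-1234.5.azuredatabricks.net", "0123-456789-abc123", "dapiXXXX")
print(url)
```

The same URL can then be pasted into QlikView's generic JDBC/ODBC connector configuration.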
2
votes
3 answers

Error connecting to databricks in python with databricks-connect

I'm using databricks-connect on macOS with PyCharm, but after I finished the configuration and tried to run databricks-connect test, I got the following error and have no idea what the problem is. I followed this documentation:…
efsee
  • 579
  • 1
  • 10
  • 22
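`databricks-connect test` commonly fails when the JSON config written by `databricks-connect configure` is incomplete or malformed. A minimal sketch for sanity-checking that file (assuming the default location `~/.databricks-connect`; the required key names below match what the configure step usually writes, but verify against your own file):

```python
import json
from pathlib import Path

# Keys databricks-connect configure is expected to write (assumption - check your file).
REQUIRED_KEYS = {"host", "token", "cluster_id", "org_id", "port"}

def check_databricks_connect_config(path: Path) -> list:
    """Return the sorted list of keys missing from a databricks-connect config file."""
    cfg = json.loads(path.read_text())
    return sorted(REQUIRED_KEYS - cfg.keys())

# Demo with a temporary, hypothetical config (token redacted, org_id/port omitted):
demo = Path("demo-databricks-connect.json")
demo.write_text(json.dumps({
    "host": "https://adb-1234.5.azuredatabricks.net",
    "token": "dapiXXXX",
    "cluster_id": "0123-456789-abc123",
}))
missing = check_databricks_connect_config(demo)
print("missing keys:", missing)
demo.unlink()
```

If keys are missing, re-running `databricks-connect configure` and answering every prompt is usually the quickest fix.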
2
votes
1 answer

Error java.lang.AssertionError: assertion failed when I display a dataframe (created by joining other dataframes)

I'm joining three data frames and everything is OK, but when I call the "display" method on the final data frame (the join of the three previous dataframes) Databricks returns this error: java.lang.AssertionError: assertion failed. I'm using: %fs head…
Danny
  • 41
  • 5
2
votes
1 answer

Databricks : Equivalent code for SQL query

I'm looking for the equivalent Databricks code for a SQL query. I added some sample code and the expected output as well. For the moment I'm stuck on the CROSS APPLY STRING…
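SQL Server's CROSS APPLY STRING_SPLIT produces one output row per delimited element of a column. In Spark the usual equivalent is `explode(split(col, ','))`; the pure-Python sketch below only illustrates the row-multiplying semantics (the column names `id`, `tags`, and `value` are made up for the example):

```python
def cross_apply_string_split(rows, col, delim=","):
    """Yield one output row per element of the split column - mirrors CROSS APPLY STRING_SPLIT."""
    for row in rows:
        for part in row[col].split(delim):
            # Each split element becomes its own row, other columns duplicated.
            yield {**row, "value": part}

rows = [{"id": 1, "tags": "a,b"}, {"id": 2, "tags": "c"}]
result = list(cross_apply_string_split(rows, "tags"))
print(result)
```

In Databricks the same effect is typically `df.withColumn("value", explode(split(df["tags"], ",")))` in PySpark, or `LATERAL VIEW explode(split(tags, ','))` in Spark SQL.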
2
votes
2 answers

How to change the Spark user running jobs in Azure Databricks?

I am using Spark on Azure Databricks 5.5. I submit Spark jobs through the Databricks workspace UI via Jobs, Notebooks, and Spark-submit. The jobs are being successfully submitted, and new Databricks clusters are being spawned or existing ones are…
FRG96
  • 151
  • 1
  • 9
2
votes
1 answer

How to use the result of a BashOperator task as argument of another Airflow task?

I need to pass a job_id parameter to my DatabricksRunNowOperator() object. The job_id is the result of executing the databricks jobs create --json '{myjson}' command. $ databricks jobs create --json '{myjson}' {job_id: 12} import os import…
Eric Bellet
  • 1,732
  • 5
  • 22
  • 40
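The CLI prints a small JSON object on job creation (the excerpt shows `{job_id: 12}`; the real CLI emits valid JSON, i.e. `{"job_id": 12}`). One approach is to capture that stdout in the BashOperator (with `do_xcom_push=True` the last line of stdout goes to XCom), parse out the job_id, and feed it to DatabricksRunNowOperator. A sketch of the parsing step, with the actual CLI call simulated:

```python
import json

def extract_job_id(cli_stdout: str) -> int:
    """Parse the JSON that `databricks jobs create --json '...'` prints to stdout."""
    return int(json.loads(cli_stdout)["job_id"])

# Simulated CLI output - in Airflow this string would come from
# ti.xcom_pull(task_ids="create_job") after the BashOperator runs.
simulated_stdout = '{"job_id": 12}'
job_id = extract_job_id(simulated_stdout)
print(job_id)
```

If `job_id` is a templated field in your provider version of DatabricksRunNowOperator, a Jinja expression pulling the XCom value can be passed directly; otherwise, do the pull inside a PythonOperator and create the run operator dynamically.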
2
votes
2 answers

How to deploy Databricks cluster with specified permissions?

I am deploying some Databricks clusters using powershell script which takes as an input json file with pre-defined cluster templates, for example: { "cluster_name": "test1", "max_retries": 1, "spark_version": "5.3.x-scala2.11", …
Grevioos
  • 355
  • 5
  • 30
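Cluster ACLs are not part of the cluster-create JSON; they are set afterwards through the Permissions API (`PATCH /api/2.0/permissions/clusters/{cluster_id}` at the time of writing; check the API docs for your workspace). A sketch of building the request body in Python, so it can be added to the deployment script after cluster creation (the group name and permission level are hypothetical):

```python
import json

def cluster_permissions_payload(group: str, level: str) -> str:
    """Build the JSON body for the Databricks cluster Permissions API."""
    body = {
        "access_control_list": [
            # One entry per principal; user_name or service_principal_name
            # can be used instead of group_name.
            {"group_name": group, "permission_level": level}
        ]
    }
    return json.dumps(body)

payload = cluster_permissions_payload("data-engineers", "CAN_RESTART")
print(payload)
```

The same body works from PowerShell via Invoke-RestMethod with a bearer token, which keeps the existing template-driven deployment flow intact.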
2
votes
1 answer

databricks configure using cmd and R

I am trying to use the databricks CLI and invoke databricks configure. This is how I do it from cmd: somepath>databricks configure --token Databricks Host (should begin with https://): my_https_address Token: my_token I want to invoke the same…
89_Simple
  • 3,393
  • 3
  • 39
  • 94
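`databricks configure --token` only prompts when run interactively, which is awkward from R's system() or a batch script. Since the command ultimately just writes an INI-style `~/.databrickscfg`, one hedged alternative is to write that file directly (host and token below are placeholders; verify the profile format against the CLI docs for your version):

```python
import configparser
from pathlib import Path

def write_databrickscfg(path: Path, host: str, token: str) -> None:
    """Write the INI profile file that `databricks configure --token` would create."""
    cfg = configparser.ConfigParser()
    cfg["DEFAULT"] = {"host": host, "token": token}
    with path.open("w") as f:
        cfg.write(f)

# Demo with a temporary file and placeholder credentials:
p = Path("demo-databrickscfg")
write_databrickscfg(p, "https://adb-1234.5.azuredatabricks.net", "dapiXXXX")
content = p.read_text()
p.unlink()
print(content)
```

From R the equivalent is a few lines of writeLines(); the CLI then picks the profile up with no interactive prompt.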
2
votes
2 answers

Spark XML Tags are missing when null values are coming

Below is the dataframe I have.
+-------+----+----------+
|   city|year|saleAmount|
+-------+----+----------+
|Toronto|2017|      50.0|
|Toronto|null|      50.0|
|Sanjose|2017|     200.0|
|Sanjose|null|     200.0|
|  Plano|2015|      50.0|
| …
user3190018
  • 890
  • 13
  • 26
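spark-xml omits an element entirely when the column value is null, which is why the tags disappear; the common workaround is to fill nulls before writing (e.g. `df.na.fill(...)`). The stdlib sketch below only illustrates the omit-versus-fill behaviour on a record shaped like the question's data:

```python
import xml.etree.ElementTree as ET

def to_xml(record, fill=None):
    """Serialize a flat record; None values drop their tag unless a fill default is given."""
    row = ET.Element("row")
    for key, value in record.items():
        if value is None:
            if fill is None or key not in fill:
                continue            # mimic spark-xml: null => tag omitted entirely
            value = fill[key]       # mimic df.na.fill: default value => tag kept
        ET.SubElement(row, key).text = str(value)
    return ET.tostring(row, encoding="unicode")

rec = {"city": "Toronto", "year": None, "saleAmount": 50.0}
print(to_xml(rec))                    # no <year> tag at all
print(to_xml(rec, fill={"year": 0}))  # <year>0</year> is present
```

In Spark itself, `df.na.fill({"year": 0})` (or a sentinel string) before `df.write.format("xml")` keeps the element in every row.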
2
votes
1 answer

Submitting jobs with different parameters using command line databricks

I have a jar and an associated properties file. In order to run the jar, this is what I do on Databricks on Azure: I click on: +Create Job; Task: com.xxx.sparkmex.core.ModelExecution in my.jar - Edit / Upload JAR / Remove; Parameters:…
89_Simple
  • 3,393
  • 3
  • 39
  • 94
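Once the job exists, the legacy Databricks CLI can re-run it with different parameters via `databricks jobs run-now --job-id <id> --jar-params '<json array>'` (flag names per the legacy CLI; verify against your installed version). A sketch of composing that command safely from Python, so the per-run parameters can vary without touching the job definition:

```python
import json
import shlex

def run_now_command(job_id, jar_params):
    """Compose a `databricks jobs run-now` invocation with per-run jar params."""
    params_json = json.dumps(jar_params)          # CLI expects a JSON array of strings
    return (
        f"databricks jobs run-now --job-id {job_id} "
        f"--jar-params {shlex.quote(params_json)}"  # quote so the shell passes it intact
    )

# Hypothetical parameters for the example:
cmd = run_now_command(42, ["--env", "prod", "--date", "2019-01-01"])
print(cmd)
```

Looping over a list of parameter sets and issuing one run-now per set replaces the manual Edit/Upload cycle in the UI.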
2
votes
1 answer

Can't use "update" in outputMode() when writing stream data in spark

I'm trying to write stream data in spark to delta format, but it looks like it won't allow me to use update in outputMode(), below is my code and error message: deltaStreamingQuery = (eventsDF .writeStream .format("delta") …
efsee
  • 579
  • 1
  • 10
  • 22
2
votes
1 answer

spark read blob storage using wildcard

I want to read Azure Blob storage files into Spark using Databricks, but I do not want to set a specific file or * for each level of nesting. The standard **/*/ is not working. These work just fine: val df =…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
2
votes
2 answers

Group by value within range in Azure Databricks

Consider the following data:
EventDate,Value
1.1.2019,11
1.2.2019,5
1.3.2019,6
1.4.2019,-15
1.5.2019,-20
1.6.2019,-30
1.7.2019,12
1.8.2019,20
I want to create groups of when these values are within thresholds:
1. > 10
2. <= 10 and >= -10
3. > -10
The…
ruffen
  • 1,695
  • 2
  • 25
  • 51
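This is a gaps-and-islands problem: first map each value to its threshold bucket, then group consecutive rows that share a bucket. In Spark it is usually solved with window functions (lag the bucket, flag changes, running sum of the flags as a group id); the stdlib sketch below shows the grouping logic on the sample data from the question (interpreting the third bucket as "everything below -10"):

```python
from itertools import groupby

def bucket(value):
    """Threshold buckets from the question: 1: above 10, 2: between -10 and 10, 3: the rest."""
    if value > 10:
        return 1
    if value >= -10:
        return 2
    return 3

data = [("1.1.2019", 11), ("1.2.2019", 5), ("1.3.2019", 6), ("1.4.2019", -15),
        ("1.5.2019", -20), ("1.6.2019", -30), ("1.7.2019", 12), ("1.8.2019", 20)]

# groupby only merges *consecutive* rows with the same key,
# which is exactly the islands behaviour wanted here.
groups = [(b, list(rows)) for b, rows in groupby(data, key=lambda r: bucket(r[1]))]
for b, rows in groups:
    print(b, rows)
```

In Databricks the same shape is `sum(flag) over (order by EventDate)` where flag is 1 whenever `bucket != lag(bucket)`, then a plain group-by on that running sum.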
2
votes
1 answer

Which of my Databricks notebook uses the cluster nodes?

I run several notebooks on an Azure Databricks Spark cluster at the same time. How can I see the cluster node usage rate of each notebook / app over a period of time? Neither the "Spark Cluster UI - Master" nor the "Spark UI" tab provides such…
David Taub
  • 734
  • 1
  • 7
  • 27
2
votes
0 answers

Default schema value conversion fails in to_avro() while publishing data to Kafka using databricks spark-avro

Trying to publish data into a Kafka topic using the Confluent schema registry. Following is my schema registry call: schemaRegistryClient.register("primitive_type_str_avsc", new Schema.Parser().parse( s""" |{ | "type": "record", | "name":…