Questions tagged [databricks]

Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.

Use this tag for questions specific to the Databricks Lakehouse Platform, including, but not limited to, the Databricks File System (DBFS), REST APIs, Databricks Spark SQL extensions, and orchestration tools.

Don't use this tag for generic questions about Apache Spark or about public Spark packages maintained by Databricks.


7135 questions
2
votes
1 answer

How to connect to Databricks Delta tables from QlikView?

I need to create a QlikView dashboard using data in Databricks Delta Lake. Has anyone tried connecting to DBFS from a QlikView dashboard? I usually use a JDBC connection string to connect to DBFS from my Scala code. For that I use the Spark Simba…
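The excerpt mentions a JDBC connection string for the Simba Spark driver. As a rough sketch of how such a URL is typically assembled (the host, cluster ID, and token below are hypothetical placeholders, and the exact property names should be verified against the Simba driver documentation for your workspace):

```python
# Sketch: assemble a Databricks JDBC URL for the Simba Spark driver.
# All concrete values (host, cluster id, token) are made-up placeholders.
def databricks_jdbc_url(host: str, cluster_id: str, token: str, org_id: str = "0") -> str:
    # The HTTP path routes the JDBC session to a specific interactive cluster.
    http_path = f"sql/protocolv1/o/{org_id}/{cluster_id}"
    return (
        f"jdbc:spark://{host}:443/default"
        f";transportMode=http;ssl=1"
        f";httpPath={http_path}"
        f";AuthMech=3;UID=token;PWD={token}"   # AuthMech=3: user/password, user is literally "token"
    )

url = databricks_jdbc_url("adb-1234.5.azuredatabricks.net", "0123-456789-abc123", "dapiXXXX")
print(url)
```

The same URL can then be pasted into QlikView's generic JDBC/ODBC connector configuration.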
2
votes
3 answers

Error connecting to databricks in python with databricks-connect

I'm using databricks-connect on macOS with PyCharm, but after I finished the configuration and tried to run databricks-connect test, I got the following error and have no idea what the problem is. I followed this documentation:…
efsee
  • 579
  • 1
  • 10
  • 22
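`databricks-connect test` commonly fails when the JSON config written by `databricks-connect configure` is incomplete or malformed. A minimal sketch for sanity-checking that file (assuming the default location `~/.databricks-connect`; the required key names below match what the configure step usually writes, but verify against your own file):

```python
import json
from pathlib import Path

# Keys databricks-connect configure is expected to write (assumption - check your file).
REQUIRED_KEYS = {"host", "token", "cluster_id", "org_id", "port"}

def check_databricks_connect_config(path: Path) -> list:
    """Return the sorted list of keys missing from a databricks-connect config file."""
    cfg = json.loads(path.read_text())
    return sorted(REQUIRED_KEYS - cfg.keys())

# Demo with a temporary, hypothetical config (token redacted, org_id/port omitted):
demo = Path("demo-databricks-connect.json")
demo.write_text(json.dumps({
    "host": "https://adb-1234.5.azuredatabricks.net",
    "token": "dapiXXXX",
    "cluster_id": "0123-456789-abc123",
}))
missing = check_databricks_connect_config(demo)
print("missing keys:", missing)
demo.unlink()
```

If keys are missing, re-running `databricks-connect configure` and answering every prompt is usually the quickest fix.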
2
votes
1 answer

Error java.lang.AssertionError: assertion failed when I display a dataframe (created by joining other dataframes)

I'm joining three data frames and everything is OK, but when I call the "display" method on the final data frame (the join of the three previous dataframes) Databricks returns this error: java.lang.AssertionError: assertion failed. I'm using: %fs head…
Danny
  • 41
  • 5
2
votes
1 answer

Databricks : Equivalent code for SQL query

I'm looking for the equivalent Databricks code for a SQL query. I added some sample code and the expected output as well. For the moment I'm stuck on the CROSS APPLY STRING…
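SQL Server's CROSS APPLY STRING_SPLIT produces one output row per delimited element of a column. In Spark the usual equivalent is `explode(split(col, ','))`; the pure-Python sketch below only illustrates the row-multiplying semantics (the column names `id`, `tags`, and `value` are made up for the example):

```python
def cross_apply_string_split(rows, col, delim=","):
    """Yield one output row per element of the split column - mirrors CROSS APPLY STRING_SPLIT."""
    for row in rows:
        for part in row[col].split(delim):
            # Each split element becomes its own row, other columns duplicated.
            yield {**row, "value": part}

rows = [{"id": 1, "tags": "a,b"}, {"id": 2, "tags": "c"}]
result = list(cross_apply_string_split(rows, "tags"))
print(result)
```

In Databricks the same effect is typically `df.withColumn("value", explode(split(df["tags"], ",")))` in PySpark, or `LATERAL VIEW explode(split(tags, ','))` in Spark SQL.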
2
votes
2 answers

How to change the Spark user running jobs in Azure Databricks?

I am using Spark on Azure Databricks 5.5. I submit Spark jobs through the Databricks workspace UI via Jobs, Notebooks, and Spark-submit. The jobs are being successfully submitted, and new Databricks clusters are being spawned or existing ones are…
FRG96
  • 151
  • 1
  • 9
2
votes
1 answer

How to use the result of a BashOperator task as argument of another Airflow task?

I need to pass a job_id parameter to my DatabricksRunNowOperator() object. The job_id is the result of executing the databricks jobs create --json '{myjson}' command. $ databricks jobs create --json '{myjson}' {job_id: 12} import os import…
Eric Bellet
  • 1,732
  • 5
  • 22
  • 40
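The CLI prints a small JSON object on job creation (the excerpt shows `{job_id: 12}`; the real CLI emits valid JSON, i.e. `{"job_id": 12}`). One approach is to capture that stdout in the BashOperator (with `do_xcom_push=True` the last line of stdout goes to XCom), parse out the job_id, and feed it to DatabricksRunNowOperator. A sketch of the parsing step, with the actual CLI call simulated:

```python
import json

def extract_job_id(cli_stdout: str) -> int:
    """Parse the JSON that `databricks jobs create --json '...'` prints to stdout."""
    return int(json.loads(cli_stdout)["job_id"])

# Simulated CLI output - in Airflow this string would come from
# ti.xcom_pull(task_ids="create_job") after the BashOperator runs.
simulated_stdout = '{"job_id": 12}'
job_id = extract_job_id(simulated_stdout)
print(job_id)
```

If `job_id` is a templated field in your provider version of DatabricksRunNowOperator, a Jinja expression pulling the XCom value can be passed directly; otherwise, do the pull inside a PythonOperator and create the run operator dynamically.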
2
votes
2 answers

How to deploy Databricks cluster with specified permissions?

I am deploying some Databricks clusters using powershell script which takes as an input json file with pre-defined cluster templates, for example: { "cluster_name": "test1", "max_retries": 1, "spark_version": "5.3.x-scala2.11", …
Grevioos
  • 355
  • 5
  • 30
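Cluster ACLs are not part of the cluster-create JSON; they are set afterwards through the Permissions API (`PATCH /api/2.0/permissions/clusters/{cluster_id}` at the time of writing; check the API docs for your workspace). A sketch of building the request body in Python, so it can be added to the deployment script after cluster creation (the group name and permission level are hypothetical):

```python
import json

def cluster_permissions_payload(group: str, level: str) -> str:
    """Build the JSON body for the Databricks cluster Permissions API."""
    body = {
        "access_control_list": [
            # One entry per principal; user_name or service_principal_name
            # can be used instead of group_name.
            {"group_name": group, "permission_level": level}
        ]
    }
    return json.dumps(body)

payload = cluster_permissions_payload("data-engineers", "CAN_RESTART")
print(payload)
```

The same body works from PowerShell via Invoke-RestMethod with a bearer token, which keeps the existing template-driven deployment flow intact.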
2
votes
1 answer

databricks configure using cmd and R

I am trying to use the databricks CLI and invoke databricks configure. This is how I do it from cmd: somepath>databricks configure --token Databricks Host (should begin with https://): my_https_address Token: my_token I want to invoke the same…
89_Simple
  • 3,393
  • 3
  • 39
  • 94
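`databricks configure --token` only prompts when run interactively, which is awkward from R's system() or a batch script. Since the command ultimately just writes an INI-style `~/.databrickscfg`, one hedged alternative is to write that file directly (host and token below are placeholders; verify the profile format against the CLI docs for your version):

```python
import configparser
from pathlib import Path

def write_databrickscfg(path: Path, host: str, token: str) -> None:
    """Write the INI profile file that `databricks configure --token` would create."""
    cfg = configparser.ConfigParser()
    cfg["DEFAULT"] = {"host": host, "token": token}
    with path.open("w") as f:
        cfg.write(f)

# Demo with a temporary file and placeholder credentials:
p = Path("demo-databrickscfg")
write_databrickscfg(p, "https://adb-1234.5.azuredatabricks.net", "dapiXXXX")
content = p.read_text()
p.unlink()
print(content)
```

From R the equivalent is a few lines of writeLines(); the CLI then picks the profile up with no interactive prompt.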
2
votes
2 answers

Spark XML Tags are missing when null values are coming

Below is the dataframe I have.
+-------+----+----------+
|   city|year|saleAmount|
+-------+----+----------+
|Toronto|2017|      50.0|
|Toronto|null|      50.0|
|Sanjose|2017|     200.0|
|Sanjose|null|     200.0|
|  Plano|2015|      50.0|
| …
user3190018
  • 890
  • 13
  • 26
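spark-xml omits an element entirely when the column value is null, which is why the tags disappear; the common workaround is to fill nulls before writing (e.g. `df.na.fill(...)`). The stdlib sketch below only illustrates the omit-versus-fill behaviour on a record shaped like the question's data:

```python
import xml.etree.ElementTree as ET

def to_xml(record, fill=None):
    """Serialize a flat record; None values drop their tag unless a fill default is given."""
    row = ET.Element("row")
    for key, value in record.items():
        if value is None:
            if fill is None or key not in fill:
                continue            # mimic spark-xml: null => tag omitted entirely
            value = fill[key]       # mimic df.na.fill: default value => tag kept
        ET.SubElement(row, key).text = str(value)
    return ET.tostring(row, encoding="unicode")

rec = {"city": "Toronto", "year": None, "saleAmount": 50.0}
print(to_xml(rec))                    # no <year> tag at all
print(to_xml(rec, fill={"year": 0}))  # <year>0</year> is present
```

In Spark itself, `df.na.fill({"year": 0})` (or a sentinel string) before `df.write.format("xml")` keeps the element in every row.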
2
votes
1 answer

Submitting jobs with different parameters using command line databricks

I have a jar and an associated properties file. In order to run the jar, this is what I do on Databricks on Azure: I click on: +Create Job; Task: com.xxx.sparkmex.core.ModelExecution in my.jar - Edit / Upload JAR / Remove; Parameters:…
89_Simple
  • 3,393
  • 3
  • 39
  • 94
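Once the job exists, the legacy Databricks CLI can re-run it with different parameters via `databricks jobs run-now --job-id <id> --jar-params '<json array>'` (flag names per the legacy CLI; verify against your installed version). A sketch of composing that command safely from Python, so the per-run parameters can vary without touching the job definition:

```python
import json
import shlex

def run_now_command(job_id, jar_params):
    """Compose a `databricks jobs run-now` invocation with per-run jar params."""
    params_json = json.dumps(jar_params)          # CLI expects a JSON array of strings
    return (
        f"databricks jobs run-now --job-id {job_id} "
        f"--jar-params {shlex.quote(params_json)}"  # quote so the shell passes it intact
    )

# Hypothetical parameters for the example:
cmd = run_now_command(42, ["--env", "prod", "--date", "2019-01-01"])
print(cmd)
```

Looping over a list of parameter sets and issuing one run-now per set replaces the manual Edit/Upload cycle in the UI.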
2
votes
1 answer

Can't use "update" in outputMode() when writing stream data in spark

I'm trying to write stream data in spark to delta format, but it looks like it won't allow me to use update in outputMode(), below is my code and error message: deltaStreamingQuery = (eventsDF .writeStream .format("delta") …
efsee
  • 579
  • 1
  • 10
  • 22
2
votes
1 answer

spark read blob storage using wildcard

I want to read Azure Blob storage files into Spark using Databricks, but I do not want to set a specific file or * for each level of nesting. The standard **/*/ is not working. These work just fine: val df =…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
2
votes
2 answers

Group by value within range in Azure Databricks

Consider the following data:
EventDate,Value
1.1.2019,11
1.2.2019,5
1.3.2019,6
1.4.2019,-15
1.5.2019,-20
1.6.2019,-30
1.7.2019,12
1.8.2019,20
I want to create groups of when these values are within thresholds:
1. > 10
2. <= 10 and >= -10
3. > -10
The…
ruffen
  • 1,695
  • 2
  • 25
  • 51
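This is a gaps-and-islands problem: first map each value to its threshold bucket, then group consecutive rows that share a bucket. In Spark it is usually solved with window functions (lag the bucket, flag changes, running sum of the flags as a group id); the stdlib sketch below shows the grouping logic on the sample data from the question (interpreting the third bucket as "everything below -10"):

```python
from itertools import groupby

def bucket(value):
    """Threshold buckets from the question: 1: above 10, 2: between -10 and 10, 3: the rest."""
    if value > 10:
        return 1
    if value >= -10:
        return 2
    return 3

data = [("1.1.2019", 11), ("1.2.2019", 5), ("1.3.2019", 6), ("1.4.2019", -15),
        ("1.5.2019", -20), ("1.6.2019", -30), ("1.7.2019", 12), ("1.8.2019", 20)]

# groupby only merges *consecutive* rows with the same key,
# which is exactly the islands behaviour wanted here.
groups = [(b, list(rows)) for b, rows in groupby(data, key=lambda r: bucket(r[1]))]
for b, rows in groups:
    print(b, rows)
```

In Databricks the same shape is `sum(flag) over (order by EventDate)` where flag is 1 whenever `bucket != lag(bucket)`, then a plain group-by on that running sum.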
2
votes
1 answer

Which of my Databricks notebook uses the cluster nodes?

I run several notebooks on an Azure Databricks Spark cluster at the same time. How can I see the cluster node usage rate of each notebook / app over a period of time? Neither the "Spark Cluster UI - Master" nor the "Spark UI" tab provides such…
David Taub
  • 734
  • 1
  • 7
  • 27
2
votes
0 answers

Default schema value conversion fails in to_avro() while publishing data to Kafka using databricks spark-avro

Trying to publish data into a Kafka topic using the Confluent schema registry. Following is my schema registry call: schemaRegistryClient.register("primitive_type_str_avsc", new Schema.Parser().parse( s""" |{ | "type": "record", | "name":…