Questions tagged [databricks]

Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.

Use this tag for questions specific to the Databricks Lakehouse Platform, including, but not limited to, the Databricks File System (DBFS), REST APIs, Databricks Spark SQL extensions, and orchestration tools.

Don't use this tag for generic Apache Spark questions or for public Spark packages maintained by Databricks.


7135 questions
14 votes, 3 answers

Ways to Plot Spark Dataframe without Converting it to Pandas

Is there any way to plot information from a Spark dataframe without converting the dataframe to pandas? I did some online research but can't seem to find a way. I need to automatically save these plots as .pdf, so using the built-in visualization tool…
KikiNeko
  • 261
  • 1
  • 3
  • 7
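A common workaround for the question above is to aggregate or sample in Spark so only a small summary reaches the driver, then plot that summary with matplotlib and save it straight to PDF, with no toPandas() call. A minimal sketch, assuming the usual spark session and made-up category/value columns:

    import matplotlib
    matplotlib.use("Agg")  # headless backend so the figure can be written straight to a file
    import matplotlib.pyplot as plt
    from pyspark.sql import functions as F

    # Toy dataframe standing in for the real one; "category" and "value" are hypothetical columns.
    df = spark.createDataFrame([("a", 1.0), ("a", 3.0), ("b", 2.0)], ["category", "value"])

    # Aggregate in Spark so only a handful of rows ever leave the cluster.
    summary = (df.groupBy("category")
                 .agg(F.avg("value").alias("avg_value"))
                 .collect())  # list of Row objects, no pandas involved

    fig, ax = plt.subplots()
    ax.bar([r["category"] for r in summary], [r["avg_value"] for r in summary])
    ax.set_xlabel("category")
    ax.set_ylabel("average value")

    dbutils.fs.mkdirs("dbfs:/FileStore/plots")          # make sure the target folder exists
    fig.savefig("/dbfs/FileStore/plots/summary.pdf")    # PDF lands on DBFS via the /dbfs mount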
14 votes, 2 answers

Reading data from a URL using Spark on the Databricks platform

I am trying to read data from a URL using Spark on the Databricks Community Edition platform. I tried to use spark.read.csv and SparkFiles, but I am still missing some simple point. url =…
arya
  • 436
  • 1
  • 5
  • 18
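A pattern that usually works here is to pull the file onto the cluster with addFile and then read the local copy that SparkFiles reports back. A sketch under that assumption; the URL below is only a placeholder:

    from pyspark import SparkFiles

    url = "https://example.com/data.csv"   # placeholder URL, not the one from the question
    spark.sparkContext.addFile(url)        # downloads the file onto the cluster nodes

    # SparkFiles.get returns the local path of the download; the file:// prefix stops
    # spark.read from treating it as a DBFS path.
    local_path = "file://" + SparkFiles.get("data.csv")

    df = spark.read.csv(local_path, header=True, inferSchema=True)
    df.show(5)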
14 votes, 1 answer

Use of lit() in expr()

The line: df.withColumn("test", expr("concat(lon, lat)")) works as expected but df.withColumn("test", expr("concat(lon, lit(','), lat)")) produces the following exception: org.apache.spark.sql.AnalysisException: Undefined function: 'lit'. This…
Kyunam
  • 169
  • 1
  • 1
  • 5
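The exception arises because the string passed to expr() is parsed as Spark SQL, and lit() is a DataFrame-API helper rather than a SQL function; inside SQL you write the literal itself. A small sketch showing both spellings on a toy dataframe:

    from pyspark.sql.functions import expr, concat, col, lit

    df = spark.createDataFrame([("10.0", "20.0")], ["lon", "lat"])

    # Inside expr() use a plain SQL string literal instead of lit():
    df.withColumn("test", expr("concat(lon, ',', lat)")).show()

    # Or stay in the DataFrame API, where lit() is the right tool:
    df.withColumn("test", concat(col("lon"), lit(","), col("lat"))).show()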
14 votes, 3 answers

How to log in via SSH to an Azure Databricks cluster

I used the following Ubuntu command to log in over SSH: ssh user@hostname_or_IP. I can see the master node hostname, but I am not able to get the username for the Azure Databricks cluster. Refer to this…
dwayneJohn
  • 919
  • 1
  • 12
  • 30
13 votes, 3 answers

Databricks - is not empty but it's not a Delta table

I run a query on Databricks: DROP TABLE IF EXISTS dublicates_hotels; CREATE TABLE IF NOT EXISTS dublicates_hotels ... I'm trying to understand why I receive the following error: Error in SQL statement: AnalysisException: Cannot create table…
QbS
  • 425
  • 1
  • 4
  • 17
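That AnalysisException usually means the table's storage location already contains non-Delta files (for example leftover Parquet output), so Delta refuses to create a table on top of them. A hedged sketch of the usual cleanup; the path below is an assumption and should be checked against DESCRIBE DETAIL or the original CREATE statement:

    # Assumed location of the old table data (hypothetical path).
    table_path = "dbfs:/user/hive/warehouse/dublicates_hotels"

    # Drop the metadata entry, then remove whatever non-Delta files are left at the location.
    spark.sql("DROP TABLE IF EXISTS dublicates_hotels")
    dbutils.fs.rm(table_path, True)

    # Recreating the table now starts from an empty location.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS dublicates_hotels (id BIGINT, name STRING)
        USING DELTA
    """)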
13 votes, 1 answer

Databricks Community Edition Cluster won't start

I am trying to start a cluster that was terminated in Community Edition. However, whenever I click on 'start' the cluster won't start. It would appear I have to create a new cluster every time I want to work with Databricks clusters. Can someone…
Patterson
  • 1,927
  • 1
  • 19
  • 56
13 votes, 1 answer

ArrowTypeError: ('Did not pass numpy.dtype object', 'Conversion failed for column X with type int32')

Problem: I am trying to save a data frame as a Parquet file on Databricks and am getting the ArrowTypeError. Databricks Runtime Version: 7.6 ML (includes Apache Spark 3.0.1, Scala 2.12). Log trace: ArrowTypeError: ('Did not pass numpy.dtype object',…
Naga Budigam
  • 689
  • 1
  • 10
  • 26
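This error is commonly reported as a pandas/pyarrow incompatibility in the Arrow-based pandas-to-Spark conversion on that runtime. Two workarounds that are often suggested, sketched here under that assumption: upcast the offending int32 column before the conversion, or disable Arrow for the conversion path:

    import pandas as pd

    pdf = pd.DataFrame({"X": [1, 2, 3], "y": [0.1, 0.2, 0.3]})
    pdf["X"] = pdf["X"].astype("int32")  # reproduce the int32 column from the error

    # Workaround 1: upcast to int64 so the Arrow conversion sees a plain numpy dtype.
    pdf["X"] = pdf["X"].astype("int64")
    spark.createDataFrame(pdf).write.mode("overwrite").parquet("/tmp/arrow_test")

    # Workaround 2: fall back to the slower non-Arrow conversion path.
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
    spark.createDataFrame(pdf).write.mode("overwrite").parquet("/tmp/arrow_test_no_arrow")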
13 votes, 0 answers

PySpark and Protobuf Deserialization UDF Problem

I'm getting the error Can't pickle <class 'google.protobuf.pyext._message.CMessage'>: it's not found as google.protobuf.pyext._message.CMessage when I try to create a UDF in PySpark. Apparently, it uses CloudPickle to serialize the command…
Marc Vitalis
  • 2,129
  • 4
  • 24
  • 36
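The usual cause is that the generated protobuf message class gets captured in the UDF's closure, and CloudPickle cannot serialize the C-extension CMessage type. A common workaround is to import and parse inside the UDF so each worker constructs the class locally; a sketch with a hypothetical generated module my_events_pb2 and field user_id:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def extract_user_id(raw_bytes):
        # Importing here means only the module name travels to the workers,
        # not the unpicklable message class (my_events_pb2 / UserEvent are made up).
        from my_events_pb2 import UserEvent
        event = UserEvent()
        event.ParseFromString(raw_bytes)
        return event.user_id

    extract_user_id_udf = udf(extract_user_id, StringType())

    # Assuming df has a binary column "payload" with serialized UserEvent messages:
    # decoded = df.withColumn("user_id", extract_user_id_udf("payload"))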
13 votes, 3 answers

How to rename a column in Databricks

How do you rename a column in Databricks? The following does not work: ALTER TABLE mySchema.myTable change COLUMN old_name new_name int It returns the error: ALTER TABLE CHANGE COLUMN is not supported for changing column 'old_name' with type…
David Maddox
  • 1,884
  • 3
  • 21
  • 32
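On Delta tables that ALTER TABLE ... CHANGE COLUMN statement cannot rename a column. Two routes that are commonly used, sketched with hedging since the first needs a reasonably recent Delta Lake / Databricks runtime: enable column mapping and use RENAME COLUMN, or rewrite the data under the new column name:

    # Route 1: column mapping + RENAME COLUMN (recent Delta Lake / Databricks runtimes only).
    spark.sql("""
        ALTER TABLE mySchema.myTable SET TBLPROPERTIES (
            'delta.columnMapping.mode' = 'name',
            'delta.minReaderVersion' = '2',
            'delta.minWriterVersion' = '5'
        )
    """)
    spark.sql("ALTER TABLE mySchema.myTable RENAME COLUMN old_name TO new_name")

    # Route 2: copy the data into a new table with the column renamed (works everywhere,
    # but rewrites the data; mySchema.myTable_renamed is a made-up target name).
    (spark.table("mySchema.myTable")
          .withColumnRenamed("old_name", "new_name")
          .write.format("delta")
          .saveAsTable("mySchema.myTable_renamed"))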
13 votes, 2 answers

How to write pandas dataframe into Databricks dbfs/FileStore?

I'm new to Databricks and need help writing a pandas dataframe into the Databricks local file system. I searched Google but could not find any case similar to this; I also tried the help guide provided by Databricks (attached), but that did not…
Shaan Proms
  • 133
  • 1
  • 1
  • 5
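On standard Databricks clusters the DBFS root is mounted on the driver under the local path /dbfs, so ordinary pandas I/O works as long as it writes through that mount. A minimal sketch:

    import pandas as pd

    pdf = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

    # /dbfs/... is the driver-local FUSE mount of dbfs:/..., so pandas can write to it directly.
    pdf.to_csv("/dbfs/FileStore/my_pandas_export.csv", index=False)

    # The same file is then visible to Spark and the DBFS browser as
    # dbfs:/FileStore/my_pandas_export.csv
    print(dbutils.fs.ls("dbfs:/FileStore/"))

On Community Edition the /dbfs mount is not always available; in that case the usual fallback is to write to a driver-local path and copy it with dbutils.fs.cp.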
13 votes, 2 answers

How to properly access dbutils in Scala when using Databricks Connect

I'm using Databricks Connect to run code in my Azure Databricks cluster locally from IntelliJ IDEA (Scala). Everything works fine. I can connect, debug, inspect locally in the IDE. I created a Databricks Job to run my custom app JAR, but it fails…
empz
  • 11,509
  • 16
  • 65
  • 106
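The question is Scala-specific, where the documented route is DBUtilsHolder from the dbutils-api library rather than the notebook's implicit dbutils. For consistency with the other examples here, the sketch below shows the equivalent Python pattern that Databricks Connect documents; treat it as an analogue, not the Scala answer:

    from pyspark.sql import SparkSession
    from pyspark.dbutils import DBUtils  # ships with Databricks Connect / Databricks Runtime

    spark = SparkSession.builder.getOrCreate()
    dbutils = DBUtils(spark)  # explicit construction works both locally and on the cluster

    print(dbutils.fs.ls("dbfs:/FileStore/"))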
13 votes, 5 answers

Delta Lake rollback

I need an elegant way to roll back Delta Lake to a previous version. My current approach is listed below: import io.delta.tables._ val deltaTable = DeltaTable.forPath(spark, testFolder) spark.read.format("delta") .option("versionAsOf", 0) …
Fang Zhang
  • 1,597
  • 18
  • 18
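The approach in the excerpt (read an older version with versionAsOf and write it back) is the classic rollback; newer Delta Lake / Databricks runtimes also have a RESTORE command that does it in one statement. A sketch of both in Python for consistency with the other examples (the original snippet is Scala), with the path standing in for testFolder:

    test_folder = "dbfs:/tmp/testFolder"  # placeholder for the question's testFolder path

    # Option 1: load the old version and overwrite the current contents with it.
    (spark.read.format("delta")
          .option("versionAsOf", 0)
          .load(test_folder)
          .write.format("delta")
          .mode("overwrite")
          .save(test_folder))

    # Option 2: on recent Delta Lake / Databricks runtimes, RESTORE performs the rollback directly.
    spark.sql(f"RESTORE TABLE delta.`{test_folder}` TO VERSION AS OF 0")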
12 votes, 1 answer

Databricks - Download a dbfs:/FileStore file to my Local Machine

Normally I use the URL below to download a file from the Databricks DBFS FileStore to my local computer. *https:///fileStore/?o=* However, this time the file is not downloaded and the URL leads me to…
PJT
  • 185
  • 1
  • 1
  • 9
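When the /files download URL misbehaves, the DBFS REST API (or the Databricks CLI's databricks fs cp) can fetch the file instead. A hedged sketch against the dbfs/read endpoint, which returns base64 chunks of at most 1 MB; the workspace URL, token, and file path are placeholders:

    import base64
    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder workspace URL
    TOKEN = "<personal-access-token>"                        # placeholder token
    DBFS_PATH = "/FileStore/my_report.csv"                   # placeholder DBFS file path

    headers = {"Authorization": f"Bearer {TOKEN}"}
    offset, chunk = 0, 1024 * 1024  # the API caps each read at 1 MB

    with open("my_report.csv", "wb") as out:
        while True:
            resp = requests.get(
                f"{HOST}/api/2.0/dbfs/read",
                headers=headers,
                params={"path": DBFS_PATH, "offset": offset, "length": chunk},
            )
            resp.raise_for_status()
            payload = resp.json()
            if payload["bytes_read"] == 0:
                break
            out.write(base64.b64decode(payload["data"]))
            offset += payload["bytes_read"]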
12 votes, 2 answers

Saving a Matplotlib plot as an MLflow artifact

I am using Databricks with the Spark 7.4 ML runtime. The following code successfully logs the params and metrics, and I can see ROCcurve.png in the MLflow GUI (just the item in the tree below the model), but the actual plot is blank. Why? with…
Dr.YSG
  • 7,171
  • 22
  • 81
  • 139
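A blank artifact usually means the figure was already cleared (for example by plt.show(), or by saving the current figure after it was closed) before it was logged. A sketch that keeps an explicit figure handle and hands it to MLflow; it assumes an mlflow version new enough to have log_figure, with the file-based fallback noted for older versions:

    import matplotlib.pyplot as plt
    import mlflow

    with mlflow.start_run():
        fig, ax = plt.subplots()
        ax.plot([0.0, 0.2, 0.6, 1.0], [0.0, 0.55, 0.85, 1.0])  # placeholder ROC points
        ax.set_xlabel("false positive rate")
        ax.set_ylabel("true positive rate")

        # Log the figure object itself, and avoid plt.show() before logging: some
        # backends clear the current figure and the saved PNG then comes out blank.
        mlflow.log_figure(fig, "ROCcurve.png")

        # Older mlflow without log_figure: save to a file first, then log the file.
        # fig.savefig("ROCcurve.png"); mlflow.log_artifact("ROCcurve.png")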
12 votes, 1 answer

How to create an empty folder in Azure Blob from Azure Databricks

I have a scenario where I want to list all the folders inside a directory in Azure Blob. If no folders are present, create a new folder with a certain name. I am trying to list the folders using dbutils.fs.ls(path). But the problem with the above command is…
Saikat
  • 403
  • 1
  • 7
  • 19
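With the container mounted (or reachable through an abfss:// path), dbutils.fs covers both halves of this: list the directory and create the folder when nothing suitable exists. A sketch with placeholder paths; note that Blob storage has no real directories, so an empty "folder" may only stay visible once something is written into it:

    base_path = "dbfs:/mnt/my-container/landing"  # placeholder mount point / directory
    new_folder = base_path + "/processed"         # placeholder folder name to create

    # dbutils.fs.ls raises if the path does not exist, so guard the call.
    try:
        entries = dbutils.fs.ls(base_path)
    except Exception:
        entries = []

    if not any(e.isDir() for e in entries):
        dbutils.fs.mkdirs(new_folder)  # creates the directory marker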