Questions tagged [databricks]

Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.

Use this tag for questions specific to the Databricks Lakehouse Platform, including, but not limited to, the Databricks File System (DBFS), REST APIs, Databricks Spark SQL extensions, and orchestration tools.

Don't use this tag for generic questions about Apache Spark or about public Spark packages maintained by Databricks.


7135 questions
12 votes • 4 answers

databricks: check if the mountpoint already mounted

How can I check whether a mount point is already mounted before calling dbutils.fs.mount in Databricks Python? Thanks
mytabi • 639 • 2 • 12 • 28
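One common approach is to compare the target path against the list returned by dbutils.fs.mounts(). A minimal sketch — dbutils only exists inside a Databricks notebook, so the check is factored into a plain function; the mount source and path below are placeholders:

```python
def already_mounted(mount_point, mounts):
    """Return True if mount_point appears in `mounts`, where `mounts` is a
    sequence of objects with a `mountPoint` attribute (the shape returned
    by dbutils.fs.mounts())."""
    return any(getattr(m, "mountPoint", None) == mount_point for m in mounts)

# Inside a notebook (source and mount_point are placeholders):
# if not already_mounted("/mnt/raw", dbutils.fs.mounts()):
#     dbutils.fs.mount(source="wasbs://...", mount_point="/mnt/raw")
```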
12 votes • 2 answers

Saving Matplotlib Output to DBFS on Databricks

I'm writing Python code on Databricks to process some data and output graphs. I want to be able to save these graphs as a picture file (.png or something, the format doesn't really matter) to DBFS. Code: import pandas as pd import matplotlib.pyplot…
KikiNeko • 261 • 1 • 3 • 7
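A hedged sketch of one answer pattern: on Databricks, plain local-file APIs such as matplotlib's savefig can write to DBFS through the /dbfs FUSE mount, so the only step needed is translating a dbfs:/ URI into the corresponding local path (the file name below is a placeholder):

```python
def dbfs_to_local_path(dbfs_path):
    """Translate a DBFS URI (e.g. dbfs:/tmp/plot.png) into the /dbfs FUSE
    path that plain Python file APIs, including matplotlib's savefig, can
    write to directly on a Databricks cluster."""
    prefix = "dbfs:/"
    if dbfs_path.startswith(prefix):
        return "/dbfs/" + dbfs_path[len(prefix):]
    return dbfs_path  # already a local-style path

# Inside a notebook (figure and path are placeholders):
# fig.savefig(dbfs_to_local_path("dbfs:/FileStore/plots/output.png"))
```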
12 votes • 1 answer

Exporting spark dataframe to .csv with header and specific filename

I am trying to export data from a spark dataframe to .csv file: df.coalesce(1)\ .write\ .format("com.databricks.spark.csv")\ .option("header", "true")\ .save(output_path) It is creating a file name…
Naresh Y • 271 • 1 • 4 • 10
12 votes • 6 answers

Databricks display() function equivalent or alternative to Jupyter

I'm in the process of migrating current Databricks Spark notebooks to Jupyter notebooks. Databricks provides the convenient display(data_frame) function to visualize Spark dataframes and RDDs, but there's no direct equivalent…
Luis Leal • 3,388 • 5 • 26 • 49
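A rough stand-in, sketched under the assumption that Jupyter renders pandas DataFrames as HTML tables when they are a cell's last expression; the row cap keeps a large Spark DataFrame from being collected whole:

```python
def show_df(df, n=10):
    """Rough Jupyter substitute for Databricks' display(): collect the
    first n rows of a Spark DataFrame and return them as a pandas
    DataFrame, which Jupyter renders as an HTML table."""
    return df.limit(n).toPandas()
```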
12 votes • 2 answers

Adding constant value column to spark dataframe

I am using Spark version 2.1 in Databricks. I have a data frame named wamp to which I want to add a column named region which should take the constant value NE. However, I get an error saying NameError: name 'lit' is not defined when I run the…
Gaurav Bansal • 5,221 • 14 • 45 • 91
12 votes • 1 answer

Spark dataframe save in single file on hdfs location

I have a dataframe and I want to save it as a single file on an HDFS location. I found the solution here: Write single CSV file using spark-csv. df.coalesce(1) .write.format("com.databricks.spark.csv") .option("header", "true") …
shikha dubey • 139 • 1 • 1 • 5
12 votes • 1 answer

How can I convert a pyspark.sql.dataframe.DataFrame back to a sql table in databricks notebook

I created a dataframe of type pyspark.sql.dataframe.DataFrame by executing the following line: dataframe = sqlContext.sql("select * from my_data_table") How can I convert this back to a Spark SQL table that I can run SQL queries on?
Semihcan Doken • 776 • 3 • 10 • 23
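The reverse direction is a single call: register the DataFrame as a temporary view and query it with spark.sql. Sketched as a small wrapper so the registration step is explicit (the view name is a placeholder):

```python
def register_for_sql(dataframe, view_name):
    """Expose a Spark DataFrame to SQL by registering it as a temporary
    view. createOrReplaceTempView is the Spark 2.x+ name; older code
    used registerTempTable. Afterwards the view is queryable with
    spark.sql("select * from <view_name>")."""
    dataframe.createOrReplaceTempView(view_name)
    return view_name
```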
11 votes • 2 answers

What are the major differences between S3 lake formation governed tables and databricks delta tables?

What are the major differences between S3 Lake Formation governed tables and Databricks Delta tables? They look pretty similar.
MGomez • 123 • 1 • 5
11 votes • 1 answer

Switching between Databricks Connect and local Spark environment

I am looking to use Databricks Connect for developing a pyspark pipeline. DBConnect is really awesome because I am able to run my code on the cluster where the actual data resides, so it's perfect for integration testing, but I also want to be able…
casparjespersen • 3,460 • 5 • 38 • 63
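One way to switch is a small environment toggle. A sketch under stated assumptions: USE_DBCONNECT is a hypothetical flag, and with Databricks Connect installed a plain SparkSession.builder.getOrCreate() already targets the remote cluster, so a local master is only forced when the flag is off:

```python
import os

def spark_master(env=None):
    """Decide where a pipeline should run. USE_DBCONNECT is a hypothetical
    flag: when it is set, the builder is left alone so Databricks Connect
    routes the session to the remote cluster; otherwise a local master is
    forced for fast unit tests."""
    env = os.environ if env is None else env
    return None if env.get("USE_DBCONNECT") == "1" else "local[*]"

# Typical use (builder lines belong in the pipeline entry point):
# builder = SparkSession.builder
# master = spark_master()
# if master:
#     builder = builder.master(master)
# spark = builder.getOrCreate()
```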
11 votes • 6 answers

What is the correct way to install the delta module in python?

What is the correct way to install the delta module in Python? In the example they import the module with from delta.tables import *, but I did not find the correct way to install the module in my virtual env. Currently I am using this spark param…
ofriman • 198 • 1 • 1 • 9
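For a virtual env, the delta-spark pip package together with its documented builder helper is the usual route. A setup sketch — the app name is a placeholder, and the delta-spark version must match the installed pyspark version:

```python
# pip install delta-spark   (version must be compatible with local pyspark)
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder
    .appName("delta-local")  # placeholder name
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

from delta.tables import *  # now resolves
```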
11 votes • 2 answers

Pass additional arguments to foreachBatch in pyspark

I am using foreachBatch in pyspark structured streaming to write each microbatch to SQL Server using JDBC. I need to use the same process for several tables, and I'd like to reuse the same writer function by adding an additional argument for table…
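A common pattern is to bind the extra argument with functools.partial, since foreachBatch itself only supplies (batch_df, batch_id). A sketch with placeholder JDBC options:

```python
from functools import partial

def write_batch(batch_df, batch_id, table_name):
    """Writer for foreachBatch; table_name is the extra argument that
    foreachBatch will not supply (JDBC URL and options are placeholders)."""
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:sqlserver://...")  # placeholder
        .option("dbtable", table_name)
        .mode("append")
        .save())

# One streaming query per table, reusing the same writer function:
# df.writeStream.foreachBatch(partial(write_batch, table_name="dbo.events")).start()
```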
11 votes • 1 answer

Error running Spark on Databricks: constructor public XXX is not whitelisted

I was using Azure Databricks and trying to run some example python code from this page. But I get this exception: py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml.classification.LogisticRegression(java.lang.String) is not…
lidong • 556 • 1 • 4 • 20
11 votes • 2 answers

Difference in usecases for AWS Sagemaker vs Databricks?

I was looking at Databricks because it integrates with AWS services like Kinesis, but it looks to me like SageMaker is a direct competitor to Databricks. We are heavily using AWS; is there any reason to add Databricks into the stack, or does…
L Xandor • 1,659 • 4 • 24 • 48
11 votes • 2 answers

Unsupported literal type class scala.runtime.BoxedUnit

I am trying to filter a column of a dataframe read from Oracle, as below: import org.apache.spark.sql.functions.{col, lit, when} val df0 = df_org.filter(col("fiscal_year").isNotNull()) When I do it I am getting the below…
BdEngineer • 2,929 • 4 • 49 • 85
11 votes • 3 answers

Create a new cluster in Databricks using databricks-cli

I'm trying to create a new cluster in Databricks on Azure using databricks-cli. I'm using the following command: databricks clusters create --json '{ "cluster_name": "template2", "spark_version": "4.1.x-scala2.11" }' And getting back this…
Mor Shemesh • 2,689 • 1 • 24 • 36