Questions tagged [databricks]

Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.

Use this tag for questions specific to the Databricks Lakehouse Platform, including, but not limited to, the Databricks File System (DBFS), REST APIs, Databricks Spark SQL extensions, and orchestration tools.

Don't use this tag for generic Apache Spark questions or for public Spark packages maintained by Databricks.


7135 questions
19
votes
7 answers

How to slice a pyspark dataframe in two row-wise

I am working in Databricks. I have a dataframe which contains 500 rows. I would like to create two dataframes, one containing 100 rows and the other containing the remaining 400 rows. +--------------------+----------+ | userid|…
Data_101
18
votes
2 answers

lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU

I am running the following code for LSTM on Databricks with GPU model = Sequential() model.add(LSTM(64, activation=LeakyReLU(alpha=0.05), batch_input_shape=(1, timesteps, n_features), stateful=False, return_sequences =…
18
votes
1 answer

Local instance of Databricks for development

I am currently working on a small team that is developing a Databricks based solution. For now we are small enough to work off of cloud instances of Databricks. As the group grows this will not really be practical. Is there a "local" install of…
John
17
votes
1 answer

What does "Determining location of DBIO file fragments..." mean, and how do I speed it up?

When running simple SQL commands in Databricks, sometimes I get the message: Determining location of DBIO file fragments. This operation can take some time. What does this mean, and how do I prevent it from having to perform this…
David Maddox
17
votes
3 answers

Databricks - How to change a partition of an existing Delta table?

I have a table in Databricks delta which is partitioned by transaction_date. I want to change the partition column to view_date. I tried to drop the table and then create it with a new partition column using PARTITIONED BY (view_date). However my…
samba
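Delta tables cannot have their partition column changed in place, so the usual route (a sketch in Databricks SQL, with hypothetical table names) is to rewrite the table with the new partitioning:

```sql
-- Rewrite the table, repartitioned by view_date.
-- CREATE OR REPLACE keeps the Delta table's history.
CREATE OR REPLACE TABLE sales_by_view_date
PARTITIONED BY (view_date)
AS SELECT * FROM sales;
```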
17
votes
3 answers

How to export data from a dataframe to a file in Databricks

I'm currently doing the Introduction to Spark course at edX. Is there a possibility to save dataframes from Databricks on my computer? I'm asking this question because this course provides Databricks notebooks which probably won't work after the…
Tom Becker
16
votes
1 answer

Spark: Read an inputStream instead of File

I'm using SparkSQL in a Java application to do some processing on CSV files using Databricks for parsing. The data I am processing comes from different sources (Remote URL, local file, Google Cloud Storage), and I'm in the habit of turning…
Nate Vaughan
15
votes
5 answers

Databricks: Issue while creating spark data frame from pandas

I have a pandas data frame which I want to convert into a spark data frame. Usually I use the below code to create a spark data frame from pandas, but all of a sudden I started to get the below error. I am aware that pandas has removed iteritems() but my…
data en
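This error typically appears when a newer pandas (2.x, where `iteritems` was removed) meets an older Spark that still calls it inside `spark.createDataFrame(pdf)`. A minimal stopgap sketch, assuming you cannot upgrade Spark, is to alias the removed method back before converting:

```python
import pandas as pd

# pandas 2.0 removed DataFrame.iteritems; older Spark versions still
# call it when converting a pandas dataframe. Restore it as an alias
# of the equivalent .items method.
if not hasattr(pd.DataFrame, "iteritems"):
    pd.DataFrame.iteritems = pd.DataFrame.items

pdf = pd.DataFrame({"userid": [1, 2, 3]})
# After the patch, spark.createDataFrame(pdf) works again on old Spark;
# here we just show the alias behaves like .items.
cols = [name for name, _ in pdf.iteritems()]
```

Upgrading to a Spark version that no longer calls `iteritems` is the cleaner long-term fix.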
15
votes
2 answers

How to set environment variable in databricks?

Simple question, but I can't find a simple guide on how to set the environment variable in Databricks. Also, is it important to set the environment variable on both the driver and executors (and would you do this via spark.conf)? Thanks
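A short sketch of the usual setup: cluster-level variables go in the cluster's Advanced Options under Spark > Environment variables (the `spark_env_vars` field in the Clusters API), and they then show up in Python via `os.environ`. The variable name below is a made-up example:

```python
import os

# On a Databricks cluster, a variable set in the cluster's
# "Environment Variables" box (e.g. MY_APP_STAGE=prod) is visible here.
# setdefault provides a local fallback so the snippet also runs elsewhere.
stage = os.environ.setdefault("MY_APP_STAGE", "dev")
```

Variables set this way are applied when cluster nodes start, so for per-session Spark settings `spark.conf.set(...)` is the more natural tool.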
15
votes
8 answers

How to read xlsx or xls files as spark dataframe

Can anyone let me know how we can read xlsx or xls files as a spark dataframe without converting them first? I have already tried to read with pandas and then tried to convert to a spark dataframe, but got the error, and the error is Error: Cannot merge type…
Ravi Kiran
15
votes
3 answers

How to solve this error org.apache.spark.sql.catalyst.errors.package$TreeNodeException

I have two processes; each process does: 1) connect to an Oracle DB and read a specific table, 2) form a dataframe and process it, 3) save the df to Cassandra. If I run both processes in parallel, both try to read from Oracle and I am getting the below error…
15
votes
2 answers

Read/Write single file in DataBricks

I have a file which contains a list of names stored in a simple text file. Each row contains one name. Now I need to programmatically append a new name to this file based on a user's input. For the input itself I use Databricks widgets - this is…
Gerhard Brueckl
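One common pattern (a local sketch; on Databricks you would point plain Python file I/O at the DBFS FUSE mount, e.g. a path like `/dbfs/mnt/data/names.txt`, assuming your workspace exposes that mount) is to append to the text file directly:

```python
import os
import tempfile

# Stand-in for a DBFS path such as "/dbfs/mnt/data/names.txt".
path = os.path.join(tempfile.mkdtemp(), "names.txt")

with open(path, "w") as f:
    f.write("alice\nbob\n")

# new_name would come from the widget: dbutils.widgets.get("name")
new_name = "carol"
with open(path, "a") as f:
    f.write(new_name + "\n")

with open(path) as f:
    names = f.read().splitlines()
```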
14
votes
1 answer

Printing secret value in Databricks

Even though secrets are for masking confidential information, I need to see the value of the secret for using it outside Databricks. When I simply print the secret it shows [REDACTED]. print(dbutils.secrets.get(scope="myScope",…
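The [REDACTED] masking is a literal string match on the notebook output, so any transformation of the value defeats it. A minimal sketch of the usual trick, with a stand-in string in place of `dbutils.secrets.get(...)`:

```python
# secret = dbutils.secrets.get(scope="myScope", key="myKey")  # on Databricks
secret = "s3cr3t"  # stand-in value so this runs locally

# Printing the characters separated by spaces avoids the exact-match
# redaction; removing the spaces recovers the original value.
spaced = " ".join(secret)
recovered = spaced.replace(" ", "")
```

Treat this as a last resort: once revealed, the value is in the notebook output, so rotate the secret if the output is shared.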
14
votes
4 answers

list the files of a directory and subdirectory recursively in Databricks (DBFS)

Using python/dbutils, how to display the files of the current directory & subdirectories recursively in the Databricks file system (DBFS).
Kiran A
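`dbutils.fs.ls` is not recursive, so the usual answer is a small recursive walk over it. A sketch, written against a stand-in lister so it runs locally (the fake entries mimic `dbutils.fs.ls` results, which expose `.path` and `.isDir()`):

```python
def list_files(path, ls):
    """Recursively collect file paths; `ls` behaves like dbutils.fs.ls."""
    out = []
    for entry in ls(path):
        if entry.isDir():
            out.extend(list_files(entry.path, ls))
        else:
            out.append(entry.path)
    return out

# In-memory stand-in for dbutils.fs.ls, just for local testing.
class FakeEntry:
    def __init__(self, path, is_dir):
        self.path = path
        self._is_dir = is_dir
    def isDir(self):
        return self._is_dir

tree = {
    "dbfs:/data/": [FakeEntry("dbfs:/data/sub/", True),
                    FakeEntry("dbfs:/data/a.txt", False)],
    "dbfs:/data/sub/": [FakeEntry("dbfs:/data/sub/b.txt", False)],
}
files = list_files("dbfs:/data/", tree.get)
# On Databricks: files = list_files("dbfs:/mnt/mydir/", dbutils.fs.ls)
```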
14
votes
3 answers

How to find size (in MB) of dataframe in pyspark?

How to find the size (in MB) of a dataframe in pyspark: df = spark.read.json("/Filestore/tables/test.json"). I want to find the size of df or of test.json.
Aravindh
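There is no single exact answer (in-memory size differs from on-disk size), but one common proxy is the size of the source files. A sketch with a generated file standing in for /Filestore/tables/test.json; on Databricks you could sum the `.size` fields returned by `dbutils.fs.ls` instead of using `os.path.getsize`:

```python
import json
import os
import tempfile

# Generate a small JSON-lines file as a stand-in for test.json.
path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w") as f:
    for i in range(1000):
        f.write(json.dumps({"userid": i}) + "\n")

# On-disk size in MB; a rough lower bound for the dataframe's footprint,
# since Spark's in-memory representation adds per-row overhead.
size_mb = os.path.getsize(path) / (1024 * 1024)
```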