Questions tagged [databricks]

Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.

Use this tag for questions specific to the Databricks Lakehouse Platform, including, but not limited to, the Databricks File System (DBFS), REST APIs, Databricks Spark SQL extensions, and orchestration tools.

Don't use this tag for generic Apache Spark questions or for public Spark packages maintained by Databricks.


7135 questions
19
votes
7 answers

How to slice a pyspark dataframe in two row-wise

I am working in Databricks. I have a dataframe which contains 500 rows. I would like to create two dataframes, one containing 100 rows and the other containing the remaining 400 rows. +--------------------+----------+ | userid|…
Data_101
18
votes
2 answers

lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU

I am running the following code for LSTM on Databricks with GPU model = Sequential() model.add(LSTM(64, activation=LeakyReLU(alpha=0.05), batch_input_shape=(1, timesteps, n_features), stateful=False, return_sequences =…
18
votes
1 answer

Local instance of Databricks for development

I am currently working on a small team that is developing a Databricks based solution. For now we are small enough to work off of cloud instances of Databricks. As the group grows this will not really be practical. Is there a "local" install of…
John
17
votes
1 answer

What does "Determining location of DBIO file fragments..." mean, and how do I speed it up?

When running simple SQL commands in Databricks, sometimes I get the message: Determining location of DBIO file fragments. This operation can take some time. What does this mean, and how do I prevent it from having to perform this…
David Maddox
17
votes
3 answers

Databricks - How to change a partition of an existing Delta table?

I have a table in Databricks delta which is partitioned by transaction_date. I want to change the partition column to view_date. I tried to drop the table and then create it with a new partition column using PARTITIONED BY (view_date). However my…
samba
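Delta tables cannot have their partition column changed in place, so the usual route (a sketch in Databricks SQL, with hypothetical table names) is to rewrite the table with the new partitioning:

```sql
-- Rewrite the table, repartitioned by view_date.
-- CREATE OR REPLACE keeps the Delta table's history.
CREATE OR REPLACE TABLE sales_by_view_date
PARTITIONED BY (view_date)
AS SELECT * FROM sales;
```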
17
votes
3 answers

How to export data from a dataframe to a file in Databricks

I'm currently doing the Introduction to Spark course at edX. Is there a possibility to save dataframes from Databricks on my computer? I'm asking this question because this course provides Databricks notebooks which probably won't work after the…
Tom Becker
16
votes
1 answer

Spark: Read an inputStream instead of File

I'm using SparkSQL in a Java application to do some processing on CSV files using Databricks for parsing. The data I am processing comes from different sources (Remote URL, local file, Google Cloud Storage), and I'm in the habit of turning…
Nate Vaughan
15
votes
5 answers

Databricks: Issue while creating spark data frame from pandas

I have a pandas data frame which I want to convert into a spark data frame. Usually I use the below code to create a spark data frame from pandas, but all of a sudden I started to get the below error. I am aware that pandas has removed iteritems() but my…
data en
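This error typically appears when a newer pandas (2.x, where `iteritems` was removed) meets an older Spark that still calls it inside `spark.createDataFrame(pdf)`. A minimal stopgap sketch, assuming you cannot upgrade Spark, is to alias the removed method back before converting:

```python
import pandas as pd

# pandas 2.0 removed DataFrame.iteritems; older Spark versions still
# call it when converting a pandas dataframe. Restore it as an alias
# of the equivalent .items method.
if not hasattr(pd.DataFrame, "iteritems"):
    pd.DataFrame.iteritems = pd.DataFrame.items

pdf = pd.DataFrame({"userid": [1, 2, 3]})
# After the patch, spark.createDataFrame(pdf) works again on old Spark;
# here we just show the alias behaves like .items.
cols = [name for name, _ in pdf.iteritems()]
```

Upgrading to a Spark version that no longer calls `iteritems` is the cleaner long-term fix.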
15
votes
2 answers

How to set environment variable in databricks?

Simple question, but I can't find a simple guide on how to set the environment variable in Databricks. Also, is it important to set the environment variable on both the driver and executors (and would you do this via spark.conf)? Thanks
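A short sketch of the usual setup: cluster-level variables go in the cluster's Advanced Options under Spark > Environment variables (the `spark_env_vars` field in the Clusters API), and they then show up in Python via `os.environ`. The variable name below is a made-up example:

```python
import os

# On a Databricks cluster, a variable set in the cluster's
# "Environment Variables" box (e.g. MY_APP_STAGE=prod) is visible here.
# setdefault provides a local fallback so the snippet also runs elsewhere.
stage = os.environ.setdefault("MY_APP_STAGE", "dev")
```

Variables set this way are applied when cluster nodes start, so for per-session Spark settings `spark.conf.set(...)` is the more natural tool.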
15
votes
8 answers

How to read xlsx or xls files as spark dataframe

Can anyone let me know how we can read xlsx or xls files as a spark dataframe without converting them first? I have already tried to read with pandas and then tried to convert to a spark dataframe, but got the error, and the error is Error: Cannot merge type…
Ravi Kiran
15
votes
3 answers

How to solve this error org.apache.spark.sql.catalyst.errors.package$TreeNodeException

I have two processes; each process does: 1) connect to an Oracle DB and read a specific table, 2) form a dataframe and process it, 3) save the df to Cassandra. If I run both processes in parallel, both try to read from Oracle and I am getting the below error…
15
votes
2 answers

Read/Write single file in DataBricks

I have a file which contains a list of names stored in a simple text file. Each row contains one name. Now I need to programmatically append a new name to this file based on a user's input. For the input itself I use Databricks widgets - this is…
Gerhard Brueckl
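One common pattern (a local sketch; on Databricks you would point plain Python file I/O at the DBFS FUSE mount, e.g. a path like `/dbfs/mnt/data/names.txt`, assuming your workspace exposes that mount) is to append to the text file directly:

```python
import os
import tempfile

# Stand-in for a DBFS path such as "/dbfs/mnt/data/names.txt".
path = os.path.join(tempfile.mkdtemp(), "names.txt")

with open(path, "w") as f:
    f.write("alice\nbob\n")

# new_name would come from the widget: dbutils.widgets.get("name")
new_name = "carol"
with open(path, "a") as f:
    f.write(new_name + "\n")

with open(path) as f:
    names = f.read().splitlines()
```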
14
votes
1 answer

Printing secret value in Databricks

Even though secrets are for masking confidential information, I need to see the value of the secret for using it outside Databricks. When I simply print the secret it shows [REDACTED]. print(dbutils.secrets.get(scope="myScope",…
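The [REDACTED] masking is a literal string match on the notebook output, so any transformation of the value defeats it. A minimal sketch of the usual trick, with a stand-in string in place of `dbutils.secrets.get(...)`:

```python
# secret = dbutils.secrets.get(scope="myScope", key="myKey")  # on Databricks
secret = "s3cr3t"  # stand-in value so this runs locally

# Printing the characters separated by spaces avoids the exact-match
# redaction; removing the spaces recovers the original value.
spaced = " ".join(secret)
recovered = spaced.replace(" ", "")
```

Treat this as a last resort: once revealed, the value is in the notebook output, so rotate the secret if the output is shared.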
14
votes
4 answers

list the files of a directory and subdirectory recursively in Databricks (DBFS)

Using python/dbutils, how to display the files of the current directory & subdirectories recursively in the Databricks file system (DBFS).
Kiran A
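`dbutils.fs.ls` is not recursive, so the usual answer is a small recursive walk over it. A sketch, written against a stand-in lister so it runs locally (the fake entries mimic `dbutils.fs.ls` results, which expose `.path` and `.isDir()`):

```python
def list_files(path, ls):
    """Recursively collect file paths; `ls` behaves like dbutils.fs.ls."""
    out = []
    for entry in ls(path):
        if entry.isDir():
            out.extend(list_files(entry.path, ls))
        else:
            out.append(entry.path)
    return out

# In-memory stand-in for dbutils.fs.ls, just for local testing.
class FakeEntry:
    def __init__(self, path, is_dir):
        self.path = path
        self._is_dir = is_dir
    def isDir(self):
        return self._is_dir

tree = {
    "dbfs:/data/": [FakeEntry("dbfs:/data/sub/", True),
                    FakeEntry("dbfs:/data/a.txt", False)],
    "dbfs:/data/sub/": [FakeEntry("dbfs:/data/sub/b.txt", False)],
}
files = list_files("dbfs:/data/", tree.get)
# On Databricks: files = list_files("dbfs:/mnt/mydir/", dbutils.fs.ls)
```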
14
votes
3 answers

How to find size (in MB) of dataframe in pyspark?

How to find the size (in MB) of a dataframe in pyspark: df = spark.read.json("/Filestore/tables/test.json"). I want to find the size of df or of test.json.
Aravindh
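There is no single exact answer (in-memory size differs from on-disk size), but one common proxy is the size of the source files. A sketch with a generated file standing in for /Filestore/tables/test.json; on Databricks you could sum the `.size` fields returned by `dbutils.fs.ls` instead of using `os.path.getsize`:

```python
import json
import os
import tempfile

# Generate a small JSON-lines file as a stand-in for test.json.
path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w") as f:
    for i in range(1000):
        f.write(json.dumps({"userid": i}) + "\n")

# On-disk size in MB; a rough lower bound for the dataframe's footprint,
# since Spark's in-memory representation adds per-row overhead.
size_mb = os.path.getsize(path) / (1024 * 1024)
```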