Questions tagged [databricks-community-edition]
85 questions
1
vote
1 answer
SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find data source: dbc
I am using Databricks Community Edition.
Here is the code:
code
It seems that Spark cannot read or process the .dbc file format. I get this error:
org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find data source: dbc.…

MBC
- 15
- 2
1
vote
0 answers
How to prevent pyspark from reading a parquet file's header record as just another row instead of as the header?
I have a parquet file with 11 columns. I tried the ways below in pyspark to read the file. It still assigns header names like Prop_0, Prop_1, Prop_2 instead of reading the starting header as the header…

moonchild
- 11
- 1
1
vote
1 answer
Set Workflow Job Concurrency Limit in Databricks
I need a job to be triggered every 5 minutes. However, if that job is already running, it must not be triggered again until that run is finished. Hence, I need to set the maximum run concurrency for that job to only one instance at a time.
What…

bda
- 372
- 1
- 7
- 22
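For reference, the Databricks Jobs API exposes a max_concurrent_runs field for exactly this requirement; with it set to 1, a scheduled trigger is skipped while a previous run is still active. A minimal sketch of a job-settings payload (the job name and cron expression are illustrative):

```python
import json

# Sketch of Jobs API job settings: max_concurrent_runs=1 makes Databricks skip
# a new scheduled trigger while a previous run of the same job is still active.
job_settings = {
    "name": "every-5-min-job",  # illustrative name
    "schedule": {
        "quartz_cron_expression": "0 0/5 * * * ?",  # every 5 minutes
        "timezone_id": "UTC",
    },
    "max_concurrent_runs": 1,  # at most one active run at a time
}
payload = json.dumps(job_settings)
```

The same field is editable in the job's UI settings; the JSON form above is what a REST create/update call would carry.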
1
vote
1 answer
Difference between the Spark cluster provided in Databricks Community Edition and master = local[8] mentioned in Spark?
I am using Databricks Community Edition, and the cluster on which my notebook is running shows:
that it has a driver with 15 GB memory and 2 cores.
Whereas when I get the Spark config in my notebook, it shows:
Why is it still showing…

Karan Dhar
- 47
- 4
1
vote
1 answer
Issue: "multi-column In predicates are not supported in the DELETE condition"
I am using Spark 2.4.5 with Java 8 in my Spark job, which writes data into an S3 path.
Due to multiple accidental triggers of the job, it created duplicate records.
I am trying to remove the duplicates from the S3 path using Databricks.
While I am trying to…

Shasu
- 458
- 5
- 22
1
vote
1 answer
Generated/Default value in Delta table
I'm trying to set a default value for a column in a Delta Lake table, for example:
CREATE TABLE delta.dummy_7 (id INT, yes BOOLEAN, name STRING, sys_date DATE GENERATED ALWAYS AS CAST('2022-01-01' AS DATE), sys_time TIMESTAMP) USING DELTA;
Error in…

Luis Estrada
- 371
- 7
- 20
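One plausible reading of the error (an assumption, since the message is truncated) is that Delta Lake's generated-column syntax requires parentheses around the expression. A sketch of the corrected DDL, which would be submitted with spark.sql on a Delta-enabled cluster:

```python
# Plausible fix (assumption: the failure is the missing parentheses that
# Delta Lake's GENERATED ALWAYS AS clause requires around its expression).
ddl = """
CREATE TABLE delta.dummy_7 (
  id INT,
  yes BOOLEAN,
  name STRING,
  sys_date DATE GENERATED ALWAYS AS (CAST('2022-01-01' AS DATE)),
  sys_time TIMESTAMP
) USING DELTA
"""
# On a Delta-enabled cluster this would be executed with spark.sql(ddl).
```

Note that a generated column is recomputed from its expression, which is not quite the same thing as a default value that a writer can override.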
1
vote
1 answer
Get cluster metric (Ganglia charts) of all clusters via REST API in Databricks
The question is specific to Databricks. Is there any API to get the Ganglia chart showing cluster usage? I need to get all the Ganglia charts available in the Databricks cluster metrics section, for all clusters, via REST API calls. We are…

Scorpio
- 511
- 4
- 14
1
vote
2 answers
Passing a DataFrame from one notebook to another with pyspark
I'm trying to use a DataFrame that I created in notebook1 in my notebook2, in Databricks Community Edition with pyspark. I tried this code: dbutils.notebook.run("notebook1", 60, {"dfnumber2"})
but it shows this…

BENOTH7
- 27
- 5
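Part of the problem may be the argument literal itself: {"dfnumber2"} is a Python set, while dbutils.notebook.run expects a string-to-string map, and it cannot carry a DataFrame in any case. A small sketch of the distinction, with the usual temp-view workaround in comments (the view and key names are illustrative):

```python
# {"dfnumber2"} is a *set* literal, not a dict. dbutils.notebook.run only
# accepts a map of string arguments, and a DataFrame can't be passed directly.
broken_args = {"dfnumber2"}              # a set – rejected by the API
fixed_args = {"view_name": "dfnumber2"}  # a dict of strings – accepted

# Usual workaround on Databricks (runs there, not locally):
#   notebook1:  df.createOrReplaceGlobalTempView("dfnumber2")
#   notebook2:  df = spark.table("global_temp.dfnumber2")
```

The global temp view lives for the duration of the cluster, so both notebooks must be attached to the same cluster.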
1
vote
0 answers
How do I import a local >2GB JSON file into Databricks Community Edition?
When I try to do it through their UI, I receive an error saying that the file size is too large. Are there any other ways to do this besides the Databricks UI?

James Manson
- 11
- 1
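If the file happens to be JSON Lines (one record per line; an assumption, since the question doesn't say), one workaround is to split it locally into parts under the upload limit and import each part. A stdlib sketch, approximating bytes by character count for simplicity:

```python
def split_jsonl(path, max_bytes, out_prefix):
    """Split a JSON Lines file into parts of roughly max_bytes each
    (assumes no single line exceeds max_bytes; counts characters, not
    raw bytes, which is exact only for ASCII content)."""
    part, size = 0, 0
    out = open(f"{out_prefix}.part0", "w")
    with open(path) as src:
        for line in src:
            # rotate to a new part before this line would push us over the cap
            if size + len(line) > max_bytes and size > 0:
                out.close()
                part += 1
                size = 0
                out = open(f"{out_prefix}.part{part}", "w")
            out.write(line)
            size += len(line)
    out.close()
    return part + 1  # number of parts written
```

Each part is then small enough for the UI upload, and spark.read.json can read all parts at once from a directory.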
1
vote
0 answers
with open to read JSON not working on Databricks
I was creating a function to write to MongoDB Atlas, and I could not open the JSON file from dbfs/FileStore. I did research on this, but it seems to be a Community Edition issue, and none of the examples I found worked. I was wondering if…

reksapj
- 101
- 7
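For context: on Community Edition the /dbfs FUSE mount that plain open() would need is not available, so the usual workaround is to copy the file to the driver's local disk with dbutils.fs.cp first. A sketch; the helper name is ours, and the dbutils calls are shown as comments because dbutils exists only on Databricks:

```python
def dbfs_to_local(dbfs_path, local_dir="/tmp"):
    """Map a dbfs:/ URI to a driver-local path to copy the file to.
    (Helper name and layout are illustrative.)"""
    name = dbfs_path.rstrip("/").rsplit("/", 1)[-1]
    return f"{local_dir}/{name}"

# On Databricks (dbutils exists only there):
#   local = dbfs_to_local("dbfs:/FileStore/config.json")
#   dbutils.fs.cp("dbfs:/FileStore/config.json", "file:" + local)
#   with open(local) as f:
#       data = json.load(f)
```

After the copy, any plain-Python file API (open, json.load, pymongo helpers) works on the local path.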
1
vote
1 answer
Preprocessing large data in Databricks Community Edition
I have a 16 GB dataset and want to use it in Databricks. However, the DBFS limit in Community Edition is 10 GB.
Could you please help me preprocess the data so that I can move it from the driver to DBFS?

Shihab Masri
- 21
- 2
1
vote
1 answer
Unable to access files uploaded to DBFS on Databricks Community Edition Runtime 9.1; tried the dbutils.fs.cp workaround, which also didn't work
I'm a beginner to Spark and just picked up the highly recommended 'Spark: The Definitive Guide' textbook. I was running the code examples and came across the first one that needed me to upload the flight-data CSV files provided with the book. I've…

LearneR
- 2,351
- 3
- 26
- 50
1
vote
1 answer
Can You Persist a Model in Databricks Community Edition?
Is there a way to persist a Python machine learning model when using the free Databricks community edition?
It looks like the DBFS is not available. This means that I can't use tools like joblib to save the model in the file system.
MLflow is not…

Trey
- 201
- 3
- 14
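One avenue even without DBFS: the driver's local filesystem (e.g. /tmp) is writable for the lifetime of the cluster, so a model can be serialized there. A sketch using stdlib pickle as a stand-in for joblib, with a placeholder dict in place of a fitted model:

```python
import pickle

# Placeholder standing in for a fitted sklearn model; a real estimator
# pickles the same way (joblib.dump/load would also work on this path).
model = {"coef": [0.4, 1.7], "intercept": -0.2}

# /tmp is on the driver's local disk, which is writable even when DBFS isn't.
with open("/tmp/model.pkl", "wb") as f:
    pickle.dump(model, f)

with open("/tmp/model.pkl", "rb") as f:
    restored = pickle.load(f)
```

The caveat is durability: this survives only as long as the cluster does, so for anything permanent the serialized bytes still need to leave the driver (download, external object storage, etc.).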
1
vote
0 answers
Building an API around Databricks Notebook
I'm very new to the Databricks community platform. I have recently developed an ML model using Databricks and would like to productionize it behind a Swagger API. I have tried it in bits and pieces but can't figure it out at all. Can someone please…

Fahad
- 11
- 2
1
vote
1 answer
Unable to create feature table on Databricks
from pyspark.sql import SparkSession, Row
from datetime import date
spark = SparkSession.builder.getOrCreate()
tempDf = spark.createDataFrame([
    Row(date=date(2022, 1, 22), average=40.12),
    Row(date=date(2022, 1, 23), average=41.32),
…

user22
- 112
- 1
- 9