Questions tagged [databricks-community-edition]

85 questions
0
votes
0 answers

Azure Databricks community edition COPY INTO command error

I have been getting the following error upon running the COPY INTO command in Databricks Community edition: Error in SQL statement: UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.createAtomicIfAbsent(path:…
0
votes
1 answer

Why do I have a label problem when using CrossValidator?

I'm new to Spark :) I am trying to use CrossValidator. My model is as follows: training #training data - several repartitions have been tested, 50/50 seems the best (trainData, testData) = modelData.randomSplit([0.5, 0.5]) #counting data…
0
votes
1 answer

How to calculate the sum and average of multiple elements with an RDD

I am a complete newbie in PySpark. I have organized an RDD with the following code: rdd1 = labRDD.map(lambda kv: (kv[0].split("/")[-1].split('.')[0], kv[1])) rdd2 = rdd1.flatMapValues(lambda v: v.split('\r\n')) rdd3 = rdd2.map(lambda kv: (kv[0],…
0
votes
1 answer

How to read a CSV file from an FTP server using PySpark in Databricks Community Edition

I am trying to fetch a file over FTP (hosted on Hostinger) using PySpark in Databricks Community Edition. Everything works fine until I try to read that file using spark.read.csv('MyFile.csv'). Following are the code and the error: PySpark…
0
votes
1 answer

Converting SQL Query to Databricks SQL

I have a query that I need to convert to Databricks SQL, or run against a table in a Databricks environment, but it fails even though it works well against tables in SQL Server. The tables and query can be found here. The query to convert or run in…
0
votes
1 answer

Could anyone help me with these two SQL queries on my Items table?

I made a SQL project for myself to learn. I am trying to answer these two questions: How many total items were sold between 1970 and 2000? What state in the US had the most items sold overall? I am using the databricks community edition. I am…
0
votes
1 answer

Databricks notebook: Python FileNotFoundError

I run the following code in a Databricks notebook and get a FileNotFoundError: import pandas as pd df = pd.read_csv ('E:\Myfolder1\Myfolder2\Myfolder3\myfile.csv') print(df) FileNotFoundError: [Errno 2] No such file or directory:…
0
votes
2 answers

Why does one table give null values in a self join?

I'm using Databricks Community Edition. I created a temporary view: %python df.createOrReplaceTempView("athlete_events_csv") The query I'm writing: with medal_count_by_country as (SELECT NOC, Year, count(*) as medal_count, row_number() over(…
0
votes
2 answers

Append a dynamic multi-line header to a .txt file that has data from Databricks

I am trying to load data into an abc.txt file from a .csv file that is stored in Delta Lake. Example: data loaded with | separation in the abc.txt file: id|name|address|contact_no 1|abc|xyz1|123 2|efg|xyz2|456 3|hij|xyz3|789 4|klmn|xyz4|91011 Header…
0
votes
0 answers

I cannot connect from my cloud Kafka to Databricks Community Edition's Spark cluster

1- I have a Spark cluster on Databricks Community Edition and a Kafka instance on GCP. 2- I just want to ingest the Kafka stream from Databricks Community Edition and analyze the data on Spark. 3- This is my connection…
0
votes
1 answer

Cannot apply count() or collect() on an RDD from textFile (Spark)

I am new to Spark and I have a Databricks Community Edition account. Right now I'm doing a lab and encountered the following error: !rm README.md* -f !wget https://raw.githubusercontent.com/carloapp2/SparkPOT/master/README.md textfile_rdd =…
0
votes
1 answer

Databricks, Spark UI, SQL logs: retrieve with REST API

Is it possible to retrieve Databricks/Spark UI/SQL logs using the REST API, and is there any retention limit? I can't see any related API under rest-api azure Databricks. Note: cluster / Advanced options / Logging has not been set.
0
votes
0 answers

How to find and replace a string with a nested array in CSV using Scala

I have a requirement to load a CSV and find and replace a string with a nested array, using Databricks with Scala. Can you please help me with this? Regards, Ram
0
votes
0 answers

Runtime duration timeout - Databricks

Databricks environment: I'm trying to add a table (CSV file) in my notebook, which is connected successfully to a cluster. But midway through uploading the data, an error message appears saying 'couldn't upload, runtime duration timeout'. How do I fix…
0
votes
1 answer

How to connect to MongoDB Atlas from a Databricks cluster using PySpark

How do I connect to MongoDB Atlas from a Databricks cluster using PySpark? This is my simple code in a notebook: from pyspark.sql import SparkSession spark = SparkSession \ .builder \ .appName("myApp") \ .config("spark.mongodb.input.uri",…