The Spark Notebook is a web application enabling interactive and reproducible data analysis with Apache Spark from the browser.
Questions tagged [spark-notebook]
120 questions
1 vote · 2 answers
Pulling data from serverless SQL external table to spark for sentiment analysis
I'm not experienced at all with Azure, yet I've been tasked with setting up the above. The serverless SQL external tables were set up by a company contracted to do so and use the SynapseDeltaFormat as the format, if that matters. One of the tables…

Brad
1 vote · 1 answer
How to pass For-Each current item into Azure Spark Notebook
I spent a few hours attempting to pass the @item() from a ForEach activity into an Azure Spark notebook as a string. So that others do not have to struggle with this, I will provide the answer.
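A pattern that typically works is to pass @item() as a string-typed base parameter on the Notebook activity inside the ForEach. The fragment below is a hedged sketch of the relevant piece of the pipeline definition; the activity name, notebook name, and parameter name are illustrative, not taken from the question.

```json
{
  "name": "RunNotebookPerItem",
  "type": "SynapseNotebook",
  "typeProperties": {
    "notebook": { "referenceName": "MyNotebook", "type": "NotebookReference" },
    "parameters": {
      "currentItem": { "value": "@string(item())", "type": "string" }
    }
  }
}
```

In the notebook itself, a cell toggled as a parameters cell containing a matching variable (e.g. `currentItem = ""`) receives the value at run time; `@string(item())` coerces the current ForEach item to a string.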

bmukes
1 vote · 3 answers
Apache Zeppelin Error When Importing Pandas
I'm facing a strange error when importing the Pandas library into my Zeppelin notebook. Here is the basic code that I have as part of my cell:
%python
import pandas as pd
df = pd.read_csv(r'target/youtube_videos.csv')
print(df)
I get the…
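A common cause of this class of error (an assumption, since the error text is cut off) is that Zeppelin's `%python` interpreter points at a Python installation that does not have pandas installed. A quick stdlib-only diagnostic that can be pasted into a Zeppelin cell:

```python
# Print which interpreter Zeppelin is actually running, so pandas can be
# installed into that specific Python rather than whichever one is on PATH.
import sys

def python_info():
    """Return the interpreter path and version tuple of the running Python."""
    return sys.executable, sys.version_info[:3]

exe, ver = python_info()
print(exe)  # install pandas into THIS interpreter: `<exe> -m pip install pandas`
```

If the printed path is not the environment where pandas was installed, either install pandas there or point `zeppelin.python` at the right interpreter.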

joesan
1 vote · 0 answers
Upload external jar to all nodes of an EMR cluster for EMR jupyter notebook
I want to use an external jar on all instances/nodes of an EMR cluster so that it can then be used in an EMR Jupyter notebook.
I am currently using the following
#!/bin/bash aws s3 cp…
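The usual mechanism for this is an EMR bootstrap action, which runs on every node at provisioning time. The sketch below only writes such a script locally (the bucket and jar names are placeholders, not from the question); you would upload it to S3 and reference it via `--bootstrap-actions` when creating the cluster.

```shell
# Write a bootstrap script that copies the jar onto each node and drops it
# where Spark picks it up. Writing the file here does not execute it.
cat > install_jar.sh <<'EOF'
#!/bin/bash
set -euo pipefail
aws s3 cp s3://my-bucket/libs/my-library.jar /home/hadoop/my-library.jar
sudo cp /home/hadoop/my-library.jar /usr/lib/spark/jars/
EOF
chmod +x install_jar.sh
```

After uploading, something like `aws emr create-cluster ... --bootstrap-actions Path=s3://my-bucket/bootstrap/install_jar.sh` (paths again illustrative) runs it on all nodes, so the jar is on the Spark classpath for the Jupyter notebook.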

Aman
1 vote · 2 answers
How do we access a file in github repo inside our azure databricks notebook
We have a requirement where we need to access a file hosted in our private GitHub repo from our Azure Databricks notebook.
Currently we are doing it with a curl command using a user's Personal Access Token:
curl -H 'Authorization: token…
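The same request can be made from Python without shelling out to curl, using only the standard library. This is a hedged sketch: the owner, repo, file path, and token below are placeholders, not taken from the question, and in Databricks the token would normally come from a secret scope rather than being hard-coded.

```python
# Fetch a file from a private GitHub repo via the raw content endpoint,
# authenticating with a Personal Access Token in the Authorization header.
import urllib.request

RAW_URL = "https://raw.githubusercontent.com/my-org/my-repo/main/config/settings.json"

def build_request(url: str, token: str) -> urllib.request.Request:
    """Build the authenticated request; urllib.request.urlopen(req) would fetch it."""
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})

req = build_request(RAW_URL, "ghp_exampletoken")  # fake placeholder token
print(req.get_header("Authorization"))
```

Calling `urllib.request.urlopen(req).read()` then returns the file bytes, which avoids depending on curl being present on the cluster.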

boom_clap
1 vote · 3 answers
How to pass a dataframe as notebook parameter in databricks?
I have a requirement wherein I need to pass a PySpark dataframe as a notebook parameter to a child notebook. Essentially, the child notebook has a few functions with a dataframe-typed argument to perform certain tasks. Now the problem is I'm unable to…
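Notebook parameters are passed as strings, so a dataframe itself cannot cross the notebook boundary; the usual workaround is to register the dataframe as a temp view in the parent and pass only the view's name. The sketch below simulates that hand-off with a plain dict standing in for the Spark session catalog (all names are illustrative); in Databricks the parent would call `df.createOrReplaceTempView(name)` and the child `spark.table(name)`.

```python
# Temp-view hand-off pattern: only a string (the view name) is passed as the
# notebook parameter; the dataframe is looked back up by name on the other side.
catalog = {}

def parent_notebook(df, catalog):
    """Register the dataframe under a name and return the string to pass."""
    view_name = "orders_to_process"  # illustrative view name
    catalog[view_name] = df
    return view_name

def child_notebook(view_name, catalog):
    """Resolve the dataframe by name and work on it."""
    df = catalog[view_name]
    return len(df)

rows = [("a", 1), ("b", 2)]           # stand-in for a real dataframe
param = parent_notebook(rows, catalog)
result = child_notebook(param, catalog)
print(result)  # 2
```

Because both notebooks share the same Spark session in a `%run`-style invocation, the child can resolve the temp view created by the parent.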
user16714516
1 vote · 1 answer
Databricks Notebook Schedule
I have scheduled an ADB notebook to run on a schedule. Will the notebook run if the cluster is down? Right now the cluster is busy, so I am unable to stop it and try it out. Will the notebook start the cluster and run, or would it wait for the cluster to be up?

Himanshu Kaushik
1 vote · 2 answers
databricks Python notebook from azure data factory or locally if statement
I have a Databricks Python notebook that reads in a parameter from ADF using:
Program_Name = dbutils.widgets.get("Program_Name")
Is there an IF statement or something similar I can do in the notebook code, such that when I run the notebook…
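One common pattern (hedged: widget and value names below are illustrative) is a try/except fallback: when ADF triggers the notebook the widget is set, while a local interactive run falls back to a default, and an if statement branches on the result. The sketch fakes the Databricks call with a dict so it runs anywhere; in a real notebook the lookup would be `dbutils.widgets.get(name)`.

```python
# Stand-in for dbutils.widgets: the empty dict simulates a local run where
# ADF has not supplied the parameter, so the lookup falls back to a default.
_widgets = {}

def get_param(name, default):
    """Return the widget value if set (ADF run), else the default (local run)."""
    try:
        return _widgets[name]          # dbutils.widgets.get(name) in Databricks
    except KeyError:
        return default

Program_Name = get_param("Program_Name", "LOCAL_RUN")

if Program_Name == "LOCAL_RUN":
    mode = "interactive"               # e.g. run against a small sample dataset
else:
    mode = "pipeline"                  # full run triggered from ADF

print(mode)  # "interactive" when run outside ADF
```

This keeps a single notebook usable both from the pipeline and for ad-hoc development.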

Simon Norton
1 vote · 0 answers
Connecting from Azure Synapse spark notebook to SQL-Pool table
I'm looking, so far without success, for a way to read an Azure Synapse table from a SQL pool in another workspace using Scala Spark (since that is apparently the only option).
I found in…

manuel bustamante
1 vote · 1 answer
Databricks Delta files adding new partition causes old ones to be not readable
I have a notebook with which I am doing a history load, loading 6 months of data every time, starting with 2018-10-01.
My delta file is partitioned by calendar_date.
After the initial load I am able to read the delta file and look at the data just…

Krish
1 vote · 2 answers
Delete all the cells of the Databricks Notebook
I am working in a Databricks notebook for some of the Spark work that I am doing. I am using the notebook initially just as a proof of concept, and will then organize it so that I can create a jar out of it. As I am doing the POC I keep adding a lot of cells to…

Nikunj Kakadiya
1 vote · 0 answers
Load functions if they are not loaded yet in Databricks notebooks
I am coding in Python in Databricks and I am using Spark 2.4.5.
I have several notebooks for loading my dimension tables and my fact tables, and two master notebooks for loading the dimensions and the facts.
I have developed some UDFs for testing, auditing…
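One lightweight way to make repeated loads idempotent is to guard the definitions behind a check for a sentinel name in `globals()`, so the helpers are defined only once per session. This is a hedged sketch with an illustrative function name; note that in Databricks a `%run` always executes the whole helper notebook, so a guard like this would live inside the notebook being run.

```python
# Idempotent load guard: define shared helpers only if they are not already
# present in the session; load_count tracks how many times loading happened.
load_count = 0

def ensure_udfs():
    """Define the shared helpers once; repeated calls are no-ops."""
    global load_count
    if "audit_row_count" in globals():
        return  # already loaded earlier in this session

    def audit_row_count(rows):
        return len(rows)

    globals()["audit_row_count"] = audit_row_count
    load_count += 1

ensure_udfs()
ensure_udfs()  # second call does nothing
print(load_count)  # 1
```

The same shape works for registering Spark UDFs: check for the registration marker before calling `spark.udf.register` again.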

Ardalan Shahgholi
1 vote · 0 answers
Error in SQL statement: AnalysisException: cannot resolve '`T_B.N`' given input columns
I need help. The error "Error in SQL statement: AnalysisException: cannot resolve 'T_B.N' given input columns: []; line 3 pos 10;" comes up when I run the code. How do I fix that? Is there a better way to write the query?
My "colleagues" and I have…

heathcliff1927
1 vote · 1 answer
How to stop a notebook streaming job gracefully?
I have a streaming application which is running in a Databricks notebook job (https://docs.databricks.com/jobs.html). I would like to be able to stop the streaming job gracefully using the stop() method of the StreamingQuery class, which is…
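A common shape for this is to poll an external stop signal (a marker file, a widget, a flag in a shared store) and call `query.stop()` when it appears, so the current micro-batch finishes cleanly instead of the job being killed. The sketch below shows that control flow with `threading.Event` standing in for the external signal and a plain loop standing in for the StreamingQuery; names are illustrative.

```python
# Graceful-stop pattern: the worker checks a stop flag between units of work,
# mirroring a driver loop that checks a signal and calls StreamingQuery.stop().
import threading

stop_requested = threading.Event()
processed = []

def worker():
    batch = 0
    while not stop_requested.is_set():
        processed.append(batch)       # one "micro-batch" of work
        batch += 1
        if batch >= 5:                # simulate the external stop signal arriving
            stop_requested.set()      # in Databricks: query.stop()

t = threading.Thread(target=worker)
t.start()
t.join()                              # analogous to query.awaitTermination()
print(len(processed))  # 5
```

In a real notebook job, the driver would loop with `query.awaitTermination(timeout)` and check the signal between iterations, calling `query.stop()` once it is set.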

abiratsis
1 vote · 0 answers
error: spark scala: java.nio.channels.ClosedByInterruptException -> cannot do show() or count() on dataset
I am reading a dataframe in a Databricks notebook as:
val data = files
  .grouped(10000)
  .toParArray
  .map(subList => {
    spark.read
      .format("avro")
      .schema(
        StructType(List(StructField("Body", BinaryType, nullable = true),
…

user3868051