The Spark Notebook is a web application enabling interactive and reproducible data analysis using Apache Spark from the browser.
Questions tagged [spark-notebook]
120 questions
0
votes
1 answer
Azure Synapse Notebook Read Variable
I have a very simple toy pipeline: a PySpark notebook that has an exit value, a Set Variable activity set to the exit value, and a second notebook that is parameterized with the variable. It looks like the below.
I successfully set the…

Jeff Tilton
0
votes
1 answer
Delete __HIVE_DEFAULT_PARTITION__ USING spark Notebook
Tried everything for a few hours to delete a record with a column partition value of __HIVE_DEFAULT_PARTITION__ within my delta lake table using a spark notebook. I figured it out and will post the answer. For the record my partition column is…

bmukes
0
votes
0 answers
read keyvault secret from Synapse notebook
I am trying to read keyvault secret from Synapse notebook using:
s = TokenLibrary.getSecret(kv, secret_name)
It works when I run it in debug mode, but fails when it is scheduled. I granted the Synapse server managed identity Get and List secret…

DejanS
0
votes
0 answers
How to create a dynamic JSON file (for bulk API upsert of Elasticsearch) using Data Factory
I am new to Azure Data Factory and I need to create a JSON file for a bulk API upsert to Elasticsearch, with the following considerations:
The input is in JSON format, which will be used as the payload for the upsert API; each row consists of an array and objects…
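For orientation, the Elasticsearch bulk API expects newline-delimited JSON: one action line followed by one body line per document, with a trailing newline at the end. The snippet below is a minimal sketch of that format in plain Python, not a Data Factory solution. The function name `build_bulk_upsert`, the index name `my-index`, the `id` field, and the sample row are all hypothetical, since the excerpt does not show the real input.

```python
import json


def build_bulk_upsert(rows, index_name, id_field):
    """Return an NDJSON payload with one update/upsert pair per row."""
    lines = []
    for row in rows:
        # Action line: update the document identified by the row's id field.
        action = {"update": {"_index": index_name, "_id": row[id_field]}}
        # Body line: doc_as_upsert inserts the document when it does not exist yet.
        body = {"doc": row, "doc_as_upsert": True}
        lines.append(json.dumps(action))
        lines.append(json.dumps(body))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical row containing an array and a nested object.
sample_rows = [{"id": "1", "tags": ["a", "b"], "meta": {"source": "adf"}}]
payload = build_bulk_upsert(sample_rows, "my-index", "id")
```

A payload built this way would be sent to the `_bulk` endpoint with the `application/x-ndjson` content type.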
0
votes
0 answers
Synapse: Integration runtime and Notebooks
I'm trying to load data to a Spark DataFrame from MSSQL/Postgres behind a firewall.
When I use pipelines and datasets I can use a Linked service that connects via an integration runtime.
How can I do this with a notebook and a DataFrame?
Is there a way to…

Robert G
0
votes
1 answer
AutoLoader with a lot of empty parquet files
I want to process some parquet files (with snappy compression) using AutoLoader in Databricks. A lot of those files are empty or contain just one record. Also, I cannot change how they are created, nor compact them.
Here are some of the approaches I…

Blend Mexhuani
0
votes
1 answer
Synapse Notebook reference - Call Synapse pipeline from Notebook
I'm trying to run a Synapse pipeline from a Synapse notebook. Is there any way to do it?
My Synapse pipeline has parameters. If it's possible to run it from a notebook, how do I pass the parameters?

Robert G
0
votes
2 answers
File path error in pipeline for Spark notebook in Azure Synapse
I have a Spark notebook which I am running with the help of a pipeline. The notebook runs fine manually, but in the pipeline it gives an error for the file location. In the code I am loading the file into a DataFrame. The file location in the code is…

darkstar
0
votes
2 answers
Py4JJavaError: An error occurred while calling o771.save. Azure Synapse Analytics Notebook
Here is my PySpark code used in the notebook:
data_lake_container = 'abfss://abc.dfs.core.windows.net'
stage_folder = 'abc'
delta_lake_folder = 'abc'
source_folder = 'abc'
source_wildcard = 'abc.parquet'
key_column = 'Id'
…

Mamatha Anu
0
votes
1 answer
Calling referenced functions after mssparkutils.notebook.run?
How can I call functions defined in a different Synapse notebook after running the notebook with mssparkutils.notebook.run()?
example:
#parameters
value = "test"
from notebookutils import mssparkutils
mssparkutils.notebook.run("function…

blunderoverflow
0
votes
1 answer
Synapse Notebook Password visible at Runtime
I have created a Synapse notebook to which I pass secrets, such as a password, as parameters. These secrets are stored in Key Vault and passed to the notebook as parameters. Ideally, these secrets would not be visible to developers. However…

user13442358
0
votes
1 answer
Synapse Pipeline Notebook can't resolve method from referenced Notebook
I have a Synapse Pipeline which runs a notebook containing unit tests before executing the business job (another notebook). The unit test notebook references the functions using the mssparkutils.notebook.run() command, and works fine when I run the…

blunderoverflow
0
votes
1 answer
Apache Spark unable to recognize columns in UTF-16 csv file
Question: Why am I getting the following error on the last line of the code below, and how can the issue be resolved?
AttributeError: 'DataFrame' object has no attribute 'OrderID'
CSV File encoding: UTF-16 LE BOM
Number of columns: 150
Rows: 5000
Language…

nam
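One plausible cause of an error like the one in the question above, which the truncated excerpt does not confirm, is that the file is decoded with the wrong encoding, so the header names no longer match what the code expects. The sketch below illustrates that effect in plain Python, without Spark. The two-column sample and the helper `header_names` are hypothetical stand-ins for the real 150-column file.

```python
import codecs

# Hypothetical two-column sample standing in for the 150-column file.
text = "OrderID,Amount\r\n1,9.99\r\n"
# Build UTF-16 LE bytes with a BOM, matching the encoding named in the question.
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def header_names(data: bytes, encoding: str) -> list[str]:
    """Decode the bytes and return the comma-separated names on the first line."""
    decoded = data.decode(encoding, errors="replace")
    return decoded.splitlines()[0].split(",")


# Decoded as UTF-8, the names carry replacement and NUL characters.
wrong = header_names(raw, "utf-8")
# Decoded as UTF-16, the BOM is consumed and the names come out clean.
right = header_names(raw, "utf-16")

print("OrderID" in wrong)
print("OrderID" in right)
```

Spark's CSV reader accepts an `encoding` option, so if the cause is the same there, setting it to match the file would be the first thing to check.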
0
votes
1 answer
Why do the job running time and command execution time not match in a Databricks notebook?
I have an Azure Databricks job that is triggered via ADF using an API call. I want to see why the job takes n minutes to complete its tasks. In the job execution results, the job execution time says 15 mins and the individual cells/commands…
0
votes
1 answer
Deleting files in Azure Synapse Notebook
This should have been simple but turned out to require a bit of GoogleFu.
I have an Azure Synapse Spark Notebook written in C# that
Receives a list of Deflate compressed IIS files.
Reads the files as binary into a DataFrame
Decompresses these files…

bmukes