The Spark Notebook is a web application enabling interactive and reproducible data analysis using Apache Spark from the browser.
Questions tagged [spark-notebook]
120 questions
0
votes
1 answer
Azure Spark Notebook Processing Large Text and/or Binary files
See "Azure Synapse Pipeline running Spark Notebook Generates Random Errors" for more information on this.
I have been struggling to get an Azure Synapse Spark Notebook to process an uncompressed 778 MB IIS file. The previous link shows some of…

bmukes
0
votes
3 answers
Azure Synapse Pipeline running Spark Notebook Generates Random Errors
I am processing approximately 19,710 directories containing IIS log files in an Azure Synapse Spark notebook. There are 3 IIS log files in each directory. The notebook reads the 3 files located in the directory and converts them from text…

bmukes
0
votes
1 answer
How to flatten a simple JSON file in an Azure Synapse Spark Notebook and convert it to Parquet
I needed to flatten a simple JSON file (JSON Lines) and convert it into Parquet format within a Spark Notebook in Azure Synapse Analytics.
There is only one level of nested object for any column. However, I discovered that getting the schema of…
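As a minimal sketch of the flattening step described above, the following plain-Python example expands one level of nested objects into top-level columns. The field names (`id`, `user`, `name`, `age`), the `_` separator, and the helper `flatten_one_level` are hypothetical and chosen only for illustration; the original question does not show its data.

```python
import json


def flatten_one_level(record, sep="_"):
    """Flatten a dict whose values contain at most one level of nested dicts."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Promote each nested field to a top-level column named parent_child.
            for child_key, child_value in value.items():
                flat[f"{key}{sep}{child_key}"] = child_value
        else:
            flat[key] = value
    return flat


# Hypothetical JSON Lines input; the field names are invented for illustration.
json_lines = [
    '{"id": 1, "user": {"name": "a", "age": 30}}',
    '{"id": 2, "user": {"name": "b", "age": 41}}',
]

flat_rows = [flatten_one_level(json.loads(line)) for line in json_lines]
print(flat_rows[0])
```

In a Spark notebook the same shape is usually reached by selecting the struct's fields (for example `df.select("id", "user.*")`) before calling `df.write.parquet(...)`; whether that fits the asker's schema is an assumption.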

bmukes
0
votes
1 answer
Convert String to Date Time Field in Azure Databricks
I have the following text string that represents a date time from an application.
2021-11-22 07:28:47 PM
I need to convert this to a date time to do a DATE ADD operation.
I have tried this many ways with no success, and it gives me null in Azure…
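The string above uses a 12-hour clock with an AM/PM marker, so the parse pattern has to say so. A minimal sketch in plain Python (not Spark) shows the idea; the one-day offset is an arbitrary stand-in for the DATE ADD step.

```python
from datetime import datetime, timedelta

raw = "2021-11-22 07:28:47 PM"

# %I is the 12-hour clock and %p the AM/PM marker; %H (24-hour) would ignore "PM".
parsed = datetime.strptime(raw, "%Y-%m-%d %I:%M:%S %p")

# Date arithmetic works once the value is a real datetime.
next_day = parsed + timedelta(days=1)

print(parsed.isoformat())
print(next_day.isoformat())
```

In Spark SQL the corresponding pattern would be `yyyy-MM-dd hh:mm:ss a` passed to `to_timestamp`. Whether a mismatched pattern is what produced the null in the original question is an assumption, since the attempted code is not shown.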

James Khan
0
votes
0 answers
Orchestrate Azure Synapse Spark notebook from C#/API
Is there a way to execute a notebook from C#, for example through an API or SDK? I found the following to create and update notebooks: https://learn.microsoft.com/en-us/dotnet/api/overview/azure/analytics.synapse.artifacts-readme-pre, but nothing to trigger it like how I…

user2934433
0
votes
1 answer
spark.sql write to CSV causes shifted column data when a value contains a comma
I'm using Scala in my Azure Databricks notebook. My dataframe gives me accurate results, but when I try to store it as CSV, the cells shift wherever a value contains a comma (,).
spark.sql("""
SELECT * FROM…
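Column shifting of this kind generally comes down to how fields containing the delimiter are quoted on write and interpreted on read. A minimal sketch with Python's standard `csv` module (not Spark) shows that quoted fields survive a round trip; the sample rows are invented for illustration.

```python
import csv
import io

# Invented sample rows: the second column contains commas and embedded quotes.
rows = [
    ["id", "comment"],
    ["1", "red, green"],
    ["2", 'say "hi", ok'],
]

buffer = io.StringIO()
# QUOTE_MINIMAL wraps only the fields that contain the delimiter or a quote.
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)

# Reading the text back with a quote-aware parser restores the two columns.
buffer.seek(0)
round_tripped = list(csv.reader(buffer))
print(buffer.getvalue())
```

Spark's CSV writer exposes related options such as `quote`, `escape`, and `quoteAll`. Which of them applies here depends on the data and on the tool used to open the file, neither of which the excerpt shows.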

Manish Jain
0
votes
1 answer
display(df.limit(10)) does not always work in Synapse notebooks
Within Synapse notebooks, running display(df.limit(10)) does not always work.
It usually works when the notebook is first run, but after a while, if I run it again, it does not display the df.
The server has not died or timed out, code is still…

wilson_smyth
0
votes
1 answer
Azure Synapse Pipeline Notebook Return Error
I want to create a pipeline in Azure Synapse in which one of the steps uses a notebook to read and validate data, and then either continue or stop the pipeline
if(validation=True): #success on validation
return df #continue the…

OctavianWR
0
votes
0 answers
How do I create a sequence in PySpark that resets when rows change from 0 to 1 and increments when all are 1's
I have a pyspark dataframe like this and need the SEQ output as shown:
R_ID  ORDER  SC_ITEM  seq
A     1      0
A     3      1        1
A     4      1        2
A     5      1        3
A     6      1        4
A     7      1        5
A     8      1        6
A     9      1        7
A     10     0        0
A     11     1        1
A     12     0…
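The resetting counter shown in the table can be sketched in plain Python before porting it to PySpark. The helper `reset_sequence` is hypothetical, and it assumes rows are already ordered by ORDER within one R_ID and that every row with SC_ITEM = 0 gets seq 0 (the first row's seq is blank in the excerpt, so that part is a guess).

```python
def reset_sequence(flags):
    """Return a running count of consecutive 1s that resets to 0 on every 0."""
    seq = []
    counter = 0
    for flag in flags:
        # Increment while the flag stays 1; any 0 resets the counter.
        counter = counter + 1 if flag == 1 else 0
        seq.append(counter)
    return seq


# SC_ITEM values from the table above, in ORDER sequence for R_ID "A".
sc_item = [0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
print(reset_sequence(sc_item))
```

In PySpark the same result is commonly built from window functions: a cumulative count of the 0 rows defines a group, and a row number within each group gives the sequence. That mapping is a suggestion, not something taken from the question.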

Shay Pal
0
votes
1 answer
Filter like %[A-Za-z]% in databricks
I am trying to use table.column LIKE '%[A-Za-z]%' in a Databricks notebook, but it returns no value.
It worked in SQL Server, but it seems it's not working in Pysql.
Does anyone know what's the alternative in Databricks?
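In Spark SQL, `LIKE` treats only `%` and `_` as wildcards, so the bracket expression is matched literally; regular-expression matching is done with `RLIKE` instead (for example `table.column RLIKE '[A-Za-z]'`). A minimal sketch of the same "contains at least one letter" test in plain Python, with invented sample values and a hypothetical helper `contains_letter`:

```python
import re


def contains_letter(value):
    """True when the string contains at least one ASCII letter."""
    # re.search scans the whole string, like SQL Server's '%[A-Za-z]%'.
    return re.search(r"[A-Za-z]", value) is not None


# Invented sample values for illustration.
values = ["12345", "12a45", "", "ABC"]
matches = [v for v in values if contains_letter(v)]
print(matches)
```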

cornerstone347
0
votes
1 answer
Install interpreters for Zeppelin
I need to install a custom set of interpreters for Apache Zeppelin. I do not need all of them, only md, shell, python (default), jdbc, and spark (default). I tried several approaches, but they failed:
Install online via command
./bin/install-interpreter.sh --name…

qxk71551
0
votes
2 answers
Writing parquet file throws...An HTTP header that's mandatory for this request is not specified
I have two ADLSv2 storage accounts, both with hierarchical namespace enabled.
In my Python notebook, I'm reading a CSV file from one storage account and, after some enrichment, writing it as a Parquet file to the other storage account.
I am getting the error below when…

user3023949
0
votes
2 answers
Azure Databricks job - notebook snapshot
We run scheduled Databricks jobs daily in Azure Databricks, and they have run successfully every day. But today (29th Sept 2020), the job is failing within a few seconds with Internal Error. The error message is given below:
Error…

Saravanan
0
votes
1 answer
How to call a remote SQL function inside a PySpark or Scala Databricks notebook
I am writing a Databricks Scala / Python notebook which connects to a SQL Server database,
and I want to execute a SQL Server function from the notebook with custom parameters.
import com.microsoft.azure.sqldb.spark.config.Config
import…

rohit patil
0
votes
1 answer
Why do some notes in Spark run very slowly, and why do multiple executions in the same situation have different execution times?
My question is about the execution time of PySpark code in Zeppelin.
I have some notes in which I work with some SQL queries. In one of my notes, I convert my dataframe to pandas with the .toPandas() function. The size of my data is about 600 megabytes.
my…

Saeed