Questions tagged [spark-notebook]

The Spark Notebook is a web application enabling interactive and reproducible data analysis using Apache Spark from the browser.

120 questions
0
votes
1 answer

Azure Synapse Notebook Read Variable

I have a very simple/toy pipeline where I have a pyspark notebook that has an exit value, a set variable activity set to the exit value and a second notebook that is parameterized with the variable. It looks like the below. I successfully set the…
Jeff Tilton
0
votes
1 answer

Delete __HIVE_DEFAULT_PARTITION__ USING spark Notebook

Tried everything for a few hours to delete a record with a column partition value of __HIVE_DEFAULT_PARTITION__ within my delta lake table using a spark notebook. I figured it out and will post the answer. For the record my partition column is…
bmukes
0
votes
0 answers

read keyvault secret from Synapse notebook

I am trying to read a Key Vault secret from a Synapse notebook using: s = TokenLibrary.getSecret(kv, secret_name). It works when I run it in debug mode, but fails when it is scheduled. I granted Synapse server managed identity Get and List secret…
0
votes
0 answers

How to create a dynamic JSON file (for bulk API upsert of Elasticsearch) using Data Factory

I am new to Azure Data Factory and I need to create a JSON file for bulk API upsert of Elasticsearch with the following considerations: input is in JSON format which will be used as payload for the upsert API, each row consists of an array and objects…
0
votes
0 answers

Synapse: Integration runtime and Notebooks

I'm trying to load data to a Spark DataFrame from MSSQL/Postgres behind a firewall. When I use pipelines and datasets I can use a Linked service that connects via an integration runtime. How can I do this with a notebook and a DataFrame? Is there a way to…
0
votes
1 answer

AutoLoader with a lot of empty parquet files

I want to process some parquet files (with snappy compression) using AutoLoader in Databricks. A lot of those files are empty or contain just one record. Also, I cannot change how they are created, nor compact them. Here are some of the approaches I…
0
votes
1 answer

Synapse Notebook reference - Call Synapse pipeline from Notebook

I'm trying to run a Synapse pipeline from a Synapse notebook. Is there any way to do it? My Synapse pipeline has parameters. If it's possible to run it from a notebook, how do I pass the params?
Robert G
0
votes
2 answers

File path error in pipeline for spark notebook in azure synapse

I have a Spark notebook which I am running from a pipeline. The notebook runs fine manually, but in the pipeline it gives an error for the file location. In the code I am loading the file into a DataFrame. The file location in the code is…
darkstar
0
votes
2 answers

Py4JJavaError: An error occurred while calling o771.save. Azure Synapse Analytics Notebook

Here is my pyspark code used in Notebook:
data_lake_container = 'abfss://abc.dfs.core.windows.net'
stage_folder = 'abc'
delta_lake_folder = 'abc'
source_folder = 'abc'
source_wildcard = 'abc.parquet'
key_column = 'Id'
…
0
votes
1 answer

Calling referenced functions after mssparkutils.notebook.run?

How can I call functions defined in a different Synapse notebook after running the notebook with mssparkutils.notebook.run()? Example:
#parameters
value = "test"
from notebookutils import mssparkutils
mssparkutils.notebook.run("function…
0
votes
1 answer

Synapse Notebook Password visible on Runtime

I have created a Synapse Notebook to which I am passing parameters for secrets such as a password. These secrets are in KeyVault, being passed to the Notebook as parameters. Ideally I would expect that these secrets are not visible to developers. However…
0
votes
1 answer

Synapse Pipeline Notebook can't resolve method from referenced Notebook

I have a Synapse Pipeline which runs a notebook containing unit tests before executing the business job (another notebook). The unit test notebook references the functions using the mssparkutils.notebook.run() command, and works fine when I run the…
0
votes
1 answer

Apache Spark unable to recognize columns in UTF-16 csv file

Question: Why am I getting the following error on the last line of the code below, and how can the issue be resolved? AttributeError: 'DataFrame' object has no attribute 'OrderID'. CSV file encoding: UTF-16 LE BOM. Number of columns: 150. Rows: 5000. Language…
nam
0
votes
1 answer

Why do the job running time and command execution time not match in a Databricks notebook?

I have an Azure Databricks job that is triggered via ADF using an API call. I want to see why the job has been taking n minutes to complete the tasks. In the job execution results, the job execution time says 15 mins and the individual cells/commands…
0
votes
1 answer

Deleting files in Azure Synapse Notebook

This should have been simple but turned out to require a bit of GoogleFu. I have an Azure Synapse Spark Notebook written in C# that: receives a list of Deflate-compressed IIS files; reads the files as binary into a DataFrame; decompresses these files…
bmukes