Questions tagged [spark-notebook]

The Spark Notebook is a web application enabling interactive and reproducible data analysis using Apache Spark from the browser.

120 questions
1 vote · 2 answers

Pulling data from serverless SQL external table to spark for sentiment analysis

I'm not experienced at all with Azure, yet I've been tasked with setting up the above. The serverless SQL external tables were set up by a company contracted to do so and use the SynapseDeltaFormat as the format, if that matters. One of the tables…
Brad
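Since the external tables use SynapseDeltaFormat, a Synapse Spark notebook can usually bypass serverless SQL and read the Delta folder that backs the table directly. A minimal sketch, where the storage account, container, and path are all placeholders for your own lake layout:

```python
def delta_table_path(account: str, container: str, relative_path: str) -> str:
    """Build the ABFSS URI for the Delta folder backing the external table.
    All three arguments are hypothetical names; substitute your own."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path}"

# In a Synapse Spark notebook you would then read the table directly:
# df = spark.read.format("delta").load(
#     delta_table_path("mylake", "data", "curated/reviews"))
# df.show(5)
```

From there the DataFrame can feed whatever sentiment-analysis step comes next, without routing the data through the serverless SQL endpoint.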
1 vote · 1 answer

How to pass For-Each current item into Azure Spark Notebook

I spent a few hours attempting to pass @item() from a ForEach activity into an Azure Spark notebook as a string. So that others do not have to struggle with this, I will provide the answer.
bmukes
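For reference, the usual pattern is: in the pipeline's ForEach, give the Synapse notebook activity a base parameter (here named current_item, a name chosen for illustration) with dynamic content @item(), then declare a matching variable in the cell toggled as the notebook's parameters cell. A sketch of the notebook side:

```python
# Parameters cell (mark this cell as "Parameters" in the Synapse notebook UI).
# The pipeline's base parameter named "current_item" overrides this default at run time.
current_item = ""

# The rest of the notebook can then treat it as an ordinary string.
print(f"processing item: {current_item!r}")
```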
1 vote · 3 answers

Apache Zeppelin Error When Importing Pandas

I'm facing a strange error when importing the Pandas library into my Zeppelin notebook. Here is the basic code that I have as part of my cell: %python import pandas as pd df = pd.read_csv(r'target/youtube_videos.csv') print(df) I get the…
joesan
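A frequent cause of this kind of error is that Zeppelin's %python interpreter points at a different Python environment than the one where pandas was installed, and that the relative CSV path resolves against the interpreter's working directory, not the project folder. A quick diagnostic cell, using only the standard library:

```python
import os
import sys

# Which Python is the %python interpreter actually running?
print(sys.executable)

# Where would a relative path like 'target/youtube_videos.csv' resolve?
print(os.path.abspath("target/youtube_videos.csv"))

# If pandas is missing from that environment, install it into *that* Python:
#   <path printed above> -m pip install pandas
```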
1 vote · 0 answers

Upload an external jar to all nodes of an EMR cluster for an EMR Jupyter notebook

I want to use an external jar in all instances/nodes of an EMR cluster so that it can be used further in the EMR Jupyter notebook. I am currently using the following: #!/bin/bash aws s3 cp…
Aman
1 vote · 2 answers

How do we access a file in a GitHub repo inside our Azure Databricks notebook

We have a requirement where we need to access a file hosted on our private GitHub repo from our Azure Databricks notebook. Currently we are doing it with a curl command, using a user's Personal Access Token: curl -H 'Authorization: token…
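The same authenticated fetch can be done inside the notebook with only the standard library, which avoids shelling out to curl. A sketch, where owner, repo, path, and token are placeholders and raw.githubusercontent.com serves the file contents:

```python
import urllib.request

def github_file_request(owner: str, repo: str, path: str,
                        token: str, ref: str = "main") -> urllib.request.Request:
    """Build an authenticated request for a file in a private GitHub repo.
    All arguments are placeholders; `token` is a GitHub Personal Access Token."""
    url = f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})

# In the notebook:
# with urllib.request.urlopen(github_file_request("org", "repo", "data/config.json", pat)) as r:
#     contents = r.read()
```

Storing the token in a Databricks secret scope rather than in the notebook itself is the usual follow-up improvement.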
1 vote · 3 answers

How to pass a dataframe as a notebook parameter in Databricks?

I have a requirement wherein I need to pass a PySpark dataframe as a notebook parameter to a child notebook. Essentially, the child notebook has a few functions with a dataframe argument type to perform certain tasks. Now the problem is I'm unable to…
user16714516
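Notebook parameters are strings, so a DataFrame cannot be passed directly; the usual workaround is to register a global temp view in the parent and pass its name. A sketch of that hand-off, with a plain dict standing in for Spark's catalog (view and notebook names are made up):

```python
catalog = {}  # stands in for Spark's global temp view catalog

def parent_notebook(df):
    # In Databricks: df.createOrReplaceGlobalTempView("shared_input")
    catalog["global_temp.shared_input"] = df
    # In Databricks: dbutils.notebook.run("child", 600,
    #                                     {"view_name": "global_temp.shared_input"})
    return child_notebook({"view_name": "global_temp.shared_input"})

def child_notebook(params):
    # In Databricks: df = spark.table(dbutils.widgets.get("view_name"))
    return catalog[params["view_name"]]
```

The child resolves the name back to the same data, so its functions can keep their dataframe-typed arguments unchanged.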
1 vote · 1 answer

Databricks Notebook Schedule

I have scheduled an ADB notebook to run on a schedule. Will the notebook run if the cluster is down? Right now the cluster is busy, so I am unable to stop it and try it out. Will the notebook start the cluster and run, or will it wait for the cluster to be up?
1 vote · 2 answers

Databricks Python notebook from Azure Data Factory or locally: if statement

I have a Databricks Python notebook that reads in a parameter from ADF using: Program_Name = dbutils.widgets.get("Program_Name") Is there an IF statement or something similar I can do in the notebook code, such that when I run the notebook…
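One common pattern is a small helper that falls back to a default when the widget is absent or empty, so the same notebook runs both from ADF and interactively. A sketch (the widget name Program_Name is from the question; the helper name is made up):

```python
def get_param(widgets, name: str, default: str) -> str:
    """Return the widget's value, or `default` when the widget is missing or empty.
    In Databricks, pass dbutils.widgets as `widgets`."""
    try:
        value = widgets.get(name)
    except Exception:
        return default
    return value if value else default

# Program_Name = get_param(dbutils.widgets, "Program_Name", "local_run")
# if Program_Name == "local_run":
#     ...  # interactive-only setup
```

Branching on the returned value then gives the "run this part only when triggered from ADF" behaviour the question asks about.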
1 vote · 0 answers

Connecting from Azure Synapse Spark notebook to SQL pool table

I'm looking, with no success, for a way to read an Azure Synapse table from a SQL pool of another workspace using Scala Spark (since it is apparently the only option). I found in…
1 vote · 1 answer

Databricks Delta files: adding a new partition causes old ones to be unreadable

I have a notebook with which I am doing a history load, loading 6 months of data each time, starting with 2018-10-01. My Delta file is partitioned by calendar_date. After the initial load I am able to read the delta file and look at the data just…
Krish
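A frequent cause of this symptom is writing each 6-month slice with mode("overwrite") but without restricting the overwrite, which replaces the whole table rather than just the new partitions. Delta's replaceWhere option limits the overwrite to the partitions being loaded. A sketch (the column name is from the question; dates and the path are placeholders):

```python
def replace_where_predicate(start_date: str, end_date: str) -> str:
    """Predicate for Delta's replaceWhere option, restricting an overwrite
    to the calendar_date partitions in the half-open range [start, end)."""
    return f"calendar_date >= '{start_date}' AND calendar_date < '{end_date}'"

# In the notebook (Databricks):
# (df.write.format("delta")
#    .mode("overwrite")
#    .option("replaceWhere", replace_where_predicate("2019-04-01", "2019-10-01"))
#    .save(delta_path))
```

Appending (mode("append")) instead of overwriting is the other common fix when each load covers genuinely new partitions.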
1 vote · 2 answers

Delete all the cells of the Databricks Notebook

I am working in a Databricks notebook for some of the Spark work that I am doing. I am using the notebook initially just as a proof of concept, and will then organize it so that I can create a jar out of it. As I am doing a POC I try adding a lot of cells to…
Nikunj Kakadiya
1 vote · 0 answers

Load functions if they are not loaded yet in Databricks notebooks

I am coding Python in Databricks and I am using Spark 2.4.5. I have several notebooks for loading my dimension tables and my fact tables, and two master notebooks for loading Dimensions and Facts. I have developed some UDFs for testing, auditing…
Ardalan Shahgholi
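In Python the "run once" guard can be expressed by probing for a sentinel name that only exists after the helpers were loaded. A sketch of the pattern (the sentinel name and loader are made up; in a master notebook the loader would do whatever actually defines the UDFs):

```python
def ensure_loaded(namespace: dict, sentinel: str, loader) -> bool:
    """Call `loader` only if `sentinel` is not yet defined in `namespace`.
    In a Databricks master notebook, `namespace` would be globals() and
    `loader` would perform the actual function/UDF definitions."""
    if sentinel in namespace:
        return False          # already loaded, skip
    loader()                  # define the helper functions exactly once
    namespace[sentinel] = True
    return True
```

Calling it a second time with the same namespace returns False without re-running the loader, so both master notebooks can share the same helpers safely.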
1 vote · 0 answers

Error in SQL statement: AnalysisException: cannot resolve '`T_B.N`' given input columns

I need help. The error "Error in SQL statement: AnalysisException: cannot resolve 'T_B.N' given input columns: []; line 3 pos 10;" comes up when I run the code. How do I fix that? Is there a better way to write the query? My "colleagues" and I have…
1 vote · 1 answer

How to stop a notebook streaming job gracefully?

I have a streaming application which is running in a Databricks notebook job (https://docs.databricks.com/jobs.html). I would like to be able to stop the streaming job gracefully using the stop() method of the StreamingQuery class, which is…
abiratsis
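A common approach is to poll the StreamingQuery from the driver and call stop() when an external signal appears (a marker file, a widget value, etc.). A sketch with the polling loop factored out so it is testable; the signal mechanism is left as an assumption:

```python
import time

def run_until_signalled(query, should_stop, poll_seconds: float = 1.0) -> None:
    """Poll until `should_stop()` returns True or the query dies, then stop it.
    `query` is the StreamingQuery returned by writeStream...start();
    `should_stop` might, for example, check for a marker file in DBFS."""
    while query.isActive:
        if should_stop():
            query.stop()
            break
        time.sleep(poll_seconds)
```

Running this loop as the job's last cell keeps the notebook job alive while the stream runs, and lets the job end cleanly once the stop signal is seen.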
1 vote · 0 answers

error: spark scala: java.nio.channels.ClosedByInterruptException -> cannot do show() or count() on dataset

I am reading a dataframe in a Databricks notebook as:
val data = files
  .grouped(10000)
  .toParArray
  .map(subList => {
    spark.read
      .format("avro")
      .schema(StructType(List(StructField("Body", BinaryType, nullable = true), …
user3868051