
I set up a Hadoop cluster with Spark 2.4; on my client machine I installed Jupyter Notebook and the SparkMagic module.

Everything works fine within a single notebook. But now I want to reuse my SparkSession from another Jupyter notebook, for example to access a Spark dataset that was created in the first notebook. To do that, I need to reuse the session I created before.

The problem is that SparkMagic always creates a new SparkSession when I run another (PySpark) notebook. So (PySpark) notebook A uses SparkSession A and (PySpark) notebook B uses SparkSession B, which prevents using the same datasets in both notebooks.

Is there a way to use the same SparkSession in two PySpark notebooks in parallel (using SparkMagic)?
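To illustrate, here is roughly what happens (the path and the view name are just placeholders):

    # Notebook A (PySpark kernel via SparkMagic): create a dataset
    # and register it as a temp view in SparkSession A
    df = spark.read.parquet("/data/events")   # placeholder path
    df.createOrReplaceTempView("events")      # only visible in this session

    # Notebook B (PySpark kernel via SparkMagic): SparkMagic starts a
    # new Livy session with its own SparkSession B, so the view is gone
    spark.sql("SELECT COUNT(*) FROM events").show()
    # => AnalysisException: Table or view not found: events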

D. Müller
@Müller imho, a new session (kernel) per notebook is the default behaviour of Jupyter. In JupyterLab you can go to Kernel -> Change Kernel -> Other notebook kernel; then you should be able to share the same session between notebooks. At least I register UDFs in one notebook and use them in another. – VB_ Jul 08 '21 at 12:02
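A minimal sketch of the workaround described in this comment, assuming both notebooks are attached to the same kernel via Kernel -> Change Kernel -> Other notebook kernel (the UDF name is illustrative):

    # Notebook A: register a UDF on the shared SparkSession
    from pyspark.sql.types import StringType

    spark.udf.register("shout", lambda s: s.upper() if s else None, StringType())

    # Notebook B, attached to the SAME kernel as notebook A:
    spark.sql("SELECT shout('hello') AS v").show()  # same session, so the UDF is visible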

0 Answers