I use Docker with several containers: one for JupyterLab, one for Spark, and three for the ELK stack (Elasticsearch, Kibana and Logstash).
I also use sparkmagic in my Jupyter notebooks.
What I'm trying to do is send the output of a cell to Spark and then use it to create a Spark DataFrame.
First of all, I created a Python script that uses pandas to analyze an Excel file (sys.argv[1] is my Excel file and sys.argv[2] is my sheet's name) and returns the data (in my case it is stored in a dict).
Here is my Python code:
import pandas as pd
import numpy as np
import json
import sys

def prct_KPY():
    # build a dict of percentages from the first data row of the sheet
    perct_dep = {}
    perct_dep['val1'] = round(df.iloc[0, 1] * 100)
    perct_dep['val2'] = round(df.iloc[0, 2] * 100)
    perct_dep['val3'] = round(df.iloc[0, 3] * 100)
    perct_dep['val4'] = round(df.iloc[0, 4] * 100)
    return perct_dep

# sys.argv[1] = path to the Excel file, sys.argv[2] = sheet name
df = pd.read_excel(sys.argv[1], sys.argv[2], skiprows=50)
var = prct_KPY()
print(var)
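Since json is already imported, I could also print proper JSON instead of the dict's repr, which would be easier to parse back on the notebook side (this is just a variant I'm considering, not what I currently run):

# same tail as above, but emit machine-readable JSON on stdout
df = pd.read_excel(sys.argv[1], sys.argv[2], skiprows=50)
var = prct_KPY()
print(json.dumps(var))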
This Python code is stored in a file named "test.py".
Afterwards, I want to use this dict as an argument to build a Spark DataFrame (which I will then send to my Elasticsearch).
So I call my script with this code in a notebook cell:
%%!
python3 test.py "Path_Of_My_Excel_File" "Name_Of_My_Sheet"
and I get this output:
["{'val1': 96, 'val2': 94, 'val3': 96, 'val4': 96}", '']
this is the object's type: IPython's SList (a list of the shell output lines).
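In a local Python cell I can turn that output back into a dict, for example like this (just a sketch; "_" is the previous cell's output and its first element is the printed dict as a string):

import ast

# _[0] is the string "{'val1': 96, 'val2': 94, 'val3': 96, 'val4': 96}"
result = ast.literal_eval(_[0])
print(type(result), result)  # -> <class 'dict'> {'val1': 96, 'val2': 94, 'val3': 96, 'val4': 96}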
So locally I can work with the result via "_", but when I try to use it in a Spark cell, it doesn't work! I get this error message:
An error was encountered: name '' is not defined Traceback (most recent call last): NameError: name '' is not defined
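For context, the Spark cell I'm attempting looks roughly like this (a sketch; result is the dict built in the local cell above, which of course does not exist in the Spark session):

%%spark
# result only exists in the local kernel, not in the remote Spark session,
# hence the NameError when the cell runs over there
spark_df = spark.createDataFrame([result])
spark_df.show()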
How can I send this output to a Spark cell?
Thanks for any help!