I am a beginner in Databricks and PySpark. Currently, I have a PySpark DataFrame which contains 3 columns:
- date
- amount
- currency
I would like the amount column converted to EUR, calculated with the exchange rate of the day. For that purpose, I am using the exchange rate API (exchangeratesapi.io) to find the exchange rate, taking the date and currency as parameters.
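To make the goal concrete, here is a minimal pure-Python sketch of the conversion I am after (the rows and rates below are made up for illustration):

```python
# Made-up sample of the three columns: date, amount, currency
rows = [
    ("2019-05-01", 100.0, "USD"),
    ("2019-05-01", 250.0, "GBP"),
    ("2019-05-02", 80.0, "EUR"),
]

# Hypothetical EUR-based rates: 1 EUR buys `rate` units of the currency
fake_rates = {("2019-05-01", "USD"): 1.12, ("2019-05-01", "GBP"): 0.86}

def to_eur(date, amount, currency):
    # EUR amounts stay as-is; otherwise divide by that day's rate
    if currency == "EUR":
        return amount
    return amount / fake_rates[(date, currency)]

converted = [to_eur(d, a, c) for d, a, c in rows]
```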
First, I defined a function which makes the API call to find the exchange rate. Here is my code:
import requests
import pyspark.sql.functions as F
from pyspark.sql.types import FloatType

def API(val1, currency, date):
    # Fetch the exchange rate for the given date and currency
    r = requests.get('https://api.exchangeratesapi.io/' + date, params={'symbols': currency})
    # Parse the JSON response through Spark
    df = spark.read.json(sc.parallelize([r.json()]))
    df_value = df.select(F.col("rates." + currency))
    value = df_value.collect()[0][0]
    # Convert the amount to EUR
    val = val1 * (1 / value)
    return float(val)
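For context, r.json() already returns a plain Python dict, so on its own (outside Spark) the rate lookup would look like this (the payload values are made up, shaped like the API's {"rates": ...} response):

```python
# Hypothetical payload in the shape the API returns: {"rates": {<currency>: <rate>}, ...}
payload = {"base": "EUR", "date": "2019-05-01", "rates": {"USD": 1.12}}

def rate_from_payload(payload, currency):
    # Plain dict lookup, equivalent to selecting "rates.<currency>" above
    return payload["rates"][currency]

eur_amount = 100.0 * (1 / rate_from_payload(payload, "USD"))
```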
Then, I defined a UDF to call this function on my DataFrame:
API_Convert = F.udf(lambda x,y,z : API(x,y,z) if (y!= 'EUR') else x, FloatType())
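The lambda's EUR guard by itself behaves like this plain function (with the API call stubbed out purely for illustration):

```python
def convert(x, y, z, api=lambda a, c, d: a * 0.5):
    # Stubbed conversion: mirrors the UDF body, skipping the API call
    # when the currency is already EUR
    return api(x, y, z) if y != 'EUR' else x
```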
When I try to execute this part, I get a pickling error which I absolutely don't understand:
df = df.withColumn('amount',API_Convert('amount','currency','date'))
PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Could you please help me fix this issue?
Many thanks!