See the implementation below.
Input DataFrame:
from pyspark.sql.types import *

schema = StructType([
    StructField("currency", StringType(), True),
    StructField("value", DoubleType(), True)
])
data = [("USD", 1.00), ("EUR", 2.00)]
raw_df = spark.createDataFrame(data, schema)
Solution for the required output:
from pyspark.sql.functions import *

currency_exchanges = {'EUR': 1.00, 'USD': 0.90}
currency_df = spark.createDataFrame(list(currency_exchanges.items()), ['currency', 'exchange_rate'])

# broadcast() the rates table since it is tiny; EUR rows keep their value,
# all other currencies are multiplied by their exchange rate
result_df = raw_df.join(broadcast(currency_df), 'currency', 'left') \
    .withColumn('value_eur',
                when(col('currency') == 'EUR', col('value'))
                .otherwise(col('value') * col('exchange_rate'))) \
    .drop('exchange_rate')
result_df.show()
+--------+-----+---------+
|currency|value|value_eur|
+--------+-----+---------+
| USD| 1.0| 0.9|
| EUR| 2.0| 2.0|
+--------+-----+---------+
Here I've created a separate currency_df from the dictionary and left-joined raw_df with it (as a broadcast join, since the lookup table is small) to get the required result.
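To make the per-row logic explicit, here is a minimal plain-Python sketch of the same conversion rule the `when`/`otherwise` expression encodes (the `to_eur` helper name is my own, not from the code above):

```python
# Same rates dictionary as used to build currency_df
currency_exchanges = {'EUR': 1.00, 'USD': 0.90}

def to_eur(currency, value):
    # Mirrors when(col('currency') == 'EUR', value).otherwise(value * rate):
    # EUR rows pass through unchanged, others are scaled by their rate
    if currency == 'EUR':
        return value
    return value * currency_exchanges[currency]

print(to_eur('USD', 1.0))  # 0.9, matching the value_eur column
print(to_eur('EUR', 2.0))  # 2.0
```

Note that in the Spark version a currency missing from the dictionary yields a null `exchange_rate` after the left join (and hence a null `value_eur`), whereas this sketch would raise a `KeyError`.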