
Say I have a Spark DF in this format:

currency value
USD 1.00
EUR 2.00

And a dictionary with current currency exchange rates
(e.g. `{'EUR': 1.00, 'USD': 0.90}`).

And I want to add another column, say `value_eur`, based on that dict. How would I go about doing that?

I tried the following:

raw_df.withColumn("value_eur", raw_df.value * currency_exchanges[raw_df.currency])

but it gives me an error:

TypeError: unhashable type: 'Column'
    Does this answer your question? [PySpark create new column with mapping from a dict](https://stackoverflow.com/questions/42980704/pyspark-create-new-column-with-mapping-from-a-dict) – Derek O Apr 06 '23 at 15:13

1 Answer


See the implementation below -

Input DF-

from pyspark.sql.types import *

schema = StructType([
    StructField("currency", StringType(), True),
    StructField("value", DoubleType(), True)
])

data = [("USD", 1.00), ("EUR", 2.00)]
raw_df = spark.createDataFrame(data, schema)

Required Output-

from pyspark.sql.functions import *

currency_exchanges = {'EUR': 1.00, 'USD': 0.90}
currency_df = spark.createDataFrame(list(currency_exchanges.items()), ['currency', 'exchange_rate'])


# broadcast() comes from pyspark.sql.functions (imported above)
result_df = raw_df.join(broadcast(currency_df), 'currency', 'left') \
    .withColumn('value_eur', when(col('currency') == 'EUR', col('value')).otherwise(col('value') * col('exchange_rate'))) \
    .drop('exchange_rate')

result_df.show()

+--------+-----+---------+
|currency|value|value_eur|
+--------+-----+---------+
|     USD|  1.0|      0.9|
|     EUR|  2.0|      2.0|
+--------+-----+---------+

Here I've created another DataFrame, currency_df, from the dictionary and joined raw_df with it (using a broadcast join, since the lookup table is tiny) to compute the required column.
