My data looks like this:
+---------+------------+--------------+
|domain   |country_code|country       |
+---------+------------+--------------+
|amazon.de|DE          |Germany       |
|amazon.uk|UK          |united kingdom|
|amazon.de|UK          |mismatched    |
|amazon.uk|DE          |mismatched    |
+---------+------------+--------------+
In the data above I want to validate the country_code column against the domain column: any row whose domain contains .de should have country_code DE to be a correct match; anything else is incorrect.
So I am trying to create a new country column as shown below. However, I am unable to combine two conditions with and inside when. Can you please help?
import pyspark.sql.functions as f

df = df.withColumn(
    'country',
    f.when(
        f.col('domain') == '.de' && f.col('country_code') == 'DE',
        'Germany'
    ).otherwise('mismatch')
)