here is my dataframe:
I am looking for the right way to replace city's value based on the name, for example, case name when 'Alice' then 'New York' when 'Alex' then 'LA' when 'Aaron' then 'Beijing' when .......
Thank you.
here is my dataframe:
I am looking for the right way to replace city's value based on the name, for example, case name when 'Alice' then 'New York' when 'Alex' then 'LA' when 'Aaron' then 'Beijing' when .......
Thank you.
Probably the simplest way would be to store your mappings in a second dataframe and join them together:
val df = sqlContext.read.json(sc.parallelize(
"""{"name":"Alice", "age":21, "city":"Toronto"}""" :: Nil))
val mapping = sqlContext.read.json(sc.parallelize(
"""{"name":"Alice", "newcity":"New York"}""" :: Nil))
df.join(mapping, "name").select("name", "age", "newcity").collect
// Array[org.apache.spark.sql.Row] = Array([Alice,21,New York])
Another option would be to just write SQL, you'll just have to phrase your operation as a SELECT
rather than an update, for a simple mapping you could use CASE
, if you have more mappings you should use a JOIN
:
df.registerTempTable("df")
sqlContext.sql("""select name,
age,
case when name = 'Alice' then 'New York' end as city
from df""").collect()
// Array[org.apache.spark.sql.Row] = Array([Alice,21,New York])