Both methods work.

The SQL method works, but don't use *, as that will include the old columns; select just your CONCAT expression and rename it with AS.
customers.createOrReplaceTempView("customers")
spark.sql("SELECT CONCAT(name, ' ', last_name) AS name FROM customers").show()
//+--------+
//| name|
//+--------+
//|John Doe|
//|Jane Doe|
//+--------+
withColumn also works, and there is similarly a withColumnRenamed. So perform your operations as you wish: create a new column, then drop the original column(s) and rename the new column.
// Problem Setup
import spark.implicits._ // needed for toDF (imported automatically in spark-shell)
val customers = Seq(("John", "Doe"), ("Jane", "Doe")).toDF("name", "last_name")
customers.show()
//+----+---------+
//|name|last_name|
//+----+---------+
//|John| Doe|
//|Jane| Doe|
//+----+---------+
import org.apache.spark.sql.functions.{lit, col, concat}
customers.withColumn(
"name_last_name", concat(col("name"), lit(" "), col("last_name"))
).drop("name", "last_name").withColumnRenamed("name_last_name", "name").show()
//+--------+
//| name|
//+--------+
//|John Doe|
//|Jane Doe|
//+--------+
Of course, you can also perform the operation directly in the withColumn call, giving the newly generated column the label name so that it replaces the old one; you'll still have to drop last_name, though.
customers.withColumn(
"name", concat(col("name"), lit(" "), col("last_name"))
).drop("last_name").show()
//+--------+
//| name|
//+--------+
//|John Doe|
//|Jane Doe|
//+--------+