0

For example, I have a table customers in which there is a column name and column last_name.

I want to concatenate both these columns and overwrite the column name with the result of the concatenation.

In Spark sql api we can do something like:

customers.withColumn("name", concat(col("name"), lit(" "), col("last_name")))

Doing the following in sql query:

select *, concat(name, " ", last_name) AS name from customers

will instead add another column name in the dataframe. So at the end there are two columns with the same name name.

Is there a way in sql query to overwrite the existing column without adding a new one ?

Ayoub Omari
  • 806
  • 1
  • 7
  • 24

1 Answers1

-1

Both methods work.

Using the SQL method works. Don't use *, that will include the old columns, just do your CONCAT and rename with AS.

customers.createOrReplaceTempView("customers")
spark.sql("SELECT CONCAT(name, ' ', last_name) AS name FROM customers").show()
//+--------+
//|    name|
//+--------+
//|John Doe|
//|Jane Doe|
//+--------+

withColumn also works, and similarly there is a withColumnRenamed.

So perform your operations as you wish, creating a new column and then drop the original column(s) and rename the new column.

// Problem Setup
val customers = = Seq(("John", "Doe"), ("Jane", "Doe")).toDF("name", "last_name")

customers.show()
//+----+---------+
//|name|last_name|
//+----+---------+
//|John|      Doe|
//|Jane|      Doe|
//+----+---------+

import org.apache.spark.sql.functions.{lit, col, concat}

customers.withColumn(
  "name_last_name", concat(col("name"), lit(" "), col("last_name"))
).drop("name", "last_name").withColumnRenamed("name_last_name", "name").show()
//+--------+
//|    name|
//+--------+
//|John Doe|
//|Jane Doe|
//+--------+

Of course you can go ahead and do the operation itself in the withColumn function call, giving the newly generated column the label name replaces the old one, but you'll still have to drop last_name.

customers.withColumn(
  "name", concat(col("name"), lit(" "), col("last_name"))
).drop("last_name").show()
//+--------+
//|    name|
//+--------+
//|John Doe|
//|Jane Doe|
//+--------+
kfkhalili
  • 996
  • 1
  • 11
  • 24