
I'd like to replace a value in a column by building the search string from another column.

Before:

id  address       st
1   2.PA1234.la   1234
2   10.PA125.la   125
3   2.PA156.ln    156

After:

id  address       st
1   2.PA9999.la   1234
2   10.PA9999.la  125
3   2.PA9999.ln   156
I tried

df.withColumn("address", regexp_replace("address", "PA" + st, "PA9999"))
df.withColumn("address", regexp_replace("address", "PA" + df.st, "PA9999"))

Both seem to fail with

TypeError: 'Column' object is not callable

This could be similar to Pyspark replace strings in Spark dataframe column.
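For context, the pattern argument of PySpark's `regexp_replace` (in the column-function API) expects a plain string, so building it from another column with `"PA" + df.st` produces a `Column` that the function cannot consume directly. The per-row replacement the question is after can be sketched in plain Python; the function name here is illustrative, not from the original post:

```python
import re

def replace_station(address, st):
    # Build the search pattern from the other column's value;
    # re.escape guards against regex metacharacters in `st`.
    return re.sub("PA" + re.escape(st), "PA9999", address)

print(replace_station("2.PA1234.la", "1234"))  # 2.PA9999.la
```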

prudhvi Indana

1 Answer


You might also use a Spark UDF.

This approach applies whenever you need to modify a DataFrame entry using a value from another column:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType
import pandas as pd

sparkSession = SparkSession.builder.getOrCreate()

pd_input = pd.DataFrame({'address': ['2.PA1234.la', '10.PA125.la', '2.PA156.ln'],
                         'st': ['1234', '125', '156']})

spark_df = sparkSession.createDataFrame(pd_input)

# Replace the value of `st` inside `address` with '9999'
replace_udf = udf(lambda address, st: address.replace(st, '9999'), StringType())

spark_df.withColumn('address_new', replace_udf(col('address'), col('st'))).show()

Output:

+-----------+----+------------+
|    address|  st| address_new|
+-----------+----+------------+
|2.PA1234.la|1234| 2.PA9999.la|
|10.PA125.la| 125|10.PA9999.la|
| 2.PA156.ln| 156| 2.PA9999.ln|
+-----------+----+------------+
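One caveat with the lambda above: `address.replace(st, '9999')` substitutes every occurrence of `st` in the address, not only the one following `PA`. Prefixing the search string, as the question's own attempt did, avoids that. A quick plain-Python sketch of the difference (the sample value here is hypothetical):

```python
address, st = "125.PA125.la", "125"

# Bare value: replaces both occurrences of "125".
print(address.replace(st, "9999"))           # 9999.PA9999.la

# Anchored on the "PA" prefix: only the intended token changes.
print(address.replace("PA" + st, "PA9999"))  # 125.PA9999.la
```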
Grzegorz