
I'd like to replace a value in a column by building the search string from another column.

Before:

id  address       st
1   2.PA1234.la   1234
2   10.PA125.la   125
3   2.PA156.ln    156

After:

id  address       st
1   2.PA9999.la   1234
2   10.PA9999.la  125
3   2.PA9999.ln   156
I tried

df.withColumn("address", regexp_replace("address", "PA" + st, "PA9999"))
df.withColumn("address", regexp_replace("address", "PA" + df.st, "PA9999"))

Both seem to fail with

TypeError: 'Column' object is not callable

This could be similar to Pyspark replace strings in Spark dataframe column.
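For context, the pattern argument of PySpark's `regexp_replace` (in the column-function API) expects a plain string, so building it from another column with `"PA" + df.st` produces a `Column` that the function cannot consume directly. The per-row replacement the question is after can be sketched in plain Python; the function name here is illustrative, not from the original post:

```python
import re

def replace_station(address, st):
    # Build the search pattern from the other column's value;
    # re.escape guards against regex metacharacters in `st`.
    return re.sub("PA" + re.escape(st), "PA9999", address)

print(replace_station("2.PA1234.la", "1234"))  # 2.PA9999.la
```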

prudhvi Indana

1 Answer


You might also use a Spark UDF.

This approach applies whenever you need to modify a DataFrame entry using a value from another column:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType
import pandas as pd

sparkSession = SparkSession.builder.getOrCreate()

pd_input = pd.DataFrame({'address': ['2.PA1234.la', '10.PA125.la', '2.PA156.ln'],
                         'st': ['1234', '125', '156']})

spark_df = sparkSession.createDataFrame(pd_input)

# Replace the value of `st` inside `address` with '9999'
replace_udf = udf(lambda address, st: address.replace(st, '9999'), StringType())

spark_df.withColumn('address_new', replace_udf(col('address'), col('st'))).show()

Output:

+-----------+----+------------+
|    address|  st| address_new|
+-----------+----+------------+
|2.PA1234.la|1234| 2.PA9999.la|
|10.PA125.la| 125|10.PA9999.la|
| 2.PA156.ln| 156| 2.PA9999.ln|
+-----------+----+------------+
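One caveat with the lambda above: `address.replace(st, '9999')` substitutes every occurrence of `st` in the address, not only the one following `PA`. Prefixing the search string, as the question's own attempt did, avoids that. A quick plain-Python sketch of the difference (the sample value here is hypothetical):

```python
address, st = "125.PA125.la", "125"

# Bare value: replaces both occurrences of "125".
print(address.replace(st, "9999"))           # 9999.PA9999.la

# Anchored on the "PA" prefix: only the intended token changes.
print(address.replace("PA" + st, "PA9999"))  # 125.PA9999.la
```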
Grzegorz