I want to trim extra characters from the column "name" (for example, the trailing "]" in "[Lucy Pierson]"). To do that, I pass a SQL trim expression to the expr function:
```python
from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.getOrCreate()

data = [
    Row(id=1, name="Lisa Brenan", phone=Row(home="+1 23456789", personal=None), projects=["CIBC", "Shell"], salary=11000),
    Row(id=2, name=" Thomas Kingston", phone=Row(home="+1 98765432", personal="+1 2345665432"), projects=["BMW"], salary=15000),
    Row(id=3, name="[Lucy Pierson]", phone=Row(home=None, personal=None), projects=None, salary=20000),
]
df = spark.createDataFrame(data)

# This call raises the ParseException shown below
df.\
    withColumn("correct_name", expr("rtrim(TRAILING ']' FROM name)")).\
    select("correct_name").\
    show()
```
I receive the following error:
```
ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'FROM'.(line 1, pos 19)
== SQL ==
rtrim(TRAILING ']' FROM name)
-------------------^^^
```
What is causing this error, and what is the correct way to write this expression?