
I want to generate a column based on the value of an existing column. Wherever there is a plus sign, I want to split, pick up the second part of the string, and trim any surrounding spaces.

df = spark.sql("select '10/35/70/25% T4Max-300 + 20/45/80/25% T4Max-400' as col1")
df1 = df.withColumn("newcol",col('col1').split("+")[1].strip())

getting the error TypeError: 'Column' object is not callable

Expected output is 20/45/80/25% T4Max-400

  • Possible duplicate of [Split Spark Dataframe string column into multiple columns](https://stackoverflow.com/questions/39235704/split-spark-dataframe-string-column-into-multiple-columns) – pault Feb 28 '19 at 02:16
  • `split` and `trim` are not a methods of `Column` - you need to call `pyspark.sql.functions.split/trim` and pass in the column. See the linked duplicate for details. – pault Feb 28 '19 at 15:15

1 Answer


The code col('col1') returns the pyspark.sql.Column in your DataFrame with the name "col1".

You are getting the error:

TypeError: 'Column' object is not callable

because you are trying to call split (and trim) as methods on this column, but no such methods exist.

Instead you want to call the functions pyspark.sql.functions.split() and pyspark.sql.functions.trim() with the Column passed in as an argument.

For instance:

from pyspark.sql import functions as f

df1 = df.withColumn(
    "newcol",
    f.trim(
        f.split(f.col("col1"), r"\+")[1]
    )
)
df1.show(truncate=False)
#+-----------------------------------------------+----------------------+
#|col1                                           |newcol                |
#+-----------------------------------------------+----------------------+
#|10/35/70/25% T4Max-300 + 20/45/80/25% T4Max-400|20/45/80/25% T4Max-400|
#+-----------------------------------------------+----------------------+

The second argument to split() is treated as a regular expression pattern, so the + has to be escaped.
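As a quick illustration of the escaping issue outside Spark, the same behavior shows up in plain Python's re module, which split() mirrors: an unescaped + is a regex quantifier and raises an error, while the escaped pattern matches the literal plus sign. (This is a standalone sketch using re, not PySpark.)

```python
import re

s = "10/35/70/25% T4Max-300 + 20/45/80/25% T4Max-400"

# Unescaped "+" is the regex quantifier "one or more",
# so re.split("+", s) raises re.error ("nothing to repeat").

# Escaping it matches the literal plus sign:
parts = re.split(r"\+", s)
result = parts[1].strip()
print(result)  # 20/45/80/25% T4Max-400
```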

pault