
I have a column in a DataFrame in PySpark like "Col1" below. I would like to create a new column "Col2" containing the length of each string from "Col1". I'm new to PySpark and have been googling, but I haven't seen any examples of how to do this. Any tips are very much appreciated.

example:

Col1 Col2
12   2
123  3
  • Possible duplicate of [compute string length in Spark SQL DSL](https://stackoverflow.com/questions/28544774/compute-string-length-in-spark-sql-dsl) – Alper t. Turker May 12 '18 at 09:17

1 Answer


You can use the length function:

import pyspark.sql.functions as F
df.withColumn('Col2', F.length('Col1')).show()
+----+----+
|Col1|Col2|
+----+----+
|  12|   2|
| 123|   3|
+----+----+
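If you prefer SQL expression syntax, the same length function can be called through selectExpr; a Python UDF also works as a fallback, though the built-in function is faster since it avoids Python serialization. A minimal sketch, assuming the same df as above:

import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType

# Same result via a Spark SQL expression string
df.selectExpr("Col1", "length(Col1) AS Col2").show()

# Fallback: a Python UDF (slower than the built-in length function)
str_len = F.udf(lambda s: len(s) if s is not None else None, IntegerType())
df.withColumn("Col2", str_len("Col1")).show()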
Psidom