60

I want to convert the values in a column to lowercase. Currently, if I call the lower() method on the column, it complains that Column objects are not callable. Since SQL has a lower() function, I assume there is a native Spark solution that involves neither UDFs nor writing any SQL.

Ronak Jain
wlad

4 Answers

78

Import lower alongside col:

from pyspark.sql.functions import lower, col

Combine them together using lower(col("bla")). In a complete query:

spark.table('bla').select(lower(col('bla')).alias('bla'))

which is equivalent to the SQL query

SELECT lower(bla) AS bla FROM bla

To keep the other columns, do

spark.table('foo').withColumn('bar', lower(col('bar')))

This approach is faster than a UDF: a Python UDF has to ship every row out to a Python worker process and back (a slow operation, and Python itself is slow), whereas lower is a built-in function that Spark evaluates inside the JVM. It is also more elegant than writing the query as a SQL string.

jxc
wlad
6
from pyspark.sql.functions import lower

df = df.withColumn("col_name", lower(df["col_name"]))
Suraj Rao
user21091021
    Welcome to Stack Overflow! While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – Suraj Rao Jan 27 '23 at 04:52
5

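If the column holds an array of strings (as 'arr' appears to here), you can use a combination of concat_ws and split: join the elements into one string, lowercase it, and split it back into an array. This assumes no element contains the '::' separator.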

from pyspark.sql.functions import concat_ws, lower, split

df.withColumn('arr_str', lower(concat_ws('::','arr'))).withColumn('arr', split('arr_str','::')).drop('arr_str')
smishra
3

Another approach which may be a little cleaner:

import pyspark.sql.functions as F

df.select("*", F.lower("my_col"))

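This returns a DataFrame with all the original columns plus one additional column holding the lowercased values, named lower(my_col) by default. The original my_col is left unchanged; add .alias(...) to choose the new column's name.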

NonCreature0714