I want to convert the values in a column to lowercase. If I call the lower() method on the column, I get an error that Column objects are not callable. Since SQL has a lower() function, I assume there is a native Spark solution that doesn't involve UDFs or writing any SQL.

Ronak Jain

wlad
4 Answers
Import lower alongside col:
from pyspark.sql.functions import lower, col
Combine them as lower(col("bla")). In a complete query:
spark.table('bla').select(lower(col('bla')).alias('bla'))
which is equivalent to the SQL query
SELECT lower(bla) AS bla FROM bla
To keep the other columns, use withColumn instead, which replaces the column in place:
spark.table('foo').withColumn('bar', lower(col('bar')))
This approach is better than using a UDF: a Python UDF has to ship every value from the JVM to a Python worker process and back, which is slow, and the Python code itself runs slower than Spark's built-in implementation. It is also more elegant than writing the query in SQL.
from pyspark.sql.functions import lower
# Overwrite col_name with its lowercased values; all other columns are kept.
df = df.withColumn("col_name", lower(df["col_name"]))

Suraj Rao

user21091021
Welcome to Stack Overflow! While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. – Suraj Rao Jan 27 '23 at 04:52
If the column is an array of strings, you can combine concat_ws, lower and split:
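from pyspark.sql.functions import concat_ws, lower, split
# Join the array into one string, lowercase it, split it back (assumes no element contains '::')
df = df.withColumn('arr_str', lower(concat_ws('::', 'arr'))) \
       .withColumn('arr', split('arr_str', '::')).drop('arr_str')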

smishra
Another approach which may be a little cleaner:
import pyspark.sql.functions as F
df.select("*", F.lower("my_col"))
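This returns a DataFrame with all the original columns plus a new column containing the lowercased values. The original column is left unchanged, and the new one is named lower(my_col) unless you add .alias(...).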

NonCreature0714