I'm using PySpark to apply a function that takes a cell value, splits it on ' ', and returns the first and last items of the split. The column contains null values, though, and I can't work out how to handle them before the split.
Here is my code:
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

def get_name(full_name):
    for i in full_name:
        if i is not None:
            name_list = full_name.split(' ')
            # first and last item of list
            return f"{name_list[0]} {name_list[-1]}"
        else:
            return full_name

udf_get_name = udf(lambda x: get_name(x), StringType())
df_parquet = df_parquet.withColumn("NameReduz", udf_get_name(col("FullName")))
It fails with an error about NoneType.
This is what I'm expecting:
FullName | NameReduz |
---|---|
NAME SURNAME LAST | NAME LAST |
NAME SURNAME1 SURNAME2 LAST | NAME LAST |
null | null |
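
For reference, here is a minimal sketch of one way the null check could be moved ahead of the split. The likely culprit is `for i in full_name`: it iterates over the value itself, so a null cell (passed to the UDF as `None`) fails before the `if` is ever reached, and for a real string it only loops over single characters, which are never `None`. The helper `add_name_reduz` is a hypothetical name introduced for illustration, and the PySpark imports sit inside it only so that `get_name` can be tried without a Spark session.

```python
from typing import Optional


def get_name(full_name: Optional[str]) -> Optional[str]:
    """Return 'first last' from a full name, or None for a null cell."""
    # Test the whole value for None before calling any string method on it.
    if full_name is None:
        return None
    name_list = full_name.split(' ')
    # First and last item of the list, as in the original function.
    return f"{name_list[0]} {name_list[-1]}"


def add_name_reduz(df):
    """Hypothetical helper: add NameReduz to a DataFrame with a FullName column."""
    # Imported here (an assumption of this sketch) so get_name stays
    # importable on a machine without PySpark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    udf_get_name = udf(get_name, StringType())
    return df.withColumn("NameReduz", udf_get_name(col("FullName")))
```

With this version, `df_parquet = add_name_reduz(df_parquet)` should leave null rows as null instead of raising. Note that a one-word name would come back doubled (for example "NAME NAME"), since the first and last items are then the same.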