How to remove white spaces between the quotes in pyspark dataframe?

Question

I am trying to remove the white space inside the quoted values, but I am not getting the right result. How can I do this?

Example:

Local_Manufacturer|SKU_PackID_ProductNumber|Molecule_Name|BrandName_Intl
"UPJOHN                 "|"894265"|"SILDENAFIL"|"REVATIO"

Desired output:

Local_Manufacturer|SKU_PackID_ProductNumber|Molecule_Name|BrandName_Intl
"UPJOHN"|"894265"|"SILDENAFIL"|"REVATIO"

I tried the code below:

for c_name in df1.columns:
    df1 = df1.withColumn(c_name, trim(df1[c_name]))

Does this answer your question? [Trim string column in PySpark dataframe](https://stackoverflow.com/questions/35155821/trim-string-column-in-pyspark-dataframe) — Lamanus, Aug 12 '20 at 05:52

score 0 · Accepted Answer · answered Aug 12 '20 at 05:49

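The snippet in the question calls trim without showing an import. Import it from pyspark.sql.functions (here through the alias f) and call it as f.trim: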

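import pyspark.sql.functions as f

# Trim leading and trailing whitespace from every column
for c_name in df1.columns:
    df1 = df1.withColumn(c_name, f.trim(df1[c_name]))

# Bring the rows back to the driver to inspect the result
df_list = df1.collect()
print(df_list)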

[Row(Local_Manufacturer='UPJOHN', SKU_PackID_ProductNumber='894265', Molecule_Name='SILDENAFIL', BrandName_Intl='REVATIO')]

The trailing spaces in Local_Manufacturer are gone: the value is now 'UPJOHN'.
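
The loop above applies trim to every column, which also turns non-string columns into strings. If that matters, a possible variant is to trim only the string columns in a single select. This is a minimal sketch, not part of the original answer: the helper names string_columns and trim_string_columns are made up for illustration, and it assumes the column names contain no dots or other characters that need escaping.

```python
def string_columns(dtypes):
    """Return the names of columns whose Spark type is 'string'.

    `dtypes` is a list of (column_name, type_name) tuples, the shape
    returned by DataFrame.dtypes.
    """
    return [name for name, dtype in dtypes if dtype == "string"]


def trim_string_columns(df):
    """Trim whitespace from string columns and leave other columns as-is."""
    # Imported here so string_columns can be used without Spark installed.
    import pyspark.sql.functions as f

    to_trim = set(string_columns(df.dtypes))
    return df.select(
        [
            f.trim(f.col(c)).alias(c) if c in to_trim else f.col(c)
            for c in df.columns
        ]
    )
```

With the DataFrame from the question this would be called as df1 = trim_string_columns(df1). A single select also avoids chaining one withColumn call per column.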
