I am trying to remove the whitespace inside the quoted values, but I am not getting the right result. How can I do this?

Example:

Local_Manufacturer|SKU_PackID_ProductNumber|Molecule_Name|BrandName_Intl
"UPJOHN                 "|"894265"|"SILDENAFIL"|"REVATIO"

Desired output:

Local_Manufacturer|SKU_PackID_ProductNumber|Molecule_Name|BrandName_Intl
"UPJOHN"|"894265"|"SILDENAFIL"|"REVATIO"

I tried the code below:

for c_name in df1.columns:
     df1 = df1.withColumn(c_name, trim(df1[c_name]))
Shivam
  • Does this answer your question? [Trim string column in PySpark dataframe](https://stackoverflow.com/questions/35155821/trim-string-column-in-pyspark-dataframe) – Lamanus Aug 12 '20 at 05:52

1 Answer

You need to import the trim function from pyspark.sql.functions before applying it to each column.

import pyspark.sql.functions as f

# Trim leading and trailing whitespace in every column
for c_name in df1.columns:
    df1 = df1.withColumn(c_name, f.trim(df1[c_name]))

df_list = df1.collect()
print(df_list)

[Row(Local_Manufacturer='UPJOHN', SKU_PackID_ProductNumber='894265', Molecule_Name='SILDENAFIL', BrandName_Intl='REVATIO')]

The trailing whitespace is removed from every column, as the output above shows.

Lamanus