I have a Spark dataframe:
id | objects |
---|---|
1 | [sun, solar system, mars, milky way] |
2 | [moon, cosmic rays, orion nebula] |
I need to replace spaces with underscores inside each array element.
Expected result:
id | objects | concat_obj |
---|---|---|
1 | [sun, solar system, mars, milky way] | [sun, solar_system, mars, milky_way] |
2 | [moon, cosmic rays, orion nebula] | [moon, cosmic_rays, orion_nebula] |
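
To be explicit about the per-element behavior I am after, here is a rough equivalent in plain Python (not Spark). The `rows` list and the `replace_spaces` helper are hypothetical names made up for this illustration; the data just mirrors the table above.

```python
# Plain-Python illustration of the desired result (this is NOT Spark code).
# `rows` and `replace_spaces` are hypothetical names used only for this example.
rows = [
    (1, ["sun", "solar system", "mars", "milky way"]),
    (2, ["moon", "cosmic rays", "orion nebula"]),
]


def replace_spaces(objects):
    """Return a new list with spaces replaced by underscores in each element."""
    return [obj.replace(" ", "_") for obj in objects]


# Keep the original list and add the transformed list next to it,
# mirroring the id | objects | concat_obj layout.
result = [(row_id, objects, replace_spaces(objects)) for row_id, objects in rows]

for row in result:
    print(row)
```

The original `objects` list stays unchanged and the new column is still a list with the same number of elements.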
I tried using regexp_replace:
df = df.withColumn('concat_obj', regexp_replace('objects', ' ', '_'))
but that changed all spaces to underscores, whereas I need to replace spaces only inside the array elements.
So, how can this be done in PySpark?