This:

words = words.withColumn('value_2', F.regexp_replace('value', '|'.join(stopWords), ''))

works fine for substrings.

However, I have the stop word 'a', and as a result 'was' becomes 'ws'. I only want to match 'A' or 'a' as a standalone word and leave 'was' as is.
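A minimal reproduction of the problem, assuming a hypothetical stop-word list of "a" and "the". It uses Python's re module as a stand-in for Spark's Java-based regex engine, since plain alternation behaves the same way in both. Without anchors, "a" matches inside any word that contains the letter, and the match is case-sensitive, so the capital "A" survives.

```python
import re

# Hypothetical stop-word list for illustration (stands in for stopWords).
stop_words = ["a", "the"]

# Unanchored alternation, built the same way as in the question: "a|the".
pattern = "|".join(stop_words)

sentence = "A cat was on the mat"
result = re.sub(pattern, "", sentence)

# "cat" -> "ct", "was" -> "ws", "mat" -> "mt"; uppercase "A" is untouched.
print(repr(result))
```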

thebluephantom

1 Answer

Place word boundaries (\b) around the alternation so that each stop word only matches as a whole word:

words = words.withColumn('value_2', F.regexp_replace('value', '\\b(' + '|'.join(stopWords) + ')\\b', ''))
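A minimal sketch of the word-boundary fix, again using Python's re as a stand-in for Spark's Java regex engine; for simple ASCII words like these, \b, alternation, and the inline (?i) flag behave the same in both. Two additions go beyond the answer and are assumptions about what the asker wants: (?i) makes the match case-insensitive so 'A' is removed as well as 'a', and re.escape guards against stop words that contain regex metacharacters. The stop-word list is hypothetical.

```python
import re

# Hypothetical stop-word list for illustration (stands in for stopWords).
stop_words = ["a", "the"]

# (?i)  : case-insensitive, so "A" matches the stop word "a" (an added assumption).
# \b..\b: the stop word must be a whole word, so "was" is left alone.
# re.escape: protects against regex metacharacters in the stop words.
pattern = r"(?i)\b(" + "|".join(re.escape(w) for w in stop_words) + r")\b"

sentence = "A cat was on the mat"
result = re.sub(pattern, "", sentence)

# Deleting whole words leaves leading/doubled spaces; collapse them.
cleaned = re.sub(r"\s+", " ", result).strip()

print(repr(result))
print(cleaned)
```

In PySpark the same pattern string could be passed to F.regexp_replace('value', pattern, ''), with an optional second regexp_replace on '\\s+' (plus F.trim) to tidy the leftover whitespace.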
Tim Biegeleisen