
I'm trying to replace only a matched string within a column, and nothing else in that column, with another value.

For example:

My name is GaryBrooks. 
The Partnertime series was good.

Match:

GaryBrooks
Partner time

Expected output:

My name is [TM="GaryBrooks"].
The [TM="Partner time"] series was good.

So far, I've done the following:

| trademarkname | tm_value            | DESCRIPTION_TEXT                 | Compare |
|---------------|---------------------|----------------------------------|---------|
| GaryBrooks    | [TM="GaryBrooks"]   | My name is GaryBrooks.           | yes     |
| Partner time  | [TM="Partner time"] | The Partnertime series was good. | yes     |

file['Compare'] = file.apply(lambda x: 'Yes' if x['trademarkname'] in x['DESCRIPTION_TEXT'] else 'No',axis=1)

I managed to detect the match, but not to replace it. I'm not sure whether this calls for a regexp replace function or a for loop.

Something like this is what I want to do: WHEN "Compare" IS 'Yes' THEN regexp_replace("DESCRIPTION_TEXT", "trademarkname", "tm_value"), where "trademarkname" is what has to be matched and "tm_value" is what the matched string should be replaced with.

Lamanus
Sid

1 Answer


Try `expr` inside `withColumn`. Written as a SQL expression, `regexp_replace` can take the `tm_value` column as the replacement for the matched value.

Example:

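from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists
df = spark.createDataFrame([('GaryBrooks','[TM="GaryBrooks"]','My name is GaryBrooks.','yes'),('Partner time','[TM="Partner time"]','The Partnertime series was good.','yes')],['trademarkname','tm_value','DESCRIPTION_TEXT','Compare'])
# Replace either hard-coded name in DESCRIPTION_TEXT with that row's tm_value
df.withColumn("output", expr('regexp_replace(DESCRIPTION_TEXT, "(GaryBrooks|Partnertime)", tm_value)')).\
show(10,False)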
#+-------------+-------------------+--------------------------------+-------+----------------------------------------+
#|trademarkname|tm_value           |DESCRIPTION_TEXT                |Compare|output                                  |
#+-------------+-------------------+--------------------------------+-------+----------------------------------------+
#|GaryBrooks   |[TM="GaryBrooks"]  |My name is GaryBrooks.          |yes    |My name is [TM="GaryBrooks"].           |
#|Partner time |[TM="Partner time"]|The Partnertime series was good.|yes    |The [TM="Partner time"] series was good.|
#+-------------+-------------------+--------------------------------+-------+----------------------------------------+
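
The pattern above is hard-coded. As a rough sketch of the per-row logic described in the question (replace `trademarkname` with `tm_value` only when the name actually occurs in `DESCRIPTION_TEXT`), here is the same idea in plain Python over dict rows. `tag_row` is a hypothetical helper, not a Spark or pandas function, and it matches literally, so 'Partner time' does not match 'Partnertime':

```python
def tag_row(row):
    """Return DESCRIPTION_TEXT with the row's trademark name replaced by its tag."""
    name = row["trademarkname"]
    text = row["DESCRIPTION_TEXT"]
    # str.replace is a literal (non-regex) replacement, so characters such as
    # '[' or '"' in the name or the tag need no escaping.
    return text.replace(name, row["tm_value"]) if name in text else text


rows = [
    {"trademarkname": "GaryBrooks", "tm_value": '[TM="GaryBrooks"]',
     "DESCRIPTION_TEXT": "My name is GaryBrooks."},
    {"trademarkname": "Partner time", "tm_value": '[TM="Partner time"]',
     "DESCRIPTION_TEXT": "The Partnertime series was good."},
]

outputs = [tag_row(r) for r in rows]
```

The second row comes back unchanged here, which is why the Spark example spells the pattern as `Partnertime` rather than reading it from the `trademarkname` column.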
notNull
  • What if there are more than a few tm values and trademark names? How would I do this, with a dictionary? – Sid Jul 11 '23 at 17:10
  • You need to build a single regex pattern that matches all the names and use it in the `regexp_replace` function. – notNull Jul 11 '23 at 17:26
  • If the answer helped you solve the issue, take a moment to accept and upvote it so this thread is closed as solved. – notNull Jul 28 '23 at 13:48
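
One way the dictionary idea from the comments might look, sketched in plain Python with the standard `re` module rather than Spark. The names `tm_values`, `build_pattern`, and `tag_text` are made up for this illustration, and the dictionary keys are assumed to be the spellings that actually occur in the text. A pattern string built this way is the sort of value one could try passing to `regexp_replace`, but the per-match dictionary lookup shown here is Python-only:

```python
import re

# Hypothetical mapping: spelling found in the text -> replacement tag.
tm_values = {
    "GaryBrooks": '[TM="GaryBrooks"]',
    "Partnertime": '[TM="Partner time"]',
}


def build_pattern(names):
    """Join the names into one alternation, longest first, escaped as literals."""
    # Longest first, so a longer name wins over a shorter one that is its prefix.
    ordered = sorted(names, key=len, reverse=True)
    return "|".join(re.escape(name) for name in ordered)


pattern = build_pattern(tm_values)
compiled = re.compile(pattern)


def tag_text(text):
    """Replace every known name in text with its tag from tm_values."""
    # A function replacement is inserted as-is, so the tag is never
    # interpreted as a regex replacement template.
    return compiled.sub(lambda m: tm_values[m.group(0)], text)
```

For example, `tag_text("The Partnertime series was good.")` looks up `Partnertime` in the dictionary and substitutes its tag, leaving the rest of the sentence untouched.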