I know this question has been answered before (see "Case insensitive replace"), but mine is a little different.

What I want is to search a text for certain keywords and wrap each match in <b> and </b>. There are four different cases, illustrated by the example below:

Keywords = ['hell', 'world']

Input Sentence = 'Hell is a wonderful place to say hello and sell shells'

Expected Output 1 = '<b>Hell</b> is a wonderful place to say hello and sell shells' -- (only complete-word matches are replaced, and with the found word 'Hell' rather than the keyword 'hell')

Expected Output 2 = '<b>Hell</b> is a wonderful place to say <b>hello</b> and sell shells' -- (only words beginning with the keyword are replaced; the whole word is wrapped even though the match is partial)

Expected Output 3 = '<b>Hell</b> is a wonderful place to say <b>hello</b> and sell <b>shells</b>' -- (any word containing 'hell' is replaced, and the whole word is wrapped)

Expected Output 4 = '<b>Hell</b> is a wonderful place to say <b>hell</b>o and sell s<b>hell</b>s' -- (any occurrence of 'hell' is replaced, but only the matched fragment is wrapped; the casing of the matched text is left intact)

The linked SO question replaces the found word with the keyword, which is not what I want. I want to keep the casing of the input sentence intact. Can someone please help me find a solution to all four cases above?

The code that I have tried:

>>> import re
>>> insensitive_hippo = re.compile(re.escape('hell'), re.IGNORECASE)
>>> insensitive_hippo.sub('hell', 'Hell is a wonderful place to say hello and sell shells')
'hell is a wonderful place to say hello and sell shells'

But this doesn't keep the found word intact.

The Wanderer
  • What's your expected output? – Avinash Raj Jun 24 '15 at 09:28
  • Given the input sentence and keyword list, these four types of translated text are the expected output – The Wanderer Jun 24 '15 at 09:29
  • you want four for a single input sentence? You must need to show your attempts.. – Avinash Raj Jun 24 '15 at 09:30
  • @AvinashRaj I am using the method discussed in the linked SO post. – The Wanderer Jun 24 '15 at 09:32
  • looks like the keyword ought to be "hell", not hello ;) – Henrik Jun 24 '15 at 09:33
  • Yes you are right. My bad !!! – The Wanderer Jun 24 '15 at 09:34
  • So you want 4 different regexes right? – vks Jun 24 '15 at 09:35
  • @vks Yes, I want four regexes, but if you think that is asking for too much, then if you can help me find a regex that replaces the found word with itself surrounded by <b> and </b>, that would be great. Once that's done I can play around with that regex to find regexes for all four options. – The Wanderer Jun 24 '15 at 09:36
  • So where is your attempt, and what precisely is the problem with it? Why don't you use something like http://regex101.com/#python to test your regex? – jonrsharpe Jun 24 '15 at 09:39
  • Why has this been downvoted? I am using the method described in the linked post and it is replacing the "keyword" and not the found word again. – The Wanderer Jun 24 '15 at 09:42
  • Please read [ask]. Your question includes neither the code you have written nor a concise description of the problem with it. – jonrsharpe Jun 24 '15 at 10:07
  • @jonrsharpe With all due respect, I would disagree. Not only have I provided a sample example with a given input and expected outputs (concise description) but also a link to the post whose code I have been trying. vks was able to help me in a couple of minutes based on this but all other folks in this comment list have only been making my life difficult and just acting smart without actually helping me (not to offend anyone but that's what I feel). – The Wanderer Jun 24 '15 at 11:13
  • provided the code that I have been trying. – The Wanderer Jun 24 '15 at 11:15
  • Questions should stand alone. Your question provides the inputs and expected outputs, but **not** the actual outputs, and **not** the code you're actually using. If you were just using the code from the linked question verbatim, then *why?!* Had you made *no effort* to adapt it to your needs? The fact that the question is answerable **does not** make it a good one, nor on-topic (indeed, the fact that it is so trivially answerable suggests that you're just being lazy). This is neither a code-writing nor tutorial service, and you should not treat it as such. – jonrsharpe Jun 24 '15 at 11:15

1 Answer

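import re

x = 'Hell is a wonderful place to say hello and sell shells'

# 1. Whole-word matches only
print(re.sub(r"\b(hell)\b", r"<b>\1</b>", x, flags=re.I))
# 2. Whole words that begin with the keyword
print(re.sub(r"\b(hell\S*)", r"<b>\1</b>", x, flags=re.I))
# 3. Whole words that contain the keyword anywhere
print(re.sub(r"\b(\S*hell\S*)", r"<b>\1</b>", x, flags=re.I))
# 4. Only the matched fragment, wherever it occurs
print(re.sub(r"(hell)", r"<b>\1</b>", x, flags=re.I))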

Output:

<b>Hell</b> is a wonderful place to say hello and sell shells
<b>Hell</b> is a wonderful place to say <b>hello</b> and sell shells
<b>Hell</b> is a wonderful place to say <b>hello</b> and sell <b>shells</b>
<b>Hell</b> is a wonderful place to say <b>hell</b>o and sell s<b>hell</b>s
vks
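
The four patterns above hard-code a single keyword, while the question starts from a keyword list. A minimal sketch of one way to generalize them follows. The function name `highlight` and the mode names are hypothetical and not part of the original answer. It assumes a non-empty list of plain-word keywords, and it swaps `\S*` for `\w*` so that trailing punctuation (as in "hello,") stays outside the tags. The replacement uses `\g<0>`, which stands for the whole match, so the casing found in the text is preserved.

```python
import re


def highlight(text, keywords, mode):
    """Wrap case-insensitive keyword matches in <b>...</b>, keeping original casing.

    mode (hypothetical names, chosen for this sketch):
      "exact"    - whole-word matches only
      "prefix"   - whole words that begin with a keyword
      "contains" - whole words that contain a keyword
      "partial"  - only the matched fragment, wherever it occurs
    """
    # Escape each keyword so it is matched literally, then join as alternatives.
    # Assumes `keywords` is non-empty and made of ordinary word characters.
    alternation = "|".join(re.escape(k) for k in keywords)
    patterns = {
        "exact": r"\b(?:%s)\b" % alternation,
        "prefix": r"\b(?:%s)\w*" % alternation,
        "contains": r"\b\w*(?:%s)\w*" % alternation,
        "partial": r"(?:%s)" % alternation,
    }
    # \g<0> is the entire matched text, so 'Hell' stays 'Hell'.
    return re.sub(patterns[mode], r"<b>\g<0></b>", text, flags=re.IGNORECASE)


sentence = 'Hell is a wonderful place to say hello and sell shells'
for mode in ("exact", "prefix", "contains", "partial"):
    print(highlight(sentence, ['hell', 'world'], mode))
```

With the sentence and keyword list from the question, the four modes should print the same four lines as the output above, since 'world' does not occur in the sentence.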