How to match a string in a sentence

Question

I want to check whether a particular string is present in a sentence. I am using this simple code:

subStr = 'joker'
Sent = 'Hello World I am Joker'

if subStr.lower() in Sent.lower():
    print('found')

This is an easy, straightforward approach, but it fails when the sentence appears as

hello world I am Jo ker

hello world I am J oker

I am parsing the sentences from a PDF file, so some unnecessary spaces show up here and there.

A simple way to tackle this would be to remove all the spaces from the sentence and then look for a substring match. I would like to hear other people's thoughts on this: should I stick with this approach or look for an alternative?

How would you differentiate between "to day" and "today" if your input has arbitrary spacing? – jpp, Feb 06 '18 at 13:48

score 2 · Answer 1 · answered Feb 06 '18 at 13:52

You can use a regular expression that allows optional whitespace between the letters:

import re
word_pattern = re.compile(r'j\s*o\s*k\s*e\s*r', re.I)
sent = 'Hello World I am Joker'
if word_pattern.search(sent):
    print('found')

I hope this works
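The pattern above is written out by hand for a single word. As a minimal sketch, the same pattern could be built from any word; the helper name `spaced_pattern` is an assumption for illustration and is not part of the original answer:

```python
import re

def spaced_pattern(word):
    """Compile a case-insensitive regex matching `word` with optional
    whitespace between consecutive characters."""
    # re.escape keeps characters such as '.' or '+' literal.
    return re.compile(r'\s*'.join(re.escape(ch) for ch in word), re.IGNORECASE)

pattern = spaced_pattern('joker')
for sent in ('Hello World I am Joker',
             'hello world I am Jo ker',
             'hello world I am J oker'):
    # Each of the three sentences from the question should be reported as a match.
    print(sent, '->', bool(pattern.search(sent)))
```

Like the hand-written version, this matches anywhere in the text, including inside longer words.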

Tzomas

jpp · Accepted Answer · 2018-02-07T09:10:42.603

This is more efficient than str.replace for small strings, but more expensive for large strings. It won't deal with ambiguous cases, e.g. 'to day' vs 'today'.

subStr in ''.join(Sent.split()).lower()  # True
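A minimal sketch of the same idea wrapped in a reusable function. The name `contains_ignoring_whitespace` is hypothetical, and `casefold()` is swapped in for `lower()` as a slightly more aggressive case normalization:

```python
def contains_ignoring_whitespace(needle, haystack):
    """Return True if `needle` occurs in `haystack` once all whitespace
    and case differences are ignored."""
    def squeeze(text):
        # str.split() with no arguments splits on any run of whitespace
        # (spaces, tabs, newlines), so joining the pieces removes all of it.
        return ''.join(text.split()).casefold()
    return squeeze(needle) in squeeze(haystack)

print(contains_ignoring_whitespace('joker', 'hello world I am Jo\tker'))  # True
# The ambiguity mentioned above: 'to day' collapses into 'today'.
print(contains_ignoring_whitespace('today', 'I went to day care'))        # True
```

Unlike `replace(' ', '')`, splitting also removes tabs and newlines, which can matter for text extracted from a PDF.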

edited Feb 07 '18 at 09:10

answered Feb 06 '18 at 13:57


A quick test suggests this is about half as fast as using `subStr in Sent.replace(' ', '').lower()` - a bit better if Sent is very short, a bit worse if it's very long. – Nathan Vērzemnieks Feb 07 '18 at 03:12
@NathanVērzemnieks, I will update my answer. My test was for small strings, as per OP. – jpp Feb 07 '18 at 09:10

score 0 · Answer 3 · answered Feb 06 '18 at 13:49

Try this. It may break unexpectedly in some cases, but it might work for your use case:

In [1]: Sent = 'Hello World I am Joker'

In [3]: subStr = 'Joker'

In [4]: if subStr in Sent.replace(' ', ''):
   ...:     print("Do something")
   ...:     
Do something

score 0 · Answer 4 · answered Feb 07 '18 at 04:02

Your proposed approach - removing spaces - seems straightforward and efficient (two to ten times faster than the other suggestions, in some simple tests). If you need to minimize false positives, though, you might be better off with the regular expression approach. You could add word boundaries to avoid partial word matches, and examine the matching substring to see if any spaces could be real spaces, perhaps by matching against a canonical word list.

>>> import re
>>> dictionary = {'ever', 'green'}  # stand-in for a real word list (assumption)
>>> sentence = 'Were the fields ever green? - they were never green.'
>>> target = 'evergreen'
>>> pattern = re.compile(r'\b' + r'\s*'.join(target) + r'\b')
>>> pattern.findall(sentence)  # only one match because of \b
['ever green']
>>> matching_words = pattern.findall(sentence)[0].split()
>>> all(word in dictionary for word in matching_words)
True
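The check above could be folded into one function that reports each match together with whether its spaces might be real. This is only a sketch: the name `find_spaced_matches` and the tiny `word_list` are assumptions, and a real word list would be much larger.

```python
import re

def find_spaced_matches(target, sentence, word_list):
    """Yield (matched_text, is_ambiguous) for every whole-word,
    whitespace-tolerant match of `target` in `sentence`."""
    pattern = re.compile(
        r'\b' + r'\s*'.join(map(re.escape, target)) + r'\b',
        re.IGNORECASE,
    )
    for match in pattern.finditer(sentence):
        text = match.group(0)
        parts = text.split()
        # Ambiguous: the match contains whitespace and every piece is a
        # known word, so the spaces may be genuine rather than PDF noise.
        ambiguous = len(parts) > 1 and all(p.lower() in word_list for p in parts)
        yield text, ambiguous

word_list = {'ever', 'green'}  # hypothetical canonical word list
print(list(find_spaced_matches('evergreen', 'Were the fields ever green?', word_list)))
# [('ever green', True)]
print(list(find_spaced_matches('joker', 'hello world I am Jo ker', word_list)))
# [('Jo ker', False)]
```

A match flagged as ambiguous could then be reviewed or rejected, while an unambiguous one such as 'Jo ker' is treated as the target word.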
