I am trying to write a small script for myself: given one or more words, it should find every sentence, out of a set of arbitrary sentences, that contains all of those words.

For example:

Sentence1 = "Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow"

Sentence2 = "Is it beautiful weather"

Sentence3 = "I hope it wont be snowing here soon"

Sentence4 = "How is the weather"

Words = ['I+be', 'it+weather']

The output is supposed to be:

Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow

Is it beautiful weather

I hope it wont be snowing here soon

The last one is not printed because it contains neither I together with be, nor it together with weather.

So my question is basically: how can I treat + (or any other special character) as a separator, as in keyword1+keyword2+...+keywordN (anywhere from 1 to n words), and check whether all of those words are in the sentence?

What I tried was something like:

Sentence = [
    "Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
    "Is it beautiful weather", "I hope it wont be snowing here soon",
    "How is the weather"]

Words = ['I', 'it+weather']

for loop_word in Words:
    for loop_setence in Sentence:
        if loop_word in loop_setence:
            print(loop_setence)
            break

However, right now it only prints the first sentence, since I changed Words to contain just I for the time being.

What I want is that an entry made of more than one word is written with a special character in between, e.g. I+be, so that whenever both I and be are in a sentence, that sentence is printed. Otherwise nothing should be printed.

So my question is: how can I continue from here to get what I want? :)

aydow
Hellosiroverthere
  • `'it+weather' in sentence` searches for exactly this string, `'it+weather'`, which isn't there. – Michael Butscher Nov 18 '18 at 21:52
  • The first sentence contains I and be, should it be in the output? – Dani Mesejo Nov 18 '18 at 21:53
  • @MichaelButscher Oh yeah, that is correct. I think I need something that treats the entry as two words whenever there is a + in it, but it should not treat `it` and `weather` as two separate entries, as it would if I wrote `['it', 'weather']` – Hellosiroverthere Nov 18 '18 at 21:56
  • @DanielMesejo Oh my bad! Yes it is supposed to be since there is a `I and Be` in that sentence. – Hellosiroverthere Nov 18 '18 at 21:57
  • If this is not a simple exercise, you should use proper NLP tools instead of hacking your way around a classical problem. – Eli Korvigo Nov 18 '18 at 22:18
  • @EliKorvigo I wish to understand what you meant :( – Hellosiroverthere Nov 18 '18 at 22:21
  • Identifying different forms/inflections of the same words is a classical and complicated problem in natural language processing (NLP) with various solutions. You should research the solution space and pick the one that suits your needs. – Eli Korvigo Nov 18 '18 at 22:27
  • @EliKorvigo It is actually first time hearing it - it sounds pretty interesting I would say! – Hellosiroverthere Nov 18 '18 at 22:29

2 Answers

You could do something like this:

words = ['I+be', 'it+weather']
sentences = ["Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
             "Is it beautiful weather", "I hope it wont be snowing here soon", "How is the weather"]

def check_all(sentence, ws):
    return all(w in sentence for w in ws)

for sentence in sentences:
    if any(check_all(sentence, word.split('+')) for word in words):
        print(sentence)

Output

Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow
Is it beautiful weather
I hope it wont be snowing here soon

The function check_all checks whether all the words from a group (for example 'I+be') are in the sentence. If any group is fully contained in the sentence, the sentence is printed. Note that you must first split each group on '+' to get its individual words.
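One caveat with the snippet above is that `w in sentence` is a substring test, so a group can match on fragments of longer words. The following is a minimal sketch of that effect on one of the question's sentences; the variable names `substring_hit` and `whole_word_hit` are only illustrative.

```python
def check_all(sentence, ws):
    # Substring test, as in the snippet above
    return all(w in sentence for w in ws)

sentence = "Is it beautiful weather"
group = "I+be".split("+")  # ['I', 'be']

# 'I' is found inside 'Is' and 'be' inside 'beautiful',
# so the substring test reports a match for the group 'I+be'
substring_hit = check_all(sentence, group)

# Comparing against whitespace-separated tokens avoids that false positive
whole_word_hit = all(w in sentence.split() for w in group)

print(substring_hit, whole_word_hit)  # True False
```

The printed output of the first example is unaffected because this sentence also matches 'it+weather', but the difference matters as soon as the groups change.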

UPDATE

To match whole words only I suggest you use regex, for example:

import re

words = ['I+be', 'it+weather']
sentences = ["Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
             "Is it beautiful weather", "I hope it wont be snowing here soon", "How is the weather", "With In be"]


def check_all(sentence, ws):
    """Returns True if all the words are present in the sentence"""
    return all(re.search(r'\b{}\b'.format(re.escape(w)), sentence) for w in ws)


for sentence in sentences:
    if any(check_all(sentence, word.split('+')) for word in words):
        print(sentence)

Output

Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow
Is it beautiful weather
I hope it wont be snowing here soon

Note that the second example does not contain "With In be" in the output: with the \b word boundaries, I no longer matches inside In.
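If the same groups are checked against many sentences, the patterns could be compiled once up front. This is only a sketch under some assumptions: the helper names `compile_group` and `find_sentences` and the `ignore_case` parameter are hypothetical and not part of the answer above, and `re.escape` is used so that keywords containing regex metacharacters are treated literally.

```python
import re

def compile_group(group, sep="+", ignore_case=False):
    """Turn 'I+be' into a list of compiled whole-word patterns."""
    flags = re.IGNORECASE if ignore_case else 0
    return [re.compile(r"\b{}\b".format(re.escape(w)), flags)
            for w in group.split(sep) if w]

def find_sentences(sentences, groups, ignore_case=False):
    """Return the sentences that contain every word of at least one group."""
    compiled = [compile_group(g, ignore_case=ignore_case) for g in groups]
    compiled = [patterns for patterns in compiled if patterns]  # skip empty groups
    return [s for s in sentences
            if any(all(p.search(s) for p in patterns) for patterns in compiled)]

sentences = ["Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
             "Is it beautiful weather", "I hope it wont be snowing here soon",
             "How is the weather", "With In be"]

for sentence in find_sentences(sentences, ['I+be', 'it+weather']):
    print(sentence)
```

Returning a list rather than printing inside the helper keeps the matching logic reusable; whether case should be ignored depends on the use case, so it is left as an option here.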

Further

  1. See the documentation on any and all.
  2. Python regular expression match whole word
Dani Mesejo

Using filter, any, all, and split

In [22]: Sentence1 = "Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow"
    ...:
    ...: Sentence2 = "Is it beautiful weather"
    ...:
    ...: Sentence3 = "I hope it wont be snowing here soon"
    ...:
    ...: Sentence4 = "How is the weather"
    ...:
    ...: Words = ['I+be', 'it+weather']
    ...:

In [23]: sentences = [Sentence1, Sentence2, Sentence3, Sentence4]

In [27]: list(filter(lambda s: any(all(w in s.split() for w in word.split('+')) for word in Words), sentences))
    ...:
Out[27]:
['Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow',
 'Is it beautiful weather',
 'I hope it wont be snowing here soon']

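The inner generator expression yields True or False for each keyword of a group, depending on whether that keyword is one of the whitespace-separated words of the sentence. all returns True only if every element it is given is True, so a group matches when all of its keywords are present. any returns True if at least one element is True, so a sentence is kept when at least one group matches.

Checking 'be' on its own doesn't return Sentence2, because s.split() compares whole words and beautiful is not be: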

In [43]: Words = ['be']

In [44]: list(filter(lambda s: any(all(w in s.split() for w in word.split('+')) for word in Words), sentences))
Out[44]:
['Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow',
 'I hope it wont be snowing here soon']

Note that this doesn't take punctuation into account, i.e. 'Hello' != 'Hello,'
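One possible way around the punctuation issue is to tokenise with a regular expression instead of `str.split()`. This is a rough sketch, not the only option: the helper names `tokens` and `matches` are made up for illustration, and `\w+` is a simplification that would also split a contraction such as "won't" into two tokens.

```python
import re

def tokens(sentence):
    # Runs of word characters only, so 'Hello,' becomes 'Hello'
    return set(re.findall(r"\w+", sentence))

def matches(sentence, groups):
    words = tokens(sentence)
    return any(all(w in words for w in group.split('+')) for group in groups)

s = "Hello, I am new here"
print('Hello' in s.split())       # False, the token is 'Hello,'
print(matches(s, ['Hello+I']))    # True
```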

aydow
  • Hello! I did find an issue in this program: whenever you change Words to just one word, e.g. `i`, it matches every i contained in any word. For example, beautiful gets printed because it contains an `i` – Hellosiroverthere Nov 18 '18 at 22:20
  • so you are asking for whole word matches? – aydow Nov 18 '18 at 22:23
  • Yes exactly :) The `+` might be confusing, but basically it checks whether there is e.g. an I and a be in the sentence. I assume the one you coded is similar, but it looks for anything that contains an I and a be, whereas I want it to match whole words and not just individual characters :) – Hellosiroverthere Nov 18 '18 at 22:25
  • This seems to be the solution. However, whenever I print the result with `print(list(filter(lambda s: any(all(w in s.split() for w in word.split('+')) for word in Words), sentences)))` it gives me the answers inside a list. Is it possible to print them without the list, i.e. not as `[Is it beautiful weather]`? – Hellosiroverthere Nov 18 '18 at 22:43
  • no, you can't print out a group of things without the grouping mechanism – aydow Nov 18 '18 at 23:39
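For the printing question in the comments, the filtered result is an ordinary list, so its items can still be printed one per line, either with a loop or by unpacking the list into `print`. A small sketch, where the name `matched` is just an illustrative variable:

```python
sentences = ["Hello, I am new here and I hope I will be able to help and get helped from Stackoverflow",
             "Is it beautiful weather", "I hope it wont be snowing here soon", "How is the weather"]
Words = ['it+weather']

matched = list(filter(lambda s: any(all(w in s.split() for w in word.split('+')) for word in Words), sentences))

# Option 1: loop over the matches
for s in matched:
    print(s)

# Option 2: unpack the list and separate the items with newlines
print(*matched, sep="\n")
```

Both options print `Is it beautiful weather` without the surrounding brackets and quotes.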