
I want to find a substring like (میں چند ممالک ایک ایسے گیا, roughly "I went to a few such countries") in a paragraph, but the line in the paragraph is not exactly the same as the substring. If more than two words of the substring match a line of the paragraph, I want that line returned as the matching line.

fullstringlist =("  ادھر کی رات صرف چار گھنٹے کی ہے- جہاں دن کا دورانیہ بیس گھنٹے تک ہے- میں چند ایک ممالک ایسے گیا ")

test_list = fullstringlist.split("-")

print("The original list is : " + str(test_list))

subs_list = ['ادھر رات صرف چار گھنٹے کی ہے','میں چند ممالک ایک ایسے گیا'] 
 
res = []
for sub in test_list:
    flag = 0
    for ele in subs_list:
         
        # drop this segment as soon as one substring
        # is not found verbatim in it
        if ele not in sub:
            flag = 1
            break
    if flag == 0:
        res.append(sub)
 
# printing result
print("The extracted values : " + str(res))
 
Oghli
M NOUMAN
  • It's unclear. "the words of substring are back and forth"? It can't be understood literally due to bad English. – relent95 Nov 14 '22 at 07:42
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Nov 14 '22 at 07:54
  • Sorry for my English. I want to detect a line in the paragraph, but that line is not exactly the same as my substring, so if more than two words match a line, I want that line returned as the matching line. – M NOUMAN Nov 14 '22 at 08:08

1 Answer


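You can achieve that with a threshold: for each substring, count how many of its words occur in the paragraph, and treat the substring as a match when that count reaches half its word count (rounded) plus one.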

Example:

ادھر رات صرف چار گھنٹے کی ہے contains 7 words, so its threshold is round(7 / 2) + 1 = 5 words: if 5 or more of its words are found in the paragraph, we consider it a matching substring.
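The threshold formula relies on Python 3's built-in `round()`, which rounds exact halves to the nearest even integer (so `round(2.5)` is `2` but `round(3.5)` is `4`). A quick check of the formula for substrings of 1 to 8 words, next to a hypothetical `majority_threshold` alternative that uses integer division instead:

```python
def answer_threshold(n_words):
    # Formula from the answer: half the word count, rounded, plus one
    return round(n_words / 2) + 1


def majority_threshold(n_words):
    # Hypothetical alternative: strict majority via integer division
    return n_words // 2 + 1


for n in range(1, 9):
    print(n, answer_threshold(n), majority_threshold(n))
```

With the answer's formula, 3-, 4- and 5-word substrings all need 3 matching words, so a 3-word substring must match every word, while `n // 2 + 1` grows more evenly. Which one fits depends on how strict the match should be.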

fullstringlist = "  ادھر کی رات صرف چار گھنٹے کی ہے- جہاں دن کا دورانیہ بیس گھنٹے تک ہے- میں چند ایک ممالک ایسے گیا "
subs_list = ['ادھر رات صرف چار گھنٹے کی ہے','میں چند ممالک ایک ایسے گیا', 'نیکول بیکر ایک کمپنی میں کوارڈینیٹر ']


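def find_matches(full_str, sub_list):
    """Return the substrings in sub_list that have enough words in full_str."""
    matches = []
    for sub in sub_list:  # named `sub` so the built-in str is not shadowed
        words = sub.split()
        # A substring matches if at least half its words (rounded) plus one are found
        threshold = round(len(words) / 2) + 1
        # `word in full_str` is a plain substring test, like str.find()
        n_words = sum(1 for word in words if word in full_str)
        if n_words >= threshold:
            matches.append(sub)
    return matches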

print(find_matches(fullstringlist, subs_list))

Output:

['ادھر رات صرف چار گھنٹے کی ہے', 'میں چند ممالک ایک ایسے گیا']

Note: you can change how the threshold is calculated to suit your requirements.
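The question asks for the paragraph line that matches, while `find_matches` returns the substring itself and counts raw substring hits (`word in full_str`), so a short word is also counted when it appears inside a longer word. A minimal sketch of a variant, assuming lines are separated by `-` as in the question and words by whitespace; `find_matching_lines` and its `min_matches` parameter are hypothetical names, with `min_matches=3` standing in for "more than two words":

```python
def find_matching_lines(paragraph, substrings, min_matches=3):
    """Return (substring, line) pairs where the line shares at least
    `min_matches` distinct whole words with the substring."""
    # Split the paragraph into lines on '-', as in the question, dropping empty pieces
    lines = [line.strip() for line in paragraph.split("-") if line.strip()]
    results = []
    for sub in substrings:
        sub_words = set(sub.split())
        for line in lines:
            # Compare whole words (set intersection), not raw substrings,
            # so a short word cannot match inside a longer one
            shared = sub_words & set(line.split())
            if len(shared) >= min_matches:
                results.append((sub, line))
    return results
```

With the question's data, `find_matching_lines(fullstringlist, subs_list)` should pair each of the first two substrings with its line from the paragraph. Punctuation other than `-` is not stripped, so you may need to normalize it first.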

Oghli