Find Similar Elements in List using Python

Question

I need to find similar items in a list using Python (e.g. 'Limits' is similar to 'Limit', or 'Download ICD file' is similar to 'Download ICD zip file'). I want the similarity to be based on characters, not on digits (e.g. 'Angle 1' is similar to 'Angle 2'). All the strings in my list end with a '\0'.

What I am trying to do is split every item at blanks and check whether any part consists of a digit. But somehow it is not working the way I want it to.

Here is my code example:

for k in range(len(split)):  # split already consists of splitted list entry
    replace = split[k].replace(
        "\\0", ""
    )  # replace \0 at every line ending to guarantee it is only a digit
    is_num = lambda q: q.replace(
        ".", "", 1
    ).isdigit()  # lambda i found somewhere on the internet
    check = is_num(replace)
    if check == True:  # break if it is a digit and split next entry of list
        break
    elif check == False:  # i know, else would be fine too
        seq = difflib.SequenceMatcher(a=List[i].lower(), b=List[j].lower())
        if seq.ratio() > 0.9:
            print(Element1, "is similar to", Element2, "\t")
            break

How are you defining similar? Do you only want to check whether one word has an 's' at the end, or would two words that share even one letter be considered similar? – rishi, Aug 10 '20 at 09:24
A bit difficult to answer. As the strings can have different lengths, it's not easy to state a number of matching chars. They need to be visibly similar. – doublesobig, Aug 10 '20 at 09:54
You need to have some idea in mind before trying to program something. I suggest you first think about what kind of similarity your program requires and then write the code for it. A general, human notion of similarity is not enough for the computer; it needs explicit logic. Just to be clear: settle your requirements first, and only then start thinking about the code. – rishi, Aug 10 '20 at 09:57

score 0 · Answer 1 · answered Aug 10 '20 at 10:54

Try this; it uses get_close_matches from difflib instead of SequenceMatcher.

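from difflib import get_close_matches

# Sample items ending in a literal backslash-zero marker, as in the question.
a = ["abc\\0", "efg\\0", "bc\\0"]

# Cut the marker off as a whole suffix; rstrip() strips a set of characters
# and would also eat trailing "0" digits that belong to the item.
b = [item[:-2] if item.endswith("\\0") else item for item in a]

# Each item is compared against the whole list, so it also matches itself.
for item in b:
    print(get_close_matches(item, b))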
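
As a side note on the call used above: get_close_matches also accepts n (the maximum number of matches, default 3) and cutoff (the minimum score between 0 and 1, default 0.6). A minimal sketch of how they change the result, using a made-up word list based on the question's examples:

```python
from difflib import get_close_matches

# Hypothetical sample list, loosely based on the examples in the question.
words = ["Limits", "Limit", "Angle 1", "Download ICD file"]

# Default call: up to 3 matches scoring at least 0.6, best match first.
default_matches = get_close_matches("Limit", words)

# Stricter call: at most 2 matches, each scoring at least 0.95.
strict_matches = get_close_matches("Limit", words, n=2, cutoff=0.95)

print(default_matches)
print(strict_matches)
```

Note that the comparison is case-sensitive, so lowercasing the strings first may be worth considering.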

answered Aug 10 '20 at 10:54

rishi

@doublesobig, this is just a simple for loop that checks every element, so you will get repeats; make sure to adapt it to your requirements. – rishi Aug 10 '20 at 10:56
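
One possible way to avoid the repeats and to leave digits out of the comparison is sketched below. It rests on several assumptions that are not confirmed in the posts: the trailing marker is the literal two characters backslash and zero (as in the question's replace call), digits are simply ignored (so 'Angle 1' and 'Angle 2' count as similar), and a threshold of 0.85 is acceptable. The names normalize and similar_pairs are illustrative only.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Assumption: the marker is a literal backslash followed by "0".
TERMINATOR = "\\0"


def normalize(item):
    """Drop the trailing marker, remove digits, lowercase, collapse spaces."""
    if item.endswith(TERMINATOR):
        item = item[: -len(TERMINATOR)]
    letters_only = re.sub(r"\d+", "", item)
    return " ".join(letters_only.lower().split())


def similar_pairs(items, threshold=0.85):
    """Return each unordered pair whose normalized texts score >= threshold."""
    pairs = []
    # combinations() yields every pair once, so nothing is compared to itself.
    for first, second in combinations(items, 2):
        matcher = SequenceMatcher(None, normalize(first), normalize(second))
        if matcher.ratio() >= threshold:
            pairs.append((first, second))
    return pairs


# Sample data taken from the examples in the question.
items = [
    "Limits\\0",
    "Limit\\0",
    "Angle 1\\0",
    "Angle 2\\0",
    "Download ICD file\\0",
    "Download ICD zip file\\0",
]

for first, second in similar_pairs(items):
    print(first, "is similar to", second)
```

The threshold is lower than the 0.9 used in the question because the 'Download ICD file' pair scores roughly 0.89 with SequenceMatcher; the right value depends on how strict "visibly similar" should be.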
