
If I have two variables and I want to see how many characters they have in common, what would I do to get a number for how many were wrong? For example:

a = "word"
b = "wind"
a - b = 2

Is there a way to do this, or to make what is above work?

Edit: it should also take order into account when calculating.

Edit 2: All of these should turn out as shown below:

a = bird
b = word
<program to find answer> 2


a = book
b = look
<program to find answer> 3


a = boat
b = obee
<program to find answer> 0

a = fizz
b = faze
<program to find answer> 2

4 Answers


This might not apply to all cases, but if you would like to compare characters you can use a set:

a = "word"
b = "wind"

common = set(a) & set(b)  # characters shared by both words, ignoring order and repeats
print(len(common))
>> 2

This ignores order, since you are grouping the characters into sets of unique characters.
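For example, with the boat/obee pair from the question (where the expected answer is 0), the set approach still reports 2, because both words contain an o and a b somewhere:

a = "boat"
b = "obee"

print(len(set(a) & set(b)))
>> 2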

Another interesting Python standard library module you can use is difflib.

from difflib import Differ

d = Differ()

a = "word"
b = "wind"

[i for i in d.compare(a, b) if i.startswith('-')]
>> ['- o', '- r']

difflib essentially provides methods for comparing sequences such as strings. With the Differ object above, you can compare two strings and identify which characters are added or removed going from string a to string b. In the example given, a list comprehension filters out the characters removed from a to b; you can also check for lines that start with + to see the characters that are added.

[i for i in d.compare(a, b) if i.startswith('+')]
>> ['+ i', '+ n']

Or the characters common to both sequences, which addresses the original question ("How to check how many characters a variable has in common with another variable"):

common = [i for i in d.compare(a,b) if i.startswith('  ')]
print(common, len(common))
>> ['  w', '  d'] 2

You can read more about the Differ object in the difflib documentation.

BernardL

You could do something like this:

sum(achar != bchar for achar, bchar in zip(a,b))

Which will work where the strings have the same length. If they might have different lengths, then you can also account for that:

sum(achar != bchar for achar, bchar in zip(a,b)) + abs(len(a) - len(b))

Though that will only allow the words to match at their beginning, so the difference between wordy and word will be 1, while the difference between wordy and ordy will be 5. If you want that second difference to be 1 as well, you'll need slightly more complex logic.
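As a rough sketch of that more complex logic (not part of the original answer), you could let difflib.SequenceMatcher align the two strings first and then count every character that falls outside the matching blocks; the helper name shifted_diff below is just illustrative:

from difflib import SequenceMatcher

def shifted_diff(a, b):
    # total characters that fall outside the best alignment of the two strings
    matched = sum(block.size for block in SequenceMatcher(None, a, b).get_matching_blocks())
    return len(a) + len(b) - 2 * matched

print(shifted_diff("wordy", "word"))  # 1
print(shifted_diff("wordy", "ordy"))  # 1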

lxop
  • This is very close to what I'm looking for; unfortunately, it doesn't work the majority of the time, either giving the maximum or the minimum possible answer – Bearded Pancake Dec 03 '18 at 20:51

What you have described needing is an edit distance metric between words. Hamming distance was mentioned; however, it cannot correctly account for words of different lengths, as it only allows substitutions. Other common metrics include longest common substring, Levenshtein distance, Jaro distance, etc.

Your question seems to describe Levenshtein distance, which is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one word into another. The Wikipedia page on it is pretty thorough if you'd like to read and understand more on the topic (or go on a Wikipedia tangent), but as far as coding goes, there is already a library on pip (pip install python-Levenshtein) that implements the algorithm in C for faster execution.
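If you go the library route, usage looks roughly like this (assuming the package installs under the module name Levenshtein, as the PyPI package does):

import Levenshtein

print(Levenshtein.distance("word", "wind"))      # 2
print(Levenshtein.distance("kitten", "sitting")) # 3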

Example:

Here's the recursive implementation from Rosetta Code, with a bunch of comments to help you understand how it works.

from functools import lru_cache
@lru_cache(maxsize=4095) #recursive approach will calculate some substrings many times, 
                         # so we can cache the result and re-use it to speed things up.
def ld(s, t):
    if not s: return len(t) #if one of the substrings is empty, we've reached our maximum recursion
    if not t: return len(s) # the difference in length must be added to edit distance (insert that many chars.)

    if s[0] == t[0]: #equal chars do not increase edit distance
        return ld(s[1:], t[1:]) #remove chars that are the same and find distance
    else: #we must edit next char so we'll try insertion deletion and swapping
        l1 = ld(s, t[1:]) #insert char (delete from `t`)
        l2 = ld(s[1:], t) #delete char (insert to `t`)
        l3 = ld(s[1:], t[1:]) #swap chars
        #take minimum distance of the three cases we tried and add 1 for this edit
        return 1 + min(l1, l2, l3) 

and testing it out:

>>> ld('kitten', 'sitting') # swap k->s, swap e->i, insert g
3
Aaron

Count the common characters and subtract that from the length of the longer string. Based on your edit and comments, I think you are looking for this:

def find_uncommon_chars(word1, word2):
    # select shorter and longer word
    shorter = word1
    longer = word2
    if len(shorter) > len(longer):
        shorter = word2
        longer = word1

    # count common chars
    count = 0
    for i in range(len(shorter)):
        if shorter[i] == longer[i]:
            count += 1
    # if you return just count you have number of common chars
    return len(longer) - count
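
A quick check (not part of the original answer) against two of the pairs above: as written the function returns the number of positions that differ, while returning count instead gives the number of matching characters.

print(find_uncommon_chars("word", "wind"))  # 2 positions differ
print(find_uncommon_chars("book", "look"))  # 1 position differs; count itself would be 3
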
Jakub Orsula