How to do a partial check with string or array in Python?

I have a string like this:

"domains", and it should partially match "domainsid".

I tried a few examples; I will explain what I need.

eg 1: This is not fine, as I need "domain" to count as a partial match:

>>> "domains" in "domainid"
False

eg 2: This works as expected (but there is a problem, shown in eg 3):

>>> "domains" in "domainsid"
True

eg 3: This is not fine, as "d" should not match (instead, "domain" should be matched):

>>> "d" in "domainsid"
True
>>> "d" in "domainid"
True
Ilya
  • 'in' not work... pattern.match(^('domains')$)... if match.groups() return True – Lokesh Sanapalli Aug 12 '16 at 05:30
  • In third example, why should `domain` be matched when what you are trying is to match `d`? `d` is also a valid partial string. – skjoshi Aug 12 '16 at 05:35
  • Is there any other way where "d" won't get matched? – suresh kumar Aug 12 '16 at 05:37
  • I would advise you to first understand how the `in` operator works. – akash karothiya Aug 12 '16 at 05:37
  • What criteria are you using for a *partial match*? Is it the longest sequence that is the same in both? – wwii Aug 12 '16 at 05:39
  • You should make that clear in your question. – wwii Aug 12 '16 at 05:47
  • [Longest Common Substring Problem](https://en.wikipedia.org/wiki/Longest_common_substring_problem) – wwii Aug 12 '16 at 06:58
  • Looking for a possible duplicate (there are a lot of them) I ran across a couple of answers using [`difflib.SequenceMatcher`](https://docs.python.org/3/library/difflib.html#sequencematcher-objects), which is in the Standard Library - that's probably what you should use. – wwii Aug 12 '16 at 07:30
  • Possible duplicate of [How to find the overlap between 2 sequences, and return it](http://stackoverflow.com/questions/14128763/how-to-find-the-overlap-between-2-sequences-and-return-it) – wwii Aug 12 '16 at 07:30

1 Answer

First, a helper function, adapted from the itertools pairwise recipe, that produces substrings as tuples of n consecutive characters.

import itertools
def n_wise(iterable, n = 2):
    '''n = 2 -> (s0,s1), (s1,s2), (s2, s3), ...

    n = 3 -> (s0,s1, s2), (s1,s2, s3), (s2, s3, s4), ...'''
    a = itertools.tee(iterable, n)
    for x, thing in enumerate(a[1:]):
        for _ in range(x+1):
            next(thing, None)
    return zip(*a)
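
For plain strings, the same sliding windows can also be produced by slicing. Here is a minimal sketch to show what the helper generates; the name `windows` is made up for illustration and is not part of the itertools recipe.

```python
def windows(s, n):
    """Return every length-n substring of s, left to right.

    Hypothetical helper for illustration; it only works on sliceable
    sequences such as str, unlike the tee-based version for any iterable.
    """
    return [s[i:i + n] for i in range(len(s) - n + 1)]


print(windows("domain", 3))  # ['dom', 'oma', 'mai', 'ain']
```

If n is larger than the string, the range is empty and the result is an empty list.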

Then a function that iterates over substrings, longest first, and tests for membership.

def foo(s1, s2):
    '''Finds the longest matching substring.

    Substrings shorter than 3 characters are ignored;
    returns None if nothing matches.
    '''
    # the longest matching substring can only be as long as the shorter string
    # which string is shorter?
    shortest, longest = sorted([s1, s2], key=len)
    # iterate over substrings, longest substrings first, down to length 3
    for n in range(len(shortest), 2, -1):
        for sub in n_wise(shortest, n):
            sub = ''.join(sub)
            if sub in longest:
                # return the first one found, it is the longest
                return sub
    return None

s = "fdomainster"
t = "exdomainid"
print(foo(s, t))  # domain
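
As mentioned in the comments, `difflib.SequenceMatcher` from the standard library can probably do the same job. A minimal sketch, assuming a minimum match length is wanted so that a single character like "d" does not count; the function name `longest_common_substring` and the `min_len` parameter are my own choices for illustration, not part of difflib.

```python
from difflib import SequenceMatcher


def longest_common_substring(s1, s2, min_len=3):
    """Return the longest substring shared by s1 and s2.

    min_len is an assumed cutoff (not a difflib feature): matches
    shorter than this are treated as no match and '' is returned.
    """
    # autojunk=False turns off the heuristic that ignores very frequent
    # elements in long sequences
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    match = matcher.find_longest_match(0, len(s1), 0, len(s2))
    if match.size < min_len:
        return ''
    return s1[match.a:match.a + match.size]


print(longest_common_substring("fdomainster", "exdomainid"))  # domain
print(longest_common_substring("domains", "domainid"))        # domain
```

With the default cutoff, `longest_common_substring("d", "domainid")` returns an empty string, which is the behaviour asked for in eg 3.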
wwii