I have a list of words:

list1 = ['technology','technician','technical','technicality']

I want to find the substring that is repeated in every word. In this case, it is 'tech'. I tried converting all the characters to their ASCII values, but I am stuck there because I cannot think of any logic. Can somebody please help me with this?

Matt N.
  • ["Longest common substring/subsequence."](https://www.google.com/search?client=firefox-b-d&q=Longest+common+substring) – Mateen Ulhaq Jan 22 '22 at 08:05
  • Whenever I get stuck on algorithm problems, I find it useful to work out a few examples by hand until I get a feel for a solution (any solution), and then I optimize as necessary. If you had to solve this by yourself, with a pencil and paper, how would you do it? Can you code up that strategy? – mackorone Jan 22 '22 at 08:05
  • Wouldn't the longest common substring be `'techn'` in this case? Unless you're limiting it to a list of "valid" words. – Chris Jan 22 '22 at 08:24
  • @mackorone I am very new to programming and this idea helped me a lot with debugging. Thanks a lot for the suggestion! – Abhishek Pagare Jan 28 '22 at 16:49
  • @Chris Yes, that's correct. It should be 'techn'. – Abhishek Pagare Jan 28 '22 at 16:50

3 Answers

This is generally known as the longest common substring problem, a close relative of the longest common subsequence problem.
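
The two names refer to different problems: a substring must be contiguous, while a subsequence may skip characters. A minimal sketch of the distinction, where `is_subsequence` is a hypothetical helper written for illustration:

```python
def is_subsequence(candidate, word):
    """Return True if candidate's characters appear in word in order (gaps allowed)."""
    remaining = iter(word)
    # `ch in remaining` consumes the iterator up to the first match,
    # so each character must be found after the previous one.
    return all(ch in remaining for ch in candidate)


word = 'technology'
print('techn' in word)               # True: contiguous, so a substring
print('tchn' in word)                # False: not contiguous
print(is_subsequence('tchn', word))  # True: still a subsequence
```

The question asks for a contiguous piece shared by every word, so it is the substring variant.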


A very basic (but slow) strategy:

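words = ['technology', 'technician', 'technical', 'technicality']

longest_substring = ""

# Loop over a particular word (ideally, the shortest).
shortest_word = min(words, key=len)
for start_idx in range(len(shortest_word)):

    # Select a substring from that word, growing it one character at a time.
    for length in range(1, len(shortest_word) - start_idx + 1):
        curr_substring = shortest_word[start_idx : start_idx + length]

        # If the substring is missing from any word, no longer substring
        # with the same start can match either, so stop extending it.
        if not all(curr_substring in word for word in words):
            break

        # Keep the longest substring found so far.
        if len(curr_substring) > len(longest_substring):
            longest_substring = curr_substring

print(longest_substring)  # techn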
Mateen Ulhaq

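Iterate over the first word, growing the prefix one character at a time. At each length, collect that prefix from every word into a set: while the set holds a single element, the prefix is shared by all words. As soon as the prefixes differ, return the last shared prefix. Note that this only finds a common prefix, not a substring from the middle of the words.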

list1 = ['technology', 'technician', 'technical', 'technicality']


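def common_prefix(li):
    word = li[0]
    prefix = ""
    for i in range(1, len(word) + 1):
        # Collect the i-character prefix of every word; a shared prefix
        # collapses to a single element in the set.
        s = {w[:i] for w in li}
        if len(s) > 1:
            break
        prefix = word[:i]
    return prefix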


print(common_prefix(list1))

output: techn
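
For comparison, the standard library's `os.path.commonprefix` performs the same character-by-character prefix comparison on any list of strings, despite living in `os.path`. A short sketch; the second word list is made up for illustration and shows that a prefix-based approach finds nothing when the shared part sits in the middle of a word:

```python
import os.path

list1 = ['technology', 'technician', 'technical', 'technicality']
print(os.path.commonprefix(list1))  # techn

# Hypothetical input: the words share 'techn', but not at the start.
mixed = ['biotechnology', 'technician']
print(repr(os.path.commonprefix(mixed)))  # ''
```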

Tomáš Šturm

Find the shortest word, since no common substring can be longer than it. Iterate over increasingly small chunks of that word, starting with a chunk equal to the whole word, and check that each chunk is contained in all of the strings. The first chunk that passes is the longest common substring, so return it.

list1 = ['technology', 'technician', 'technical', 'technicality']

def longest_common_substring(lst):
    # Any common substring must fit inside the shortest word.
    shortest_word = min(lst, key=len)
    shortest_len = len(shortest_word)

    # Try the longest candidate lengths first.
    for i in range(shortest_len, 0, -1):
        # Slide a window of length i across the shortest word.
        for j in range(0, shortest_len - i + 1):
            substr = shortest_word[j:j + i]

            if all(substr in w for w in lst):
                return substr

    # The words have no character in common.
    return None

And just for fun, let's replace that loop with a generator expression, and just take the first thing it gives us (or None).

def longest_common_substring(lst):
    shortest_word = min(lst, key=len)
    shortest_len = len(shortest_word)

    # Candidates are generated longest first, so the first match wins.
    return next((shortest_word[j:j + i]
                 for i in range(shortest_len, 0, -1)
                 for j in range(0, shortest_len - i + 1)
                 if all(shortest_word[j:j + i] in w for w in lst)),
                None)
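
A different brute-force formulation, not taken from the answer above, is to build the set of all substrings of every word and intersect them. The sketch below assumes a non-empty list of short words, since a word of length n contributes on the order of n² substrings; `substrings` and `longest_common_substring_by_sets` are hypothetical names chosen for illustration:

```python
def substrings(word):
    """Return every non-empty contiguous substring of word."""
    return {word[i:j]
            for i in range(len(word))
            for j in range(i + 1, len(word) + 1)}


def longest_common_substring_by_sets(words):
    """Return the longest substring shared by all words, or None."""
    # Assumes words is non-empty; set.intersection needs at least one set.
    common = set.intersection(*(substrings(w) for w in words))
    # Prefer longer strings; break ties by string order so the result is deterministic.
    return max(common, key=lambda s: (len(s), s), default=None)


list1 = ['technology', 'technician', 'technical', 'technicality']
print(longest_common_substring_by_sets(list1))  # techn

# Hypothetical input where the shared part is not a prefix.
print(longest_common_substring_by_sets(['biotechnology', 'technician']))  # techn
```

This trades memory for simplicity, so it is best suited to small inputs like the word list in the question.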
Chris