
I'm working on a function that, given a sequence, tries to find said sequence within a list and should then return the list item immediately after that sequence terminates.

Currently this code does return the list item immediately after the end of the sequence, but I'm not too happy with having this many nested if-statements and would love to rewrite it. I can't figure out how to go about it, though, as it is quite unlike anything I've written before and I feel a bit out of practice.

def sequence_in_list(seq, lst):
    m, n = len(lst), len(seq)
    for i in xrange(m):
        for j in xrange(n):
            if lst[i] == seq[j]:
                if lst[i+1] == seq[j+1]:
                    if lst[i+2] == seq[j+2]:
                        return lst[i+3]

(My intention is to then extend this function so that, if the sequence occurs more than once in the list, it returns the item that most often follows the sequence.)
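A minimal sketch of that extension (Python 3), assuming both arguments are lists, that overlapping occurrences should each count, and that returning `None` is acceptable when the sequence never appears with an item after it. The name `most_common_follower` is hypothetical; the counting uses the standard library's `collections.Counter`.

```python
from collections import Counter


def most_common_follower(seq, lst):
    """Return the item that most often follows seq in lst, or None."""
    n = len(seq)
    followers = Counter(
        lst[i + n]
        # Stop at len(lst) - n - 1 so that lst[i + n] always exists.
        for i in range(len(lst) - n)
        if lst[i:i + n] == seq
    )
    if not followers:
        return None
    # most_common(1) returns a one-element list like [(item, count)].
    return followers.most_common(1)[0][0]


print(most_common_follower([1, 2], [1, 2, 3, 1, 2, 4, 1, 2, 3]))  # prints 3
```

On ties, `Counter.most_common` orders elements with equal counts by first appearance (Python 3.7+), so the earliest follower wins.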

  • Is `seq` a string or a list in your case? – rkrzr Jul 19 '13 at 08:38
  • If you have no else statement, you can always "merge" if statements: `list[i] == seq[j] and list[i+1] == seq[j+1] and list[i+2] == seq[j+2]` – noisy Jul 19 '13 at 08:40
  • Do you want to get all the elements that come after all the occurrences of seq in `list`? And please don't use built-ins as your variable name. – Rohit Jain Jul 19 '13 at 08:42
  • You're basically creating a [string searching algorithm](http://en.wikipedia.org/wiki/String_searching_algorithm) for lists. – Blender Jul 19 '13 at 08:43
  • @noisy Sure could, but it would make for quite a long line. And I cannot be certain beyond any reasonable doubt that I will always look for only a 3-step match. –  Jul 19 '13 at 08:50
  • @RohitJain Thanks, I'll fix the variable name immediately. I want to be able to check those elements (all the elements after each occurrence of seq) and return the one that has occurred the most. –  Jul 19 '13 at 08:52
  • @Blender Thanks for the link, I'll read up on it and see if I can garner any wisdom that will help me on my trails. –  Jul 19 '13 at 08:57

3 Answers


I would do this with a generator and slicing:

sequence = [1, 2, 3, 5, 1, 2, 3, 6, 1, 2, 3]
pattern = [1, 2, 3]

def find_item_after_pattern(sequence, pattern):
    n = len(pattern)

    for index in range(0, len(sequence) - n):
        if pattern == sequence[index:index + n]:
            yield sequence[index + n]

for item in find_item_after_pattern(sequence, pattern):
    print(item)

And you'll get:

5
6

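The function isn't very efficient (it builds and compares a new slice at every position, roughly len(sequence) * len(pattern) work) and won't work for infinite sequences, since it relies on len() and slicing, but it's short and generic.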
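To get around the infinite-sequence limitation, one option (a sketch, not part of the answer above) is a sliding window over any iterable using `collections.deque` with `maxlen`. It keeps only the last `len(pattern)` items, so it also works on generators and infinite iterators; the name `items_after_pattern` is hypothetical.

```python
from collections import deque
from itertools import cycle, islice


def items_after_pattern(iterable, pattern):
    """Lazily yield each item that immediately follows pattern."""
    pattern = list(pattern)
    # The deque silently drops its oldest item once it holds len(pattern) items.
    window = deque(maxlen=len(pattern))
    for item in iterable:
        # Here the window holds the len(pattern) items just before `item`.
        if len(window) == window.maxlen and list(window) == pattern:
            yield item
        window.append(item)


print(list(items_after_pattern([1, 2, 3, 5, 1, 2, 3, 6, 1, 2, 3], [1, 2, 3])))  # prints [5, 6]
# On an infinite iterator, take only as many results as you need.
print(list(islice(items_after_pattern(cycle([1, 2, 3, 5]), [1, 2, 3]), 3)))  # prints [5, 5, 5]
```

It is not asymptotically faster, since each step still compares up to len(pattern) items; what it buys is laziness and constant memory.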

Blender
  • Thanks. Since Bakuriu's answer solves the immediate problem at hand, I consider it fair to accept that one. Your solution does, however, provide the framework I need to continue my work, and for that I am most grateful! –  Jul 19 '13 at 09:10

Since you are comparing consecutive indexes, and assuming lst and seq are of the same type, you can use slicing:

def sequence_in_list(seq, lst):
    m, n = len(lst), len(seq)
    for i in xrange(m):
        for j in xrange(n):
            if lst[i:i+3] == seq[j:j+3]:
                return lst[i+3]

If the sequences are of different types, you should convert them to a common type before doing the comparison (e.g. lst[i:i+3] == list(seq[j:j+3]) would work if seq is a string and lst is a list).
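A quick illustration of the mixed-type case, assuming `seq` is a string and `lst` is a list of one-character strings (the example values are made up):

```python
seq = "abc"
lst = ["x", "a", "b", "c", "y"]

# A str slice never compares equal to a list slice, even with the same characters.
print(lst[1:4] == seq[0:3])        # prints False
# Converting the string slice to a list makes the comparison meaningful.
print(lst[1:4] == list(seq[0:3]))  # prints True
```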

Alternatively, if the sequences do not support slicing, you can use the built-in all() to check several conditions at once:

def sequence_in_list(seq, lst):
    m, n = len(lst), len(seq)
    for i in xrange(m):
        for j in xrange(n):
            if all(lst[i+k] == seq[j+k] for k in range(3)):
                return lst[i+3]

If you want to extend the check to 10 indices instead of 3, change range(3) to range(10) and return lst[i+10] instead of lst[i+3].
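A sketch generalizing the all() version (Python 3) so the pattern length comes from `len(seq)` instead of being hard-coded. It assumes the whole of `seq` is the pattern, rather than any 3-item window of it as the `j` loop allows, and it returns `None` when nothing follows a match; the name `item_after` is hypothetical. Because it compares element by element, it also works when `seq` is a string and `lst` a list.

```python
def item_after(seq, lst):
    """Return the item right after the first occurrence of seq in lst, or None."""
    n = len(seq)
    # lst[i + n] must exist, so the last start position is len(lst) - n - 1.
    for i in range(len(lst) - n):
        # all() stops at the first mismatch, like the chain of nested ifs.
        if all(lst[i + k] == seq[k] for k in range(n)):
            return lst[i + n]
    return None


print(item_after([1, 2, 3], [0, 1, 2, 3, 4]))  # prints 4
print(item_after([3, 4], [1, 2, 3, 4]))        # prints None
print(item_after("ab", ["x", "a", "b", "y"]))  # prints y
```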

Side note: your original code would raise an IndexError at some point, since you access lst[i+1] where i may be len(lst) - 1. The slice comparisons above never raise by themselves, because slicing past the end just produces a shorter slice, meaning that seq[j:j+3] can have fewer than 3 elements. Two such short slices can still compare equal near the end, though, and then lst[i+3] raises an IndexError; the all() version can raise as well, since it indexes directly. If this is a problem, you should adjust the ranges you iterate over.
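A small demonstration of that edge case, and of one way to adjust the ranges. The two function names and the example list are hypothetical; the first function is the slicing version above with range in place of xrange so that it runs on Python 3.

```python
def sequence_in_list_unbounded(seq, lst):
    # The slicing version as written, with range() instead of xrange().
    for i in range(len(lst)):
        for j in range(len(seq)):
            if lst[i:i + 3] == seq[j:j + 3]:
                return lst[i + 3]


def sequence_in_list_bounded(seq, lst):
    # Adjusted ranges: only full 3-item windows, and lst[i + 3] always exists.
    for i in range(len(lst) - 3):
        for j in range(len(seq) - 2):
            if lst[i:i + 3] == seq[j:j + 3]:
                return lst[i + 3]
    return None


lst = [1, 2, 3, 4, 5]
try:
    sequence_in_list_unbounded([9, 5], lst)
except IndexError:
    # The 1-item slices [5] and [5] compared equal at the end of lst,
    # and then lst[4 + 3] was out of range.
    print("IndexError")
print(sequence_in_list_bounded([9, 5], lst))     # prints None
print(sequence_in_list_bounded([2, 3, 4], lst))  # prints 5
```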

Last remark: don't use the name list since it shadows a built-in name.

Bakuriu
  • Nice catch on the IndexError. I left that out (maybe that was wrong of me) because at this stage I know what the function will parse; I would have resolved that later on. Both arguments are lists, so your first solution worked beautifully, and so did the other one, mind. Thanks! –  Jul 19 '13 at 09:00
  • Why are you iterating over `xrange(n)` and then slicing? – Blender Jul 19 '13 at 09:12
  • @Blender I think he focused on managing the nested if-statements, as per the request. The reason it's in my original code is a glorious mix of trying things I don't necessarily understand and writing into the wee hours of the night. The code runs just fine without it, given that all instances of `j` are replaced with `i`. –  Jul 19 '13 at 09:17

You can combine a list comprehension with slicing to make the comparison more readable:

n, m = len(lst), len(seq)
# j stops at n - 4 so that lst[j+3] always exists (range(n-2) could reach lst[n])
[lst[j+3] for i in range(m-2) for j in range(n-3) if seq[i:i+3] == lst[j:j+3]]

Of course there are more efficient ways to do it, but this is simple, short and Pythonic.
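As one example of a more efficient approach (a sketch, not the answerer's code), the Knuth-Morris-Pratt string-searching algorithm works on lists too: it precomputes a failure table for the pattern and then scans the input once, for O(len(items) + len(pattern)) comparisons instead of slicing at every position. Unlike the comprehension above, it matches the whole pattern rather than any 3-item window of `seq`, counts overlapping occurrences, and accepts any iterable. The name `kmp_items_after` is hypothetical and the pattern is assumed non-empty.

```python
def kmp_items_after(items, pattern):
    """Yield the item after each (possibly overlapping) occurrence of pattern."""
    pattern = list(pattern)
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # failure[q] = length of the longest proper prefix of pattern[:q + 1]
    # that is also a suffix of it (the KMP prefix function).
    failure = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[q] != pattern[k]:
            k = failure[k - 1]
        if pattern[q] == pattern[k]:
            k += 1
        failure[q] = k

    matched = 0      # number of pattern items matched so far
    pending = False  # True right after a full match: the next item is yielded
    for item in items:
        if pending:
            yield item
            pending = False
        # On a mismatch, fall back to the longest prefix that can still match.
        while matched > 0 and item != pattern[matched]:
            matched = failure[matched - 1]
        if item == pattern[matched]:
            matched += 1
        if matched == m:
            pending = True
            matched = failure[m - 1]  # keeps overlapping matches possible


print(list(kmp_items_after([1, 2, 3, 5, 1, 2, 3, 6, 1, 2, 3], [1, 2, 3])))  # prints [5, 6]
print(list(kmp_items_after([1, 1, 1, 2], [1, 1])))  # prints [1, 2]
```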

pkacprzak