9

I was wondering what would be a fairly 'common' or normal way of doing this. I wasn't really looking for the shortest possible answer, like a 2-liner or anything. I've just quickly put this piece of code together, but I can't help feeling there's way too much in there. Also, if there are any libraries that could help with this, that would be very nice.

def get_cycle(line):
    nums = line.strip().split(' ')

    # 2 main loops, for x and y
    for x in range(2, len(nums)): # (starts at 2, assuming the sequence requires at least 2 members)
        for y in range(0, x):
            # if x is already in numbers before it
            if nums[x] == nums[y]:
                seq = [nums[x]] # (re)start the sequence
                adder = 1       # (re)set the adder to 1
                ok = True       # (re)set ok to be True
                # while the sequence still matches (is ok) and
                # tail of y hasn't reached start of x
                while ok and y + adder < x:
                    # stop at the end of the list; otherwise nums[x + adder] raises IndexError
                    if x + adder < len(nums) and nums[x + adder] == nums[y + adder]:
                        seq.append(nums[x + adder])         # add the number to sequence
                        adder += 1                          # increase adder
                    else:
                        ok = False                          # else the sequence is broken
                # if the sequence wasn't broken and has at least 2 members
                if ok and len(seq) > 1:
                    print(' '.join(seq))    # print it out, separated by single spaces
                    return
Matthijs990
SuperLemon
  • Please try to describe in words what all of this is supposed to do. It's pretty dense. – Fred Foo Dec 29 '11 at 20:13
  • If this is working properly, it may be a better question for http://codereview.stackexchange.com/ – Adam Wagner Dec 29 '11 at 20:15
  • Sorry for the density. It reads a sequence of numbers, e.g. '3 0 5 5 1 5 1 6 8', and has to find the first sequence of numbers that repeats, in this case '5 1 5 1', and print out a single copy of that sequence ('5 1'). EDIT: Also, yeah, this works, but I guess there has to be a better way. Input text file: 2 0 6 3 1 6 3 1 6 3 1. Output: 6 3 1. – SuperLemon Dec 29 '11 at 20:20
  • why does it not print ('5')? should it print the _longest_ sequence? only sequences of length > 2? – Niklas B. Dec 29 '11 at 20:22
  • Yeah, see the comment `# if the sequence wasn't broken and has at least 2 members` above `if ok and len(seq) > 1:`. It has to have at least 2 members; I guess 1 number isn't really a sequence. – SuperLemon Dec 29 '11 at 20:23
  • @falconvk - Thanks for the clarification, I edited my answer so it should now do what you want. – Andrew Clark Dec 29 '11 at 20:32

2 Answers

22

I may not be properly understanding this, but I think there is a very simple solution with regex.

(.+ .+)( \1)+

Here is an example:

>>> import re
>>> regex = re.compile(r'(.+ .+)( \1)+')
>>> match = regex.search('3 0 5 5 1 5 1 6 8')
>>> match.group(0)    # entire match
'5 1 5 1'
>>> match.group(1)    # repeating portion
'5 1'
>>> match.start()     # start index of repeating portion
6

>>> match = regex.search('2 0 6 3 1 6 3 1 6 3 1')
>>> match.group(1)
'6 3 1'

Here is how it works: (.+ .+) will match at least two numbers (as many as possible) and place the result into capture group 1. ( \1)+ will then match a space followed by the contents of capture group 1, at least once.

And an extended explanation for the string '3 0 5 5 1 5 1 6 8':

  • (.+ .+) will initially match the entire string, but it will give up characters from the end because ( \1)+ fails. This backtracking continues until (.+ .+) can no longer match at the beginning of the string, at which point the regex engine moves forward in the string and tries again.
  • This repeats until the capture group starts at the second 5. It will again give up characters from the end until '5 1' is captured, at which point the regex looks for one or more occurrences of ' 5 1' for ( \1)+. It finds one, and the match succeeds.
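
As a rough sketch, the pattern above could be wrapped in a small helper. The name `first_cycle` and the constant names are hypothetical, not part of the original answer. The second pattern is an assumed stricter variant, also not from the answer: it uses `\d+` and word boundaries so that a match cannot begin or end in the middle of a multi-digit number.

```python
import re

# Pattern from the answer: a chunk containing at least one inner space,
# followed by one or more " <same chunk>" repetitions.
CYCLE_RE = re.compile(r'(.+ .+)( \1)+')

# Assumed stricter variant (not from the answer): whole numbers only,
# anchored on word boundaries at both ends of the match.
STRICT_CYCLE_RE = re.compile(r'\b(\d+(?: \d+)+)(?: \1)+\b')


def first_cycle(line, pattern=CYCLE_RE):
    """Return the repeating portion of the first match, or None if there is none."""
    match = pattern.search(line.strip())
    return match.group(1) if match else None


print(first_cycle('3 0 5 5 1 5 1 6 8'))        # 5 1
print(first_cycle('2 0 6 3 1 6 3 1 6 3 1'))    # 6 3 1
print(first_cycle('1 3 6 5 1 3 6'))            # None
```

The two patterns can disagree on multi-digit input: for '15 1 5 1 5' the original pattern reports '5 1' because its match starts inside '15', while the stricter variant reports '1 5'.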
Andrew Clark
  • It doesn't have to be the longest sequence. It just has to be a repeating cycle, meaning something like 1 3 6 5 1 3 6 doesn't have one, because there's a 5 in between. It needs to find the first one that occurs, not the longest one. – SuperLemon Dec 29 '11 at 20:30
  • Really looking forward to seeing the explanation. I've been doing Python for about a week now, and am really new to the whole regex thing. Guess I should start looking much more at it. – SuperLemon Dec 29 '11 at 20:38
  • Thanks. Now I just need to really look into it and see what each part of it does and why. Never even used the compile() so.. Much appreciated! – SuperLemon Dec 29 '11 at 20:42
  • `(.+ .+)` I thought this will always find exactly 2 groups of 1 or more characters, groups being separated by a single space char. and then the `( \1)+` part will match any number of those 2-piece groups following it, which is why I'm still confused about how that matches `6 3 1 6 3 1` – SuperLemon Dec 29 '11 at 21:10
3

Your question is really "do all items from x:x+k match items from y:y+k". That is, does a run of k consecutive items occur twice in the line?

And you want x:x+k to not overlap with y:y+k. The easy way to do this is to define y as x plus some offset, d. If you ensure that k <= d <= len(line)-x-k, then you're always looking within the boundaries of the line.

You'll then vary k from 1 to len(line)//2, looking for various length duplicates at a given offset from each other.

The offset from x to y, d, will vary between k and len(line)-x-k.

Similarly, the starting position x will vary from 0 to len(line)//2.

So, the "all" part is something like this: all( line[i] == line[i+d] for i in range(x,x+k) ) for various legal values of d, x and k.
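
A minimal sketch of how those loops might fit together, assuming the line has already been split into a list of tokens. The function name `find_repeat` and its `adjacent` and `min_len` parameters are hypothetical. `min_len=2` reflects the asker's "at least 2 members" requirement, and `adjacent=True` restricts the search to d == k, i.e. the second copy starts right where the first one ends.

```python
def find_repeat(nums, adjacent=True, min_len=2):
    """Return the first run of at least min_len items that occurs again, or None.

    adjacent=True  -> only offset d == k is tried (copies are back to back).
    adjacent=False -> any non-overlapping offset k <= d <= len(nums)-x-k is tried.
    """
    n = len(nums)
    for x in range(n):                               # start of the first copy
        for k in range(min_len, (n - x) // 2 + 1):   # length of the run
            # Offsets that keep both copies inside nums without overlapping.
            offsets = [k] if adjacent else range(k, n - x - k + 1)
            for d in offsets:
                if all(nums[i] == nums[i + d] for i in range(x, x + k)):
                    return nums[x:x + k]
    return None


print(find_repeat('3 0 5 5 1 5 1 6 8'.split()))                # ['5', '1']
print(find_repeat('1 3 6 5 1 3 6'.split()))                    # None
print(find_repeat('1 3 6 5 1 3 6'.split(), adjacent=False))    # ['1', '3']
```

This tries the shortest run first at each starting position, so it returns the first qualifying repeat rather than the longest one.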

S.Lott
  • +1 for transforming a loosely posed question into an actionable problem spec. – Raymond Hettinger Dec 29 '11 at 20:27
  • I can't really tell from just looking at the line you wrote, but the sequence has to repeat right after 'itself', not just occur twice wherever in the line. I'll give it a try though, thanks. – SuperLemon Dec 29 '11 at 20:28
  • If the offset, d, between x and y is == k, then you're looking for two k-length sequences right next to each other. – S.Lott Dec 29 '11 at 20:33