4

I'm working with a list of lists, each holding the period of the continued fraction for the square root of a non-perfect square.

For each list, I want to find the length of the largest repeating pattern.

Some example lists:

[
 [1,1,1,1,1,1....],
 [4,1,4,1,4,1....],
 [1,2,10,1,2,10....],
 [1,1,1,1,1,4,1,4,1,20,9,8,1,1,1,1,1,4,1,4,1,20,9,8....],
 [2,2,2,4,2,2,2,4....],
 [1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15,1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15....],
 [1,1,1,3,28,1,1,1,3,28,67,25,1,1,1,3,28,1,1,1,3,28,67,25....]
]

The two similar methods that I've been working with are:

def lengths(seq):
    for i in range(len(seq),1,-1):
        if seq[0:i] == seq[i:i*2]:
            return i


def lengths(seq):
    for i in range(1,len(seq)-1):
        if seq[0:i] == seq[i:i*2]:
            return i    

Both compare a prefix of the list with the slice of the same length that immediately follows it. The problem with the first one is that it returns the wrong answer for a list with a single repeating element, because it starts with the largest candidate and sees just the one large pattern. The problem with the second is that there are nested patterns, as in the sixth and seventh example lists: it is satisfied with the inner repetition and overlooks the rest of the pattern.

tijko

5 Answers

4

This works (I fixed a typo in the 4th list of your sample):

>>> seq_l = [
...  [1,1,1,1,1,1],
...  [4,1,4,1,4,1],
...  [1,2,10,1,2,10],
...  [1,1,1,1,1,4,1,4,1,20,9,8,1,1,1,1,1,4,1,4,1,20,9,8],
...  [2,2,2,4,2,2,2,4,2,2,2,4,2,2,2,4],
...  [1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15,1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15],
...  [1,1,1,3,28,1,1,1,3,28,67,25,1,1,1,3,28,1,1,1,3,28,67,25]
... ]
>>> 
>>> def rep_len(seq):
...     s_len = len(seq)
...     for i in range(1,s_len-1):
...         if s_len%i == 0:
...             j = s_len // i  # integer division, so this also runs on Python 3
...             if seq == j*seq[:i]:
...                 return i
...                 
... 
>>> [rep_len(seq) for seq in seq_l]
[1, 2, 3, 12, 4, 18, 12]
Phil Cooper
  • 1
    tijko stated that the lists always begin with the beginning of a repeating sequence. Do the lists always _end_ with the end of a repeating sequence? I don't think this will work if not. – senderle Jul 09 '12 at 23:11
  • by definition it would have to. `[1,2,1,2,1]` is a sequence that can only be 5 elements long repeating once. plus this algo takes advantage of that fact by only checking for sequences for which the total is evenly divisible. If the assumption does not hold then it's simple to compare to an appropriately truncated version of the original e.g. `seq[:i*j] == seq[:i]*j` (eliminating the `if ...%i` conditional) – Phil Cooper Jul 09 '12 at 23:23
  • @senderle no the lists do not necessarily end with the end of the repeating sequence. – tijko Jul 10 '12 at 02:06
  • @Phil Cooper, really appreciate the help! I did take out the "if conditional" and use the "truncated version of the original". – tijko Jul 10 '12 at 02:20
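The comments above establish that a list can stop partway through a repetition. A minimal sketch of a check that tolerates such a partial tail is below; the name `smallest_period` is hypothetical, and it assumes the list starts at the beginning of a period and is long enough to contain at least two full periods (otherwise a shorter, accidental period may be reported).

```python
def smallest_period(seq):
    """Return the smallest p such that seq[k] == seq[k - p] for every k >= p.

    Assumption: seq starts at the beginning of a period. A trailing,
    incomplete repetition is allowed.
    """
    n = len(seq)
    for p in range(1, n):
        # Every element must equal the element one period earlier.
        if all(seq[k] == seq[k - p] for k in range(p, n)):
            return p
    # No shorter period fits, so the whole list is the only "pattern".
    return n


print(smallest_period([1, 2, 10, 1, 2, 10, 1]))  # list ends mid-pattern
```

Because candidates are tried from shortest to longest, a list of one repeated element yields 1, and a nested pattern is rejected as soon as one element disagrees with the element one period earlier.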
2

If it's feasible to convert your lists to strings, using regular expressions would make this a trivial task.

import re

lists = [
    [1,1,1,1,1,1],
    [4,1,4,1,4,1],
    [1,2,10,1,2,10],
    [1,1,1,1,1,4,1,4,1,20,9,8,1,1,1,1,1,4,1,4,1,20,9,8], #I think you had a typo in this one...
    [2,2,2,4,2,2,2,4],
    [1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15,1,1,1,13,21,45,3,3,1,16,4,1,4,1,1,1,24,15],
    [1,1,1,3,28,1,1,1,3,28,67,25,1,1,1,3,28,1,1,1,3,28,67,25]
]

for l in lists:
    # Join the numbers with a delimiter so multi-digit values stay distinct
    s = "x".join(str(i) for i in l)
    print(s)
    match = re.match(r"^(?P<foo>.*)x?(?P=foo)", s)
    if match:
        print(match.group('foo'))
    else:
        print("****")
    print()

(?P<foo>.*) creates a group named "foo" and (?P=foo) matches the same text again. Since regular expression quantifiers are greedy by default, you get the longest match. The "x?" just allows for a single x in the middle to handle even/odd lengths.
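As a variation on the same idea (a sketch, not the code from this answer), a lazy quantifier makes the regex engine try the shortest candidate first. The function name `shortest_period_regex` and the trailing-comma encoding are assumptions made for illustration, and the sketch assumes non-negative integers and a list that consists of a whole number of repetitions.

```python
import re


def shortest_period_regex(seq):
    """Return the length of the shortest repeating unit, or None.

    Assumptions: seq holds non-negative integers and is made of at
    least two complete repetitions of the unit.
    """
    # Terminate every number with a comma so "1" cannot match inside "10".
    s = "".join("%d," % item for item in seq)
    # (?:\d+,)+? lazily grows the unit one number at a time;
    # \1+ requires the rest of the string to be copies of that unit.
    match = re.fullmatch(r"((?:\d+,)+?)\1+", s)
    if match is None:
        return None
    return match.group(1).count(",")


print(shortest_period_regex([4, 1, 4, 1, 4, 1]))
```

The greedy pattern above prefers the longest repeated prefix, while the lazy form prefers the shortest unit that accounts for the entire string.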

Sean McSomething
1

You could probably use a collections.defaultdict(int) to keep counts of all the sublists, unless you know there are some sublists you don't care about. Convert the sublists to tuples before using them as dictionary keys.

You might be able to get somewhere using a series of bloom filters though, if space is tight. You'd have one bloom filter for subsequences of length 1, another for subsequences of length 2, etc. Then the largest bloom filter that gets a collision has your maximum length sublist.

http://stromberg.dnsalias.org/~strombrg/drs-bloom-filter/
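The counting idea can be sketched roughly as follows. The function name `longest_repeated_sublist` is hypothetical, and the sketch counts every contiguous sublist, which needs quadratic time and memory, so it only suits short lists. Note that it reports the longest sublist that occurs more than once (overlaps included), which is not necessarily the period of the list.

```python
from collections import defaultdict


def longest_repeated_sublist(seq):
    """Return the longest contiguous sublist (as a tuple) seen more than once."""
    counts = defaultdict(int)
    n = len(seq)
    for start in range(n):
        for end in range(start + 1, n + 1):
            # Lists are unhashable, so use tuples as dictionary keys.
            counts[tuple(seq[start:end])] += 1
    repeated = [sub for sub, count in counts.items() if count > 1]
    # An empty tuple signals that nothing repeats.
    return max(repeated, key=len, default=())


print(longest_repeated_sublist([1, 2, 10, 1, 2, 10]))
```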

user1277476
0

I think you just have to check two levels of sequences at once: 0..i == i..i*2 and 0..i/2 != i/2..i.

def lengths(seq):
    for i in range(len(seq),1,-1):
        if seq[0:i] == seq[i:i*2] and seq[0:i//2] != seq[i//2:i]:  # // keeps the slice indices integers on Python 3
            return i

If the two halves of 0..i are equal then it means that you are actually comparing two concatenated patterns with each other.

Omar Awile
  • 2
    What about the point at which `seq[0:i / 3] == seq[i / 3:2 * i / 3] == seq[2 * i / 3:]` but `seq[0:i / 2] != seq[i / 2:]`? Seems like this approach would require checking against every prime factor of `i`. – senderle Jul 09 '12 at 22:39
  • Good point :) I didn't think it through well enough! – Omar Awile Jul 09 '12 at 22:49
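The objection in the comments can be reproduced with a small experiment. The function below restates the half-check idea from this answer under the hypothetical name `lengths_half_check` (with `//` so it runs on Python 3); the input is a two-element pattern repeated six times.

```python
def lengths_half_check(seq):
    """Half-check variant: accept i only if its two halves differ."""
    for i in range(len(seq), 1, -1):
        if seq[0:i] == seq[i:i * 2] and seq[0:i // 2] != seq[i // 2:i]:
            return i


seq = [1, 2] * 6
# The true period is 2, but i == 6 passes both tests:
# seq[0:6] == seq[6:12], and the halves [1, 2, 1] and [2, 1, 2] differ.
print(lengths_half_check(seq))
```

Here the candidate of length 6 is three copies of the real pattern, so comparing halves cannot detect the repetition, which is the case the comment describes.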
0

Starting with the first example method, you could recursively search the sub pattern.

def lengths(seq):
    for i in range(len(seq)-1,1,-1):
        if seq[0:i] == seq[i:i*2]:
            j = lengths(seq[0:i]) # Search pattern for sub pattern
            if j < i and i % j == 0: # Found a smaller pattern; further, a longer repeated
                # pattern length must be a multiple of the shorter pattern length
                n = i // j # Number of pattern repetitions (// keeps n an integer on Python 3 as well)
                for k in range(1, n): # Check that all the smaller patterns are the same
                    if seq[0:j] != seq[j*k:j*(k+1)]: # Stop when we find a mismatch
                        return i # Not a repetition of smaller pattern
                else: return j # All the sub-patterns are the same, return the smaller length
            else: return i # No smaller pattern

I get the feeling this solution isn't quite correct, but I'll do some testing and edit it as necessary. (Quick note: Shouldn't the initial for loop start at len(seq)-1? If not, you compare seq[0:len] to seq[len:len], which seems silly, and would cause the recursion to loop infinitely.)

Edit: Seems sorta similar to the top answer in the related question senderle posted, so you'd best just go read that. ;)

Community
Dubslow