19

What is the best way to find the period in a repeating list?

For example:

a = {4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2}

has repeat {4, 5, 1, 2, 3} with the remainder {4, 5, 1, 2} matching, but being incomplete.

The algorithm should be fast enough to handle longer cases, like so:

b = RandomInteger[10000, {100}];
a = Join[b, b, b, b, Take[b, 27]]

The algorithm should return $Failed if there is no repeating pattern like above.

Arnoud Buzing
  • 15,383
  • 3
  • 20
  • 50
  • My first approach was to use `FindSequenceFunction[a]`, but then I couldn't figure out a good way to programmatically determine the period from that. – Brett Champion Nov 18 '11 at 02:29
  • Thanks for the responses so far. It looks like @Szabolcs solution is the best one. I'm still working on my own independent solution but it's still twice as slow. – Arnoud Buzing Nov 18 '11 at 18:28
  • There is an article in wikipedia about [Cycle Detection](http://en.wikipedia.org/wiki/Cycle_detection), I believe it is relavent. – Mohammad Alaggan Nov 18 '11 at 23:09
  • Related: http://mathematica.stackexchange.com/q/119907/280 – Alexey Popkov Sep 11 '16 at 15:05

9 Answers9

8

Please see the comments interspersed with the code on how it works.

(* True if a has period p *)
testPeriod[p_, a_] := Drop[a, p] === Drop[a, -p]

(* are all the list elements the same? *)
homogeneousQ[list_List] := Length@Tally[list] === 1
homogeneousQ[{}] := Throw[$Failed] (* yes, it's ugly to put this here ... *)

(* auxiliary for findPeriodOfFirstElement[] *)
reduce[a_] := Differences@Flatten@Position[a, First[a], {1}]

(* the first element occurs every ?th position ? *)
findPeriodOfFirstElement[a_] := Module[{nl},
  nl = NestWhileList[reduce, reduce[a], ! homogeneousQ[#] &];
  Fold[Total@Take[#2, #1] &, 1, Reverse[nl]]
  ]

(* the period must be a multiple of the period of the first element *)
period[a_] := Catch@With[{fp = findPeriodOfFirstElement[a]},
   Do[
    If[testPeriod[p, a], Return[p]],
    {p, fp, Quotient[Length[a], 2], fp}
    ]
   ]

Please ask if findPeriodOfFirstElement[] is not clear. I did this independently (for fun!), but now I see that the principle is the same as in Verbeia's solution, except the problem pointed out by Brett is fixed.

I was testing with

b = RandomInteger[100, {1000}];
a = Flatten[{ConstantArray[b, 1000], Take[b, 27]}];

(Note the low integer values: there will be lots of repeating elements within the same period *)


EDIT: According to Leonid's comment below, another 2-3x speedup (~2.4x on my machine) is possible by using a custom position function, compiled specifically for lists of integers:

(* Leonid's reduce[] *)

myPosition = Compile[
  {{lst, _Integer, 1}, {val, _Integer}}, 
  Module[{pos = Table[0, {Length[lst]}], i = 1, ctr = 0}, 
    For[i = 1, i <= Length[lst], i++, 
      If[lst[[i]] == val, pos[[++ctr]] = i]
    ]; 
    Take[pos, ctr]
  ], 
  CompilationTarget -> "C", RuntimeOptions -> "Speed"
]

reduce[a_] := Differences@myPosition[a, First[a]]

Compiling testPeriod gives a further ~20% speedup in a quick test, but I believe this will depend on the input data:

Clear[testPeriod]
testPeriod = 
 Compile[{{p, _Integer}, {a, _Integer, 1}}, 
  Drop[a, p] === Drop[a, -p]]
Community
  • 1
  • 1
Szabolcs
  • 24,728
  • 9
  • 85
  • 174
  • 1
    After some reflection, I believe it will be better to define `homogeneousQ[{x_ ..}] = True` etc., in practical use. – Mr.Wizard Nov 18 '11 at 12:53
  • @Mr.Wizard That's one thing, and another is that which algorithm is faster depends on the patterns in data (number of unique elements, their distribution, etc.). For my test data from above my version is ~2x faster than yours with this `testPeriod` function, but for other data it might not be. For this data `Length@Tally@list` is faster than `Union@Tally@list`, but this is not true for other types of data. And yes, I actually wanted to ask what alternatives we have for `homogeneousQ` :) – Szabolcs Nov 18 '11 at 12:54
  • Thanks. Do you agree or disagree that `{x_ ..}` is the way to go here, for most data? – Mr.Wizard Nov 18 '11 at 12:57
  • @Mr.Wizard Mathematica is a "there's more than one way to do it" language! `Length@Union[...] == 1` was like a reflex for me, I never thought much about it after I started using it. One advantage of `MatchQ` is the freedom you have about handling the empty list: `{x_ ...}`. One more: `RotateLeft[list,1] === list` – Szabolcs Nov 18 '11 at 13:03
  • Pretty impressive. I doubt that even best compiled to C solutions would seriously outperform this. +1. – Leonid Shifrin Nov 18 '11 at 13:52
  • 1
    @Mr.W I was overcomplicating testPeriod all along. See my edit. – Szabolcs Nov 18 '11 at 14:46
  • @Leonid Actually I think that if this approach gives any speedup at all (Mr.W's version, appropriately modified, will be faster when compiled, I believe), it will be only because the Mathematica way of testing for periodicity neccesitates copying long vectors, so minimizing the number of times we need to test for periodicity will speed things up. This won't affect the Python version. – Szabolcs Nov 18 '11 at 15:16
  • 1
    @Szabolcs It turns out that one of the major bottlenecks in your code is a call to `Flatten[Position[...]]`. If you replace that with a call to a custom position function like this: `myPosition = Compile[{{lst, _Integer, 1}, {val, _Integer}}, Module[{pos = Table[0, {Length[lst]}], i = 1, ctr = 0}, For[i = 1, i <= Length[lst], i++, If[lst[[i]] == val, pos[[++ctr]] = i]]; Take[pos, ctr]], CompilationTarget -> "C", RuntimeOptions -> "Speed"]`, you get another 2-3x speedup: `reduce[a_] := Differences@myPosition[a, First[a]]`. – Leonid Shifrin Nov 18 '11 at 20:51
  • @Szabolcs Please don't refer to the comments for how it works. Comments are neither versioned nor tracked and are entirely at the mercy of the commenter. If they so choose to delete their comment, then it's useless. Please include the information from the comments into your answer so as to document it. – abcd Nov 18 '11 at 23:38
  • @Leonid Thanks for the analysis, you're right! I reply so late, as I wanted to try to compile a few other possibilities first :) Do you use Workbench for profiling, or do you do it "by intuition"? – Szabolcs Nov 21 '11 at 15:20
  • @Leonid Somehow, after looking at the Python based answer here, I had the hunch that the Mathematica bottleneck is in `testPeriod` because `Drop` forces a copy of the array. (After all whay my method does, compares to Mr Wizard's or the Python solution, is that it reduces the number of times `testPeriod` needs to run.) It seems that was wrong. – Szabolcs Nov 21 '11 at 15:30
  • 1
    @Szabolcs "By intuition". To tell you the truth, I knew that `Position` is not optimal for machine types for a long time, and this custom function above I used already many times in different settings. Another method which is slightly slower, but still very fast and does not use compile to C is due to Norbert Pozar and uses SparseArrays: `extractPositionFromSparseArray[HoldPattern[SparseArray[u___]]]:= {u}[[4, 2, 2]];positionExtr[x_List, n_] := Flatten @ extractPositionFromSparseArray[SparseArray[Unitize[x - n], Automatic, 1]]`. I actually never use the WB profiler, which is perhaps a mistake. – Leonid Shifrin Nov 21 '11 at 17:16
  • @Leonid When I realized that `Compile` helps even something as simple as `Drop[a,p] === Drop[a,-p]` (I guess by using an Integer-specific version), I tried to compile `Differences@Flatten@Position[a, First[a]]`, and it gives a great speed boost (full function ran in 2.2 vs 1.4 s with your `myPeriod`). `First@Transpose@Position[...]` and `Position[...][[All, 1]]` don't make a difference compared to `Flatten`. Compiling to C doesn't help in these situations (compared to using "plain" Compile). Just thought this is something interesting to mention. – Szabolcs Nov 21 '11 at 17:48
  • @Szabolcs This is indeed interesting, and in retrospect not unexpected (Position can not easily optimize for packed arrays when not compiled, since the second argument is generally a pattern). I vaguely remember making similar observation some time in the past, but for some reason I completely forgot about this possibility here. It may actually be possible to write your solution as a single larger Compile-able function, and it may (or may not) lead to some additional speed improvements. – Leonid Shifrin Nov 21 '11 at 17:57
7

Above methods are better if you have no noise. If your signal is only approximate then Fourier transform methods might be useful. I'll illustrate with a "parametrized" setup wherein the length and number of repetitions of the base signal, the length of the trailing part, and a bound on the noise perturbation are all variables one can play with.

noise = 20;
extra = 40;
baselen = 103;
base = RandomInteger[10000, {baselen}];
repeat = 5;
signal = Flatten[Join[ConstantArray[base, repeat], Take[base, extra]]];
noisysignal = signal + RandomInteger[{-noise, noise}, Length[signal]];

We compute the absolute value of the FFT. We adjoin zeros to both ends. The object will be to threshold by comparing to neighbors.

sigfft = Join[{0.}, Abs[Fourier[noisysignal]], {0}];

Now we create two 0-1 vectors. In one we threshold by making a 1 for each element in the fft that is greater than twice the geometric mean of its two neighbors. In the other we use the average (arithmetic mean) but we lower the size bound to 3/4. This was based on some experimentation. We count the number of 1s in each case. Ideally we'd get 100 for each, as that would be the number of nonzeros in a "perfect" case of no noise and no tail part.

In[419]:= 
thresh1 = 
  Table[If[sigfft[[j]]^2 > 2*sigfft[[j - 1]]*sigfft[[j + 1]], 1, 
    0], {j, 2, Length[sigfft] - 1}];
count1 = Count[thresh1, 1]
thresh2 = 
  Table[If[sigfft[[j]] > 3/4*(sigfft[[j - 1]] + sigfft[[j + 1]]), 1, 
    0], {j, 2, Length[sigfft] - 1}];
count2 = Count[thresh2, 1]

Out[420]= 114

Out[422]= 100

Now we get our best guess as to the value of "repeats", by taking the floor of the total length over the average of our counts.

approxrepeats = Floor[2*Length[signal]/(count1 + count2)]
Out[423]= 5

So we have found that the basic signal is repeated 5 times. That can give a start toward refining to estimate the correct length (baselen, above). To that end we might try removing elements at the end and seeing when we get ffts closer to actually having runs of four 0s between nonzero values.

Something else that might work for estimating number of repeats is finding the modal number of zeros in run length encoding of the thresholded ffts. While I have not actually tried that, it looks like it might be robust to bad choices in the details of how one does the thresholding (mine were just experiments that seem to work).

Daniel Lichtblau

Szabolcs
  • 24,728
  • 9
  • 85
  • 174
Daniel Lichtblau
  • 6,854
  • 1
  • 23
  • 30
  • 1
    that is also what I instictively thought of doing, but couldn't get to work reliably. so +1 – acl Nov 21 '11 at 01:57
5

The following assumes that the cycle starts on the first element and gives the period length and the cycle.

findCyclingList[a_?VectorQ] :=
  Module[{repeats1, repeats2, cl, cLs, vec}, 
  repeats1 = Flatten@Differences[Position[a, First[a]]];
  repeats2 = Flatten[Position[repeats1, First[repeats1]]]; 
  If[Equal @@ Differences[repeats2] && Length[repeats2] > 2(* 
   is potentially cyclic - first element appears cyclically *),
   cl = Plus @@@ Partition[repeats1, First[Differences[repeats2]]];
   cLs = Partition[a, First[cl]];
   If[SameQ @@ cLs  (* candidate cycles all actually the same *), 
    vec = First[cLs];
    {Length[vec], vec}, $Failed], $Failed]  ]

Testing

b = RandomInteger[50, {100}];
a = Join[b, b, b, b, Take[b, 27]];

findCyclingList[a]

{100, {47, 15, 42, 10, 14, 29, 12, 29, 11, 37, 6, 19, 14, 50, 4, 38, 
  23, 3, 41, 39, 41, 17, 32, 8, 18, 37, 5, 45, 38, 8, 39, 9, 26, 33, 
  40, 50, 0, 45, 1, 48, 32, 37, 15, 37, 49, 16, 27, 36, 11, 16, 4, 28,
   31, 46, 30, 24, 30, 3, 32, 31, 31, 0, 32, 35, 47, 44, 7, 21, 1, 22,
   43, 13, 44, 35, 29, 38, 31, 31, 17, 37, 49, 22, 15, 28, 21, 8, 31, 
  42, 26, 33, 1, 47, 26, 1, 37, 22, 40, 27, 27, 16}}

b1 = RandomInteger[10000, {100}]; 
a1 = Join[b1, b1, b1, b1, Take[b1, 23]];

findCyclingList[a1]

{100, {1281, 5325, 8435, 7505, 1355, 857, 2597, 8807, 1095, 4203, 
  3718, 3501, 7054, 4620, 6359, 1624, 6115, 8567, 4030, 5029, 6515, 
  5921, 4875, 2677, 6776, 2468, 7983, 4750, 7609, 9471, 1328, 7830, 
  2241, 4859, 9289, 6294, 7259, 4693, 7188, 2038, 3994, 1907, 2389, 
  6622, 4758, 3171, 1746, 2254, 556, 3010, 1814, 4782, 3849, 6695, 
  4316, 1548, 3824, 5094, 8161, 8423, 8765, 1134, 7442, 8218, 5429, 
  7255, 4131, 9474, 6016, 2438, 403, 6783, 4217, 7452, 2418, 9744, 
  6405, 8757, 9666, 4035, 7833, 2657, 7432, 3066, 9081, 9523, 3284, 
  3661, 1947, 3619, 2550, 4950, 1537, 2772, 5432, 6517, 6142, 9774, 
  1289, 6352}}

This case should fail because it isn't cyclical.

findCyclingList[Join[b, Take[b, 11], b]]

$Failed

I tried to something with Repeated, e.g. a /. Repeated[t__, {2, 100}] -> {t} but it just doesn't work for me.

Verbeia
  • 4,400
  • 2
  • 23
  • 44
  • The check in the first `If` statement causes you to return `$Failed` when there are multiple occurrences of the first element in one period, as with: `{1, 2, 3, 1, 2, 1, 2, 3, 1, 2, 1, 2, 3, 1, 2, 1, 2, 3, 1, 2}`. I thing your check needs to be that the list of differences is itself periodic. – Brett Champion Nov 18 '11 at 02:09
  • oh, good point - I am not sure how to work around that right now – Verbeia Nov 18 '11 at 02:22
  • this revised version should be better – Verbeia Nov 18 '11 at 03:00
  • @Verbeia `a /. {Repeated[k__, {2, Infinity}], Shortest[f___]} -> {{k}}` works but is way too slow – Dr. belisarius Nov 18 '11 at 03:04
3

Does this work for you?

period[a_] := 
   Quiet[Check[
      First[Cases[
         Table[
            {k, Equal @@ Partition[a, k]}, 
            {k, Floor[Length[a]/2]}], 
         {k_, True} :> k
         ]], 
      $Failed]]

Strictly speaking, this will fail for things like

a = {1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5}

although this can be fixed by using something like:

(Equal @@ Partition[a, k]) && (Equal @@ Partition[Reverse[a], k])

(probably computing Reverse[a] just once ahead of time.)

Brett Champion
  • 8,497
  • 1
  • 27
  • 44
2

I propose this. It borrows from both Verbeia and Brett's answers.

Do[
  If[MatchQ @@ Equal @@ Partition[#, i, i, 1, _], Return @@ i],
  {i, #[[ 2 ;; Floor[Length@#/2] ]] ~Position~ First@#}
] /. Null -> $Failed &

It is not quite as efficient as Vebeia's function on long periods, but it is faster on short ones, and it is simpler as well.

Mr.Wizard
  • 24,179
  • 5
  • 44
  • 125
  • +1 for creative period testing function, simplicity, and (potential) speed. Your bottleneck if actually period testing. With a faster one (like the ugly one I use), this is also the fastest solution. I'm clearly overcomplicating things :-) – Szabolcs Nov 18 '11 at 12:26
  • @Szabolcs, something that efficient cannot be ugly. (Never mind the rest, I see it now.) – Mr.Wizard Nov 18 '11 at 12:31
1

I don't know how to solve it in mathematica, but the following algorithm (written in python) should work. It's O(n) so speed should be no concern.

def period(array):
    if len(array) == 0:
        return False
    else:
        s = array[0]
        match = False
        end = 0
        i = 0

        for k in range(1,len(array)):
            c = array[k]

            if not match:
                if c == s:
                    i = 1
                    match = True
                    end = k
            else:
                if not c == array[i]:
                    match = False
                i += 1

        if match:
            return array[:end]
        else:
            return False

# False         
print(period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2,1]))
# [4, 5, 1, 2, 3]            
print(period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2]))
# False
print(period([4]))
# [4, 2]
print(period([4,2,4]))
# False
print(period([4,2,1]))
# False
print(period([]))
Konstantin Weitz
  • 6,180
  • 8
  • 26
  • 43
1

Ok, just to show my own work here:

ModifiedTortoiseHare[a_List] := Module[{counter, tortoise, hare},
Quiet[
 Check[
  counter = 1;
  tortoise = a[[counter]];
  hare = a[[2 counter]];
  While[(tortoise != hare) || (a[[counter ;; 2 counter - 1]] != a[[2 counter ;; 3 counter - 1]]),
   counter++;
   tortoise = a[[counter]];
   hare = a[[2 counter]];
  ];
 counter,
$Failed]]]

I'm not sure this is a 100% correct, especially with cases like {pattern,pattern,different,pattern, pattern} and it gets slower and slower when there are a lot of repeating elements, like so:

{ 1,2,1,1, 1,2,1,1, 1,2,1,1, ...} 

because it is making too many expensive comparisons.

Arnoud Buzing
  • 15,383
  • 3
  • 20
  • 50
0
#include <iostream>
#include <vector>
using namespace std;


int period(vector<int> v)
{
    int p=0; // period 0

    for(int i=p+1; i<v.size(); i++)
    {

        if(v[i] == v[0])
        {
            p=i; // new potential period


            bool periodical=true;
            for(int i=0; i<v.size()-p; i++)
            {
                if(v[i]!=v[i+p])
                {
                    periodical=false;
                    break;
                }
            }
            if(periodical) return p;


            i=p; // try to find new period
        }
    }

    return 0; // no period

}


int main()
{

    vector<int> v3{1,2,3,1,2,3,1,2,3};
    cout<<"Period is :\t"<<period(v3)<<endl;


    vector<int> v0{1,2,3,1,2,3,1,9,6};
    cout<<"Period is :\t"<<period(v0)<<endl;

    vector<int> v1{1,2,1,1,7,1,2,1,1,7,1,2,1,1};
    cout<<"Period is :\t"<<period(v1)<<endl;
    return 0;
}
-1

This sounds like it might relate to sequence alignment. These algorithms are well studied, and might already be implemented in mathematica.

Gian
  • 13,735
  • 44
  • 51