The core of your function is
np.matrix([[1, 1], [1, 0]]) ** abs(n)
which is discussed in the Wiki article

np.matrix
implements **
as __pwr__
, which in turn uses np.linalg.matrix_power
. Essentially that's a repeated dot
matrix multiplication, with a modest enhancement by grouping the products by powers of 2.
In [319]: M=np.matrix([[1, 1], [1, 0]])
In [320]: M**10
Out[320]:
matrix([[89, 55],
[55, 34]])
The use of np.matrix
is discouraged, so I can do the same with
In [321]: A = np.array(M)
In [322]: A
Out[322]:
array([[1, 1],
[1, 0]])
In [323]: np.linalg.matrix_power(A,10)
Out[323]:
array([[89, 55],
[55, 34]])
Using the (newish) @
matrix multiplication operator, that's the same as:
In [324]: A@A@A@A@A@A@A@A@A@A
Out[324]:
array([[89, 55],
[55, 34]])
matrix_power
does something more like:
In [325]: A2=A@A; A4=A2@A2; A8=A4@A4; A8@A2
Out[325]:
array([[89, 55],
[55, 34]])
And some comparative times:
In [326]: timeit np.linalg.matrix_power(A,10)
16.2 µs ± 58.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [327]: timeit M**10
33.5 µs ± 38.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [328]: timeit A@A@A@A@A@A@A@A@A@A
25.6 µs ± 914 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [329]: timeit A2=A@A; A4=A2@A2; A8=A4@A4; A8@A2
10.2 µs ± 97.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
numpy
integers are implemented at int64
, in c
, so are limited in size. Thus we get overflow with a modest 100:
In [330]: np.linalg.matrix_power(A,100)
Out[330]:
array([[ 1298777728820984005, 3736710778780434371],
[ 3736710778780434371, -2437933049959450366]])
We can get around this by changing the dtype to object
. The values are then Python ints, and can grow indefinately:
In [331]: Ao = A.astype(object)
In [332]: Ao
Out[332]:
array([[1, 1],
[1, 0]], dtype=object)
Fortunately matrix_power
can cleanly handle object
dtype:
In [333]: np.linalg.matrix_power(Ao,100)
Out[333]:
array([[573147844013817084101, 354224848179261915075],
[354224848179261915075, 218922995834555169026]], dtype=object)
Usually math on object dtype is slower, but not in this case:
In [334]: timeit np.linalg.matrix_power(Ao,10)
14.9 µs ± 198 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I'm guessing it's because of the small (2,2) size of the array, where fast compiled methods aren't useful. This is basically an iterative task, where numpy
doesn't have any advantages.
Scaling isn't bad - increase n
by 10, and only get a 3-4x increase in time.
In [337]: np.linalg.matrix_power(Ao,1000)
Out[337]:
array([[70330367711422815821835254877183549770181269836358732742604905087154537118196933579742249494562611733487750449241765991088186363265450223647106012053374121273867339111198139373125598767690091902245245323403501,
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875],
[43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875,
26863810024485359386146727202142923967616609318986952340123175997617981700247881689338369654483356564191827856161443356312976673642210350324634850410377680367334151172899169723197082763985615764450078474174626]],
dtype=object)
In [338]: timeit np.linalg.matrix_power(Ao,1000)
53.8 µs ± 83 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
With object dtype np.matrix
:
In [340]: Mo = M.astype(object)
In [344]: timeit Mo**1000
86.1 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
And for million, the times aren't as bad as I anticipated:
In [352]: timeit np.linalg.matrix_power(Ao,1_000_000)
423 ms ± 1.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
For comparison, the fastFibo
times on my machine are:
In [354]: fastFibo(100)
Out[354]: 354224848179261915075
In [355]: timeit fastFibo(100)
3.91 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [356]: timeit fastFibo(1000)
9.37 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [357]: timeit fastFibo(1_000_000)
226 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)