You are effectively doing an n
-fold convolution of [1, 1, 1, ..., 1, 1, 1].
As such, have you considered using an FFT on a sufficiently zero padded array, raising its elements to the power n
and using the inverse FFT to recover all of the coefficients of
(1+x+x**2+x**3+...+x**(k-1))**n
and then just reading off the ones you are interested in?
UPDATE:
Since the FFT is cyclic, you'll need an array that is no smaller than the number of terms in
(1+x+x**2+x**3+...+x**(k-1))**n
or, in other words, (k-1)*n+1
so that the results don't wrap around at the ends (or at least so when they do they only add zeros to affected elements). Often its length should also be a power of two since this is required by the FFT algorithm (implementations that don't require it to will pad your input with zeros until it does).
In C-like pseudocode:
unsigned int m = 1;
while(m<(k-1)*n+1) m *= 2;
complex c[m];
for(unsigned int i=0;i!=k;++i) c[i] = complex(1.0, 0.0);
for(unsigned int i=k;i!=m;++i) c[i] = complex(0.0, 0.0);
c = fft(c);
for(unsigned int i=0;i!=m;++i) c[i] = pow(c[i], double(n));
c = inv_fft(c);
At the end of this, the r
'th element of the complex array c
has a real part equal to the coefficient of x**r
and an imaginary part of zero.
Now since this is all done in floating point you should be aware that those elements will have accumulated rounding errors. You can partially correct this by rounding them to the nearest integer but be aware that for sufficiently large k
and n
those errors will exceed 0.5 so this can yield results that are off by some small relative error.
A quick search on the web reveals than numpy has implementations of the FFT and its inverse, numpy.fft.rfft
and numpy.fft.irfft
respectively, that you can use when the input data is real.