So I am playing around with DCT implementations and noticed that they are relatively slow because of all the multiplications they require.
After googling a bit, I came across BinDCT, which gives very good approximations of the DCT using only bit shifts and additions (no multiplications).
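For context, BinDCT is built from lifting steps. A fast DCT flowgraph contains plane rotations, and each rotation can be factored into three lifting steps whose coefficients are approximated by dyadic fractions (k/2^m). Each product then becomes a few shifts and adds. Below is a minimal sketch of one such rotation. The constants 3/16 and 3/8 are illustrative approximations of tan(pi/16) and sin(pi/8), not BinDCT's published coefficients. The names `mul_p`, `mul_u`, `rotate_fwd` and `rotate_inv` are hypothetical.

```c
#include <stdint.h>

/* Illustrative dyadic approximations (assumed values, not BinDCT's published
 * coefficients) for a rotation by theta = pi/8:
 *   tan(theta/2) = tan(pi/16) ~= 0.1989  ->  3/16 = 1/8 + 1/16
 *   sin(theta)   = sin(pi/8)  ~= 0.3827  ->  3/8  = 1/4 + 1/8
 * Each product uses two right shifts and one add. Right-shifting a negative
 * int is implementation-defined in C; GCC, Clang and MSVC use an arithmetic
 * shift, which this sketch assumes. */
static int32_t mul_p(int32_t x) { return (x >> 3) + (x >> 4); } /* ~ x * 3/16 */
static int32_t mul_u(int32_t x) { return (x >> 2) + (x >> 3); } /* ~ x * 3/8  */

/* Approximate plane rotation by theta using three lifting steps:
 *   R(theta) = [1 -t; 0 1] * [1 0; s 1] * [1 -t; 0 1],
 *   t = tan(theta/2), s = sin(theta),
 * so a' ~= a*cos(theta) - b*sin(theta) and b' ~= a*sin(theta) + b*cos(theta). */
static void rotate_fwd(int32_t *a, int32_t *b)
{
    *a -= mul_p(*b);
    *b += mul_u(*a);
    *a -= mul_p(*b);
}

/* Exact integer inverse: undo the three steps in reverse order. Each step
 * changes only one variable using the other, so the same rounded term can be
 * recomputed and removed, whatever rounding mul_p/mul_u use. */
static void rotate_inv(int32_t *a, int32_t *b)
{
    *a += mul_p(*b);
    *b -= mul_u(*a);
    *a += mul_p(*b);
}
```

For example, `rotate_fwd` maps (100, 50) to (76, 83). The exact rotation by pi/8 gives about (73.25, 84.46), and `rotate_inv` restores (100, 50) exactly. Better dyadic approximations (larger denominators) trade a few more shifts for accuracy. As far as I understand the papers, choosing that trade-off per rotation is what the different BinDCT variants do.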
While skimming papers about it (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.7.834&rep=rep1&type=pdf and http://www.docstoc.com/docs/130118150/Image-Compression-Using-BinDCT) and reading some code I found on Ohloh (http://code.ohloh.net/file?fid=vz-HijUWVLFS65NRaGZpLZwZFq8&cid=mt_ZjvIU0Us&s=&fp=461906&projSelected=true#L0), I noticed that they only cover the 8x8 case.
I am looking for a BinDCT implementation for a 32x32 matrix so I can use it in a faster variant of the perceptual hash algorithm (pHash).
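For context, the DCT-based pHash is commonly described this way. Shrink the image to 32x32 grayscale and take its 2-D DCT. Keep the 8x8 lowest-frequency coefficients, then set one bit per coefficient that lies above their median. Implementations differ in details, for example whether the DC row and column are skipped. Below is a minimal sketch of the last step, assuming the 32x32 DCT has already been computed and stored row-major. `hash_from_dct`, `PH_N` and `PH_K` are hypothetical names, and including DC in the block is a simplifying assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PH_N 32 /* DCT size used by the hash */
#define PH_K 8  /* size of the low-frequency block that is kept */

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: 64-bit hash from a row-major PH_N x PH_N DCT.
 * Keeps the top-left PH_K x PH_K coefficients (including DC in this
 * sketch) and sets bit i when coefficient i is above their median. */
static uint64_t hash_from_dct(const double *coef)
{
    double block[PH_K * PH_K], sorted[PH_K * PH_K];
    for (int r = 0; r < PH_K; r++)
        for (int c = 0; c < PH_K; c++)
            block[r * PH_K + c] = coef[r * PH_N + c];

    memcpy(sorted, block, sizeof block);
    qsort(sorted, PH_K * PH_K, sizeof sorted[0], cmp_double);
    /* Even count: the median is the mean of the two middle values. */
    double median = 0.5 * (sorted[PH_K * PH_K / 2 - 1] + sorted[PH_K * PH_K / 2]);

    uint64_t hash = 0;
    for (int i = 0; i < PH_K * PH_K; i++)
        if (block[i] > median)
            hash |= (uint64_t)1 << i;
    return hash;
}
```

Only 64 of the 1024 coefficients feed the hash. A fast approximation therefore mainly needs to be accurate (and consistently scaled) for those low frequencies.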
I am no mathematician, and although I tried to understand what's going on in the paper and the C code I found, I just can't wrap my head around how to adapt this implementation to a 32x32 matrix.
Has anyone ever written one? Is it even possible?
I understand that extending the implementation requires a lot more bit shifts and temporary variables. I could try trial and error, but since I don't understand the theory, I would never know whether my result is correct.
I am writing this in C#, but any language is fine, since it's all basic operations that are easy to translate.