
We are writing an image-processing algorithm targeting Intel hardware. We generally prefer generic C implementations, but we have identified an algorithm that works extremely well and, at its core, performs a very large number of Discrete Cosine Transforms (DCTs). Unfortunately, our throughput requirements are such that a generic C implementation is about two orders of magnitude too slow. I can gain one order of magnitude through other tricks, so if I can speed up the DCTs by about another order of magnitude, I have a path to success.

Is Intel MMX a way to get hardware acceleration for these DCTs? Are there other Intel-specific libraries and/or hardware features that I can exploit to speed these bad boys up?

Where do I start to look? This is a new job for me, and my first time digging hard into Intel hardware, so any pointers would be most appreciated.

John
  • which DCT operation do you mean? [discrete cosine transform](http://en.wikipedia.org/wiki/Discrete_cosine_transform) or [dominated convergence theorem](http://en.wikipedia.org/wiki/Dominated_convergence_theorem)? – wallyk Jan 19 '12 at 17:41
  • If your code's license is compatible with the GPL, you can look for hand-optimized DCT / inverse-DCT routines in video codecs like x264 or x265. – Peter Cordes Aug 25 '16 at 08:04

1 Answer


Take a look at Intel's Integrated Performance Primitives (IPP) library. It contains a wealth of routines that are heavily optimized to take advantage of the Intel architecture, specifically MMX and SSE. Among many other things, IPP also contains routines for the DCT (documentation here).

Martin B