I've been toying around with the GrabCut algorithm (as implemented in OpenCV) on the iPhone, and the performance is horrid. It takes about 10-15 seconds to run even in the simulator for an image that's about 800x800. On my phone (an iPhone 4) it runs for several minutes, eventually runs out of memory, and crashes. There's probably some optimization I could do if I wrote my own version of the algorithm in C, but I get the feeling that no amount of optimization will get it anywhere near usable. The performance measurements I've dug up in academic papers report 30-second runtimes even on multicore 1.8 GHz CPUs.
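For reference, this is roughly how I'm invoking it. It's a simplified sketch rather than my exact code: the wrapper function name, the user-supplied rect, and the single iteration count are just placeholders.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Hypothetical wrapper around cv::grabCut -- just the shape of the call.
// image must be 8-bit, 3-channel (CV_8UC3).
cv::Mat segmentForeground(const cv::Mat& image, const cv::Rect& userRect)
{
    cv::Mat mask(image.size(), CV_8UC1, cv::Scalar(cv::GC_BGD));
    cv::Mat bgdModel, fgdModel;  // GMM state that grabCut allocates and updates internally

    // Initialize from the rect; even a single iteration takes seconds at 800x800.
    cv::grabCut(image, mask, userRect, bgdModel, fgdModel,
                1 /* iterCount */, cv::GC_INIT_WITH_RECT);

    // Keep pixels labeled definite or probable foreground.
    return (mask == cv::GC_FGD) | (mask == cv::GC_PR_FGD);
}
```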
So my only hope is the GPU, which I know literally nothing about. I've done some basic research on OpenGL ES, but it's a pretty in-depth topic, and I don't want to sink hours or days into learning the fundamentals just to find out whether I'm on the right path.
So my question is twofold:
1) Can something like GrabCut be run on the GPU? If so, I'd love a starting point other than "learn OpenGL ES". Ideally I'd like to know which concepts deserve particular attention. Keep in mind that I have no OpenGL experience and very little image-processing background.
2) Even if this type of algorithm can be run on the GPU, what kind of performance improvement should I expect? Considering that the current runtime is about 30 seconds AT BEST on the CPU, it seems unlikely that the GPU will put a big enough dent in the runtime to make the algorithm useful.
EDIT: For the algorithm to be "useful", I think it would have to run in 10 seconds or less.
Thanks in advance.