
I wrote a piece of code that needs to be optimized, and I just felt like checking with the community to see whether that code is indeed optimal. It fills up the accumulator for the Hough transform. I actually copy-pasted most of the code from the OpenCV library. Thanks!


int i, j, n, index;
for (i = 0; i < numrows; i++)
{
    for (j = 0; j < numcols; j++)
    {
        if (img[i * numcols + j] == 100)   /* feature pixel? */
        {
            /* vote along the sinusoid for angle bins 300..599 */
            for (n = 300; n < 600; n++)
            {
                index = cvRound(j * tabCos[n] + i * tabSin[n]) + (numrho - 1) / 2;
                accum[(n + 1) * (numrho + 2) + index + 1]++;
            }
        }
    }
}
Denis
  • Do you have an actual example of data to which this code is applied? It looks like there are several possible optimizations; some are independent of the data, but others would depend on the actual distribution of data in img and the size of img. – kriss Nov 19 '10 at 21:42
  • An example of the data I have is at http://stackoverflow.com/questions/4372259/hough-transform-error-in-matlab-and-opencv. I do realize that there are only 3 points per column (that's how I created those images), so there should be some way of speeding it up, but the time-consuming part is filling up the accumulator, not going through the image. – Denis Dec 14 '10 at 21:19

3 Answers


There is a large and repetitive Hough transform in a piece of code I'm vaguely attached to. The maintainer of that part of the code has been experimenting with sparse arrays (actually a C++ std::map keyed on the cell index, if I understood his presentation right) for the accumulator, with some success.

I presume the speed-up is related to cache locality, and it certainly depends on the data being sparse.
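
A minimal sketch of that idea, assuming the same flattened cell index as the question's code (the map and the helper here are my own illustration, not the actual analysis code):

#include <map>

/* Sparse accumulator: cells that never receive a vote are simply absent,
   so memory follows the number of touched cells, not the full grid. */
std::map<long, int> accum;   /* flattened cell index -> vote count */

void vote(int n, int index, int numrho)
{
    accum[(long)(n + 1) * (numrho + 2) + index + 1]++;
}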


Update: The software referenced above is intended to serve many particle physics experiments, but was originally used on a test-bed project (i.e. small scale). As we've gotten serious about doing larger projects and started doing Monte Carlo for them, the Hough transform has become a bit of a bottleneck again, even with the sparse matrix.

As yet we do not have a solution, but one of my colleagues found Gandalf, which includes a "fast Hough transform" that appears to evaluate the transform in a quad-tree-like way (in 2D; presumably you would use an oct-tree in 3D) to reduce the order of the work. We're probably going to experiment with this.
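
To make the quad-tree idea concrete, here is a minimal sketch of that kind of coarse-to-fine recursion over (theta, rho) space. This is my own assumption about the approach, not Gandalf's code, and the compatibility test is deliberately approximate:

#include <cmath>
#include <utility>
#include <vector>

struct Pt { double x, y; };

const double kPi = 3.14159265358979323846;

/* Could some line with (theta, rho) inside this cell pass through p?
   Cheap approximate test: sample rho(theta) = x cos(theta) + y sin(theta)
   at three angles; a real implementation would use a sound interval bound. */
static bool compatible(const Pt &p, double th0, double th1,
                       double rho0, double rho1)
{
    for (double th : { th0, 0.5 * (th0 + th1), th1 })
    {
        double r = p.x * std::cos(th) + p.y * std::sin(th);
        if (r >= rho0 && r <= rho1)
            return true;
    }
    return false;
}

/* Recursively split a (theta, rho) cell into four children, pruning any
   quadrant too few points are compatible with. A typical entry point
   would be subdivide(pts, 0.0, kPi, -rhoMax, rhoMax, 8, 20, peaks). */
static void subdivide(const std::vector<Pt> &pts,
                      double th0, double th1, double rho0, double rho1,
                      int depth, int minVotes,
                      std::vector<std::pair<double, double> > &peaks)
{
    int votes = 0;
    for (const Pt &p : pts)
        if (compatible(p, th0, th1, rho0, rho1))
            ++votes;
    if (votes < minVotes)
        return;                                   /* prune whole quadrant */
    if (depth == 0)
    {
        peaks.push_back({ 0.5 * (th0 + th1), 0.5 * (rho0 + rho1) });
        return;
    }
    double thm = 0.5 * (th0 + th1), rhom = 0.5 * (rho0 + rho1);
    subdivide(pts, th0, thm, rho0, rhom, depth - 1, minVotes, peaks);
    subdivide(pts, thm, th1, rho0, rhom, depth - 1, minVotes, peaks);
    subdivide(pts, th0, thm, rhom, rho1, depth - 1, minVotes, peaks);
    subdivide(pts, thm, th1, rhom, rho1, depth - 1, minVotes, peaks);
}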

Further update: A colleague eventually implemented a progressive probabilistic Hough transform in our code, which currently seems to be the fastest version we've got. It works best if you don't require that every point gets assigned to a line.
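
Since the question already uses OpenCV, note that OpenCV ships a progressive probabilistic Hough transform of its own. A minimal usage sketch (the parameter values below are illustrative, not tuned):

#include <opencv2/imgproc/imgproc.hpp>
#include <vector>

void detect_lines(const cv::Mat &binaryEdges)
{
    std::vector<cv::Vec4i> lines;      /* each line as (x1, y1, x2, y2) */
    cv::HoughLinesP(binaryEdges, lines,
                    1.0,               /* rho resolution in pixels */
                    CV_PI / 180.0,     /* theta resolution in radians */
                    50,                /* accumulator vote threshold */
                    30.0,              /* minimum accepted line length */
                    10.0);             /* maximum gap bridged on a line */
}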

dmckee --- ex-moderator kitten
  • Do you have a link to the paper describing this approach? My matrix is quite sparse, so that would fit perfectly. – Denis Nov 19 '10 at 19:48
  • @Denis: No. This work is still going on, and as it is analysis code for a particle physics experiment there is unlikely to be a paper on the code itself, though I suspect it will go into the student's dissertation. – dmckee --- ex-moderator kitten Nov 19 '10 at 21:14
  • @Denis I don't know about the approach reported by dmckee but the paper cited in Gandalf is: LI, Hungwen; LAVIN, Mark A.; LE MASTER, Ronald J. **Fast Hough transform: A hierarchical approach.** _Computer Vision, Graphics, and Image Processing_, 1986, 36.2: 139-161. – Alessandro Jacopson Apr 15 '13 at 20:59

No, it's not. Replace as many of the [] usages as you can with simple pointer arithmetic to iterate over the arrays in question, and hoist invariant expressions into local variables.

However, the first question is: does your profiler show that this code is a bottleneck in the context of your entire app? If not, why bother micro-optimizing it?

EDIT: a loop micro-optimization; prefer the second form, since no array indexing is required (a multiply per access versus an add).

#include <stdio.h>

int ints[100];
int i;
int *pi;

/* indexed form: each access computes ints + i (a multiply-and-add) */
for (i = 0; i < 100; ++i)
{
  printf("%d", ints[i]);
}

/* pointer form: only a pointer increment per iteration */
for (pi = ints; pi < ints + 100; ++pi)
{
  printf("%d", *pi);
}
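
Applied to the loop in your question, a hedged sketch (assuming, as your snippet suggests, that img is an array of unsigned char and accum an array of int; the hoisted locals are my own):

/* Sketch only: same external variables as the question (img, accum,
   tabCos, tabSin, numrows, numcols, numrho, cvRound). */
int i, j, n;
const int halfRho = (numrho - 1) / 2;             /* invariant everywhere */
for (i = 0; i < numrows; i++)
{
    const unsigned char *row = img + i * numcols; /* hoist the row base */
    for (j = 0; j < numcols; j++)
    {
        if (row[j] == 100)
        {
            /* walk the accumulator rows with a pointer instead of
               recomputing (n + 1) * (numrho + 2) on every iteration */
            int *acc = accum + 301 * (numrho + 2);
            for (n = 300; n < 600; n++, acc += numrho + 2)
            {
                int index = cvRound(j * tabCos[n] + i * tabSin[n]) + halfRho;
                acc[index + 1]++;
            }
        }
    }
}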
Steve Townsend
  • I thought [] and simple pointer arithmetic would be equivalent from the compiler's point of view; (*(accum + (n+1) * (numrho+2) + index+1))++; would then be equivalent to accum[(n+1) * (numrho+2) + index+1]++;, no? This part is definitely the bottleneck in my processing; the rest of the program is very simple. – Denis Nov 19 '10 at 19:46
  • @Denis - if that's all you do, then yes, but see my edit for an example – Steve Townsend Nov 19 '10 at 19:48
  • I'm sorry, I'm kind of new to this forum; which edit are you referring to? – Denis Nov 19 '10 at 19:52

Depending on your application, there are numerous ways to optimise the Hough transform, and fiddling with low-level code is possibly the last of them. I would start with the Randomised HT or the Multiresolution HT, followed by a hybrid approach that merges the two. I believe it is better to optimise the algorithm first; the last step would be hardware optimisation, such as CAM (content-addressable memory). A sketch of the first option follows below.
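
As an illustration, here is a minimal sketch of a Randomised HT for lines, under my own assumptions about names and bin sizes (not a reference implementation): pick two foreground points at random, solve for the single (theta, rho) cell they determine, and vote for that one cell instead of sweeping a whole sinusoid.

#include <cmath>
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

struct Point { int x, y; };

const double kPi = 3.14159265358979323846;

/* Randomised Hough Transform sketch: two random points define one line,
   so each sample adds a single vote instead of voting over all angles. */
void randomised_ht(const std::vector<Point> &pts, int iterations,
                   std::map<std::pair<int, int>, int> &votes)
{
    const double thetaStep = kPi / 180.0;   /* 1-degree angle bins */
    const double rhoStep   = 1.0;           /* 1-pixel distance bins */
    if (pts.size() < 2)
        return;
    for (int k = 0; k < iterations; ++k)
    {
        const Point &a = pts[std::rand() % pts.size()];
        const Point &b = pts[std::rand() % pts.size()];
        if (a.x == b.x && a.y == b.y)
            continue;                       /* need two distinct points */
        /* angle of the normal to the line through a and b */
        double theta = std::atan2((double)(b.x - a.x), (double)(a.y - b.y));
        if (theta < 0.0)
            theta += kPi;                   /* keep theta in [0, pi) */
        double rho = a.x * std::cos(theta) + a.y * std::sin(theta);
        votes[{ (int)(theta / thetaStep),
                (int)std::floor(rho / rhoStep) }]++;
    }
}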