
I want to write a function that converts a BGRA view to a BGR view:

void convertBGRAViewtoBGRView( const boost::gil::bgra8_view_t &src, boost::gil::bgr8_view_t dst )

If I write it like this:

size_t numPixels = src.width() * src.height();
boost::gil::bgra8_view_t::iterator it = src.begin();
boost::gil::bgr8_view_t::iterator itD = dst.begin();
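for( size_t i = 0; i < numPixels; ++i ){
    // copy the current source pixel, then advance the source iterator
    boost::gil::bgra8_pixel_t pixe(*it);
    ++it;
    // keep B, G and R; drop the alpha channel
    boost::gil::bgr8_pixel_t pix(pixe[0],pixe[1],pixe[2]);

    *itD++ = pix;
}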

it works, but it is very slow. So I want to use NEON instructions, and for that I need a raw pointer, for example a (UInt8*) or (UInt32*). I tried it like this:

UInt32 *temp = (UInt32*)&src(0,0);
for( int i = 0; i < numPixels; ++i ){        
    boost::gil::bgr8_pixel_t pixk( (( *temp) & 0xff), ( (*temp>>8) & 0xff), ((*temp >> 16 )& 0xff));
    *itD++ = pixk;
    temp += 1;
}

This more or less works, but the resulting image isn't correct. I suspect a problem with alignment. Does anyone have an idea how to get it to work? This solution is about 3 times faster than the one with the iterator.

UPDATE: I checked with the debugger: the src is 480x360, and up to i == 259 everything is correct, but afterwards the iterator solution and the pointer solution produce different results.

Thanks.

manlio
steffenmauch

2 Answers


After some computation based on your answer, I found that 360*4 (= 1440) is divisible by every power of two up to 32 but not by 64, whereas 360*4 + 8*4 (= 1472) is divisible by 64. So I guess the reason for the error is that GIL, in your case, tries to align image rows at 64-byte boundaries and therefore doesn't store them contiguously.

Because of this it is always advised to use the generic iterator interface instead of messing with the raw memory directly, otherwise you have to be completely sure about any such alignment conventions (but maybe they are perfectly standardized and can be read somewhere in the documentation).
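
As a quick check of that arithmetic, here is a minimal sketch that rounds a row size up to an assumed 64-byte alignment and derives the padding per row. The helper names `alignUp` and `paddingPixels` are hypothetical, and the 64-byte figure is only the guess made above, not something taken from the GIL documentation.

```cpp
#include <cassert>
#include <cstddef>

// Round `bytes` up to the next multiple of `alignment`.
// (Hypothetical helper; `alignment` must be greater than zero.)
inline std::size_t alignUp(std::size_t bytes, std::size_t alignment) {
    assert(alignment > 0);
    return (bytes + alignment - 1) / alignment * alignment;
}

// Number of padding pixels at the end of each row for a given pixel width,
// bytes per pixel and assumed row alignment in bytes.
inline std::size_t paddingPixels(std::size_t width,
                                 std::size_t bytesPerPixel,
                                 std::size_t alignment) {
    const std::size_t rowBytes = width * bytesPerPixel;    // 360 * 4 = 1440
    const std::size_t stride = alignUp(rowBytes, alignment); // 1472 for 64
    return (stride - rowBytes) / bytesPerPixel;
}
```

With a width of 360, 4 bytes per pixel and an alignment of 64, `paddingPixels` yields 8, which matches the 8 extra pixels per row worked out above.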

Christian Rau
  • Wow that is a really good point. Well I tried it before with iterators but that was much slower than this solution. The idea is to use NEON instructions to speed it up. Do you know why it is much slower with iterators? The difference was about 30x! – steffenmauch Feb 06 '12 at 14:58
  • @user1150937 I hear you and performance is a good reason to break out of the iterator interface. But then you really should know what you're doing and what GIL thinks about it. – Christian Rau Feb 06 '12 at 15:01
  • Do you know whether this is what causes the 64-byte alignment? `bgra8_view_t sourceView = interleaved_view(srcWidth, srcHeight,(bgra8_pixel_t*)dataPtr, stride); copy_with_regards_to_orientation(sourceView, view(result), orientation);` – steffenmauch Feb 06 '12 at 15:06
  • @user1150937 Don't know, as I don't know anything about the GIL and its workings. And well, of course these iterators are a bit slower than raw memory handling (because they have to treat exactly those cases you needed to treat, too). But those 30x are really heavy! Are you sure you tested it in release build (instead of debug build), because I guess, like STL iterators, those GIL iterators rely heavily on inlining and simple compiler optimizations, which just makes them suck in debug mode. – Christian Rau Feb 06 '12 at 15:12
  • You're right, in release mode it is much faster, but (thank god) my solution is still a bit faster. Thank you very much for your help! – steffenmauch Feb 06 '12 at 20:49

OK, I found a fix, but I still don't know the reason :) This works for images with a width of 360, as in my case.

UInt32 *temp = (UInt32*)&src(0,0);
for( int i = 0; i < numPixels; ++i ){   
  // at the start of every row except the first, skip the 8 padding pixels
  if( i%360==0 && i!=0 ){
    temp += 8;
  }
  boost::gil::bgr8_pixel_t pixk( (( *temp) & 0xff), ( (*temp>>8) & 0xff), ((*temp >> 16 )& 0xff));
  *itD++ = pixk;
  temp += 1;
}

It is even better to use this one for the iOS platform:

UInt8 *temp = (UInt8*)&src(0,0);
for( int i = 0; i < numPixels; ++i ){   
  // at the start of every row except the first, skip 8 padding pixels (8*4 bytes)
  if( i%360==0 && i!=0 ){
    temp += 8*4;
  }
  boost::gil::bgr8_pixel_t pixk( *temp, *(temp+1), *(temp+2));
  *itD++ = pixk;
  temp += 4;
}

Getting rid of the other iterator further improves speed (tested on iOS).
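
The hard-coded 360 and 8 above could be replaced by an explicit row stride, so the same loop works for any width. The following is only a sketch under assumptions: it works on raw byte buffers rather than GIL views, the function names `bgraRowToBgr` and `bgraToBgr` are made up for illustration, and both strides (in bytes) have to be supplied by the caller. In GIL the stride should be obtainable from the view (for example via `src.pixels().row_size()`), but that mapping is not verified here. When compiled for a target with NEON, each row is processed 8 pixels at a time with the `vld4_u8` and `vst3_u8` intrinsics; otherwise the scalar loop does all the work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Convert one row of `width` BGRA pixels to packed BGR (hypothetical helper).
inline void bgraRowToBgr(const std::uint8_t* src, std::uint8_t* dst,
                         std::size_t width) {
    std::size_t x = 0;
#if defined(__ARM_NEON)
    // 8 pixels per iteration: vld4_u8 de-interleaves 32 bytes into four
    // 8-lane registers (B, G, R, A); vst3_u8 re-interleaves B, G, R only.
    for (; x + 8 <= width; x += 8) {
        const uint8x8x4_t bgra = vld4_u8(src + 4 * x);
        uint8x8x3_t bgr;
        bgr.val[0] = bgra.val[0];
        bgr.val[1] = bgra.val[1];
        bgr.val[2] = bgra.val[2];
        vst3_u8(dst + 3 * x, bgr);
    }
#endif
    // Scalar path: the whole row without NEON, or the leftover tail with it.
    for (; x < width; ++x) {
        dst[3 * x + 0] = src[4 * x + 0];  // B
        dst[3 * x + 1] = src[4 * x + 1];  // G
        dst[3 * x + 2] = src[4 * x + 2];  // R (alpha at +3 is dropped)
    }
}

// Convert a whole image row by row. Strides are in bytes and may be larger
// than the tightly packed row size, which is how row padding is skipped.
inline void bgraToBgr(const std::uint8_t* src, std::size_t srcStride,
                      std::uint8_t* dst, std::size_t dstStride,
                      std::size_t width, std::size_t height) {
    assert(srcStride >= 4 * width && dstStride >= 3 * width);
    for (std::size_t y = 0; y < height; ++y) {
        bgraRowToBgr(src + y * srcStride, dst + y * dstStride, width);
    }
}
```

For the image discussed here, the call would use a width of 360 and a source stride of 1472 bytes (360*4 + 8*4), with the destination stride depending on how the BGR image is laid out. Advancing by the stride at each row start replaces the `i%360` test inside the per-pixel loop.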

steffenmauch