Given two RGB colors stored as 32-bit ints (the 8 bits of alpha can be ignored or set to 0xff), what's the fastest way to blend them using a third integer in the range 0-255?


Here is a naive implementation which simply interpolates the values as ints:

#include <stdint.h>

uint32_t rgb_blend(uint32_t src, uint32_t dst, uint32_t blend) {
    const uint32_t iblend = 255 - blend;
    union {
        uint32_t u32;
        struct { uint8_t
#if defined(__LITTLE_ENDIAN__)
            a, b, g, r;
#else
            r, g, b, a;
#endif
        } u8;
    /* src and dst are non-const parameters, so the cast must not add a const that the pointers then discard. */
    } out, *s = (void *)&src, *d = (void *)&dst;

    out.u8.r = (uint8_t)((((uint32_t)s->u8.r * iblend) + ((uint32_t)d->u8.r * blend)) / 255);
    out.u8.g = (uint8_t)((((uint32_t)s->u8.g * iblend) + ((uint32_t)d->u8.g * blend)) / 255);
    out.u8.b = (uint8_t)((((uint32_t)s->u8.b * iblend) + ((uint32_t)d->u8.b * blend)) / 255);
    out.u8.a = 0xff;
    return out.u32;
}

This doesn't have to be totally accurate; some rounding bias in exchange for extra performance is fine. For example, (i / 255) rounds down, but it can be replaced with (((i * 2) + 255) / (2 * 255)) to round at 0.5 while using only integer operations.
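To make the rounding idea concrete, here is a minimal sketch of two ways to round a product to the nearest multiple of 255 using only integer operations. The function names are hypothetical, and the second variant is the commonly used add-and-shift trick rather than anything from the question itself; it assumes the input is a product of two 8-bit values, i.e. at most 255 * 255.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded division by 255 using the expression from the question.
 * Assumes x is small enough that (x * 2) + 255 does not overflow. */
static inline uint32_t div255_round(uint32_t x)
{
    return ((x * 2) + 255) / (2 * 255);
}

/* Division-free variant: add half, then fold the high byte back in.
 * Assumed input range: 0 <= x <= 255 * 255. */
static inline uint32_t div255_round_shift(uint32_t x)
{
    assert(x <= 255u * 255u);
    const uint32_t t = x + 128;
    return (t + (t >> 8)) >> 8;
}
```

Both functions should agree over the whole range of 8-bit products, so the shift version can stand in for the division where `/` is the bottleneck.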


Notes:

  1. This question is similar, but I am asking about RGB colors, not RGBA alpha blending.
  2. While the question isn't architecture-specific, I would be interested in common architectures, e.g. AMD64 and ARM64.
ideasman42
  • What machine are you targeting? – user3528438 Dec 12 '16 at 14:34
  • Your code is slow because the `/` operator is slow. You can estimate `x/255` by `x>>8` (that is, `x/256`) as in the answer by @ideasman42, or use `x*257/65536` or `((x<<8)+x)>>16`, which gives a good estimate equivalent to `x/255.0039`. – user3528438 Dec 12 '16 at 14:52
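As a rough sketch of the estimates mentioned in the comment above, the three expressions can be written as small helpers. The names are made up for illustration, and the input is assumed to be a product of two 8-bit values (at most 255 * 255), so nothing overflows 32 bits.

```c
#include <stdint.h>

/* x / 256: cheapest, truncates. */
static inline uint32_t div255_shift8(uint32_t x)
{
    return x >> 8;
}

/* x * 257 / 65536, i.e. roughly x / 255.0039: truncates. */
static inline uint32_t div255_mul257(uint32_t x)
{
    return (x * 257u) >> 16;
}

/* Same value as div255_mul257, with the multiply written as shift + add. */
static inline uint32_t div255_shift_add(uint32_t x)
{
    return ((x << 8) + x) >> 16;
}
```

All three truncate, so they never exceed `x / 255` and can fall one below it; for `x = 255 * 255` they all return 254 where the exact quotient is 255.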

1 Answer

Colors can be blended using only add, subtract, multiply and bit-shift operations (no branches or type conversion). This divides by 256 rather than 255, so each channel is biased slightly downward:

#include <assert.h>
#include <stdint.h>

uint32_t rgb_interp(uint32_t src, uint32_t dst, uint32_t t) {
    assert(t <= 255);
    const uint32_t s = 255 - t;
#if defined(__LITTLE_ENDIAN__)
    return (
        (((((src >> 0)  & 0xff) * s +
           ((dst >> 0)  & 0xff) * t) >> 8)) |
        (((((src >> 8)  & 0xff) * s +
           ((dst >> 8)  & 0xff) * t)     )  & ~0xff) |
        (((((src >> 16) & 0xff) * s +
           ((dst >> 16) & 0xff) * t) << 8)  & ~0xffff) |
        0xff000000
    );
#else
    return (
        (((((src >> 24) & 0xff) * s +
           ((dst >> 24) & 0xff) * t) << 16) & ~0xffffff) |
        (((((src >> 16) & 0xff) * s +
           ((dst >> 16) & 0xff) * t) << 8)  & ~0xffff) |
        (((((src >> 8)  & 0xff) * s +
           ((dst >> 8)  & 0xff) * t)     )  & ~0xff) |
        0xff
    );
#endif
}

RGBA version for reference:

uint32_t rgba_interp(uint32_t src, uint32_t dst, uint32_t t) {
    assert(t <= 255);
    const uint32_t s = 255 - t;
    return (
        (((((src >> 0)  & 0xff) * s +
           ((dst >> 0)  & 0xff) * t) >> 8)) |
        (((((src >> 8)  & 0xff) * s +
           ((dst >> 8)  & 0xff) * t)     )  & ~0xff) |
        (((((src >> 16) & 0xff) * s +
           ((dst >> 16) & 0xff) * t) << 8)  & ~0xffff) |
        (((((src >> 24) & 0xff) * s +
           ((dst >> 24) & 0xff) * t) << 16) & ~0xffffff)
    );
}
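One possible way to cut the number of multiplies further is to blend two channels per multiply, since each 8-bit channel times an 8-bit weight fits in 16 bits. The following is only a sketch under stated assumptions: the name `rgb_interp_packed` is hypothetical, the layout is the one with alpha in the top byte (0xff000000), and it keeps the same divide-by-256 bias as the functions above. Whether it is actually faster depends on the compiler and target, so it would need benchmarking.

```c
#include <assert.h>
#include <stdint.h>

/* Blend the channels at bits 0-7 and 16-23 with one pair of multiplies,
 * and the channel at bits 8-15 with another. Assumes t <= 255. */
static inline uint32_t rgb_interp_packed(uint32_t src, uint32_t dst, uint32_t t)
{
    assert(t <= 255);
    const uint32_t s = 255 - t;
    /* Each 16-bit lane holds at most 255 * 255, so lanes never carry into each other. */
    const uint32_t rb = (((src & 0x00ff00ffu) * s) + ((dst & 0x00ff00ffu) * t)) >> 8;
    const uint32_t g  = (((src & 0x0000ff00u) * s) + ((dst & 0x0000ff00u) * t)) >> 8;
    /* Mask off the low bits that the shift moved into neighbouring bytes. */
    return (rb & 0x00ff00ffu) | (g & 0x0000ff00u) | 0xff000000u;
}
```

As with the per-channel version, `t = 255` does not reproduce `dst` exactly: blending 0x000000 towards 0xffffff gives 0xfe per channel.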
ideasman42