The code below converts a row from an 8-bit palettized format to 32-bit RGBA.
Before I try to implement an optimized version, I would like to know whether this code is even suited to optimization with DirectXMath or, alternatively, ARM NEON intrinsics or inline assembly. My first look at the documentation did not reveal anything that would cover the table-lookup part.
    void CopyPixels(BYTE *pDst, BYTE *pSrc, int width,
                    const BYTE mask, Color* pColorTable)
    {
        if (width)
        {
            do
            {
                BYTE b = *pSrc++;
                if (b != mask)
                {
                    // Translate to 32-bit RGB value if not masked
                    const Color* pColor = pColorTable + b;
                    pDst[0] = pColor->Blue;
                    pDst[1] = pColor->Green;
                    pDst[2] = pColor->Red;
                    pDst[3] = 0xFF;
                }
                // Skip to next pixel
                pDst += 4;
            }
            while (--width);
        }
    }
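
For comparison, here is a minimal scalar sketch of one direction this could take. It is not a NEON or DirectXMath solution. As far as I can tell, the NEON table-lookup instructions (VTBL/VTBX) only index small tables held in registers, so a 256-entry palette of 4-byte colors would not fit them directly. The sketch instead packs the palette once into 32-bit values, so the inner loop does one table load and one 4-byte store per pixel. The `Color` layout, the 256-entry palette size, and the names `BuildPackedTable` and `CopyPixelsPacked` are assumptions made for illustration, not part of the original code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the Color type; the real layout is not shown above.
struct Color { std::uint8_t Red, Green, Blue; };

// Pack every palette entry once into the B, G, R, 0xFF byte order that
// CopyPixels writes. Assumes the palette has 256 entries.
void BuildPackedTable(const Color* pColorTable, std::uint32_t packed[256])
{
    for (int i = 0; i < 256; ++i)
    {
        const std::uint8_t bytes[4] = { pColorTable[i].Blue,
                                        pColorTable[i].Green,
                                        pColorTable[i].Red,
                                        0xFF };
        // memcpy keeps the in-memory byte order independent of endianness.
        std::memcpy(&packed[i], bytes, 4);
    }
}

// Same behavior as CopyPixels, but each unmasked pixel is a single
// 32-bit table load followed by a single 4-byte store.
void CopyPixelsPacked(std::uint8_t* pDst, const std::uint8_t* pSrc, int width,
                      std::uint8_t mask, const std::uint32_t packed[256])
{
    for (int i = 0; i < width; ++i, pDst += 4)
    {
        const std::uint8_t b = pSrc[i];
        if (b != mask)
            std::memcpy(pDst, &packed[b], 4);  // masked pixels are left untouched
    }
}
```

The packed table only needs rebuilding when the palette changes, so its cost is spread over all rows. Whether this actually beats the byte-wise version would have to be measured on the target device.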