I wrote a program that uses `_mm_cmpistri` to find the next newline (`\n`) character. While this works great on my computer, it fails on a server that lacks SSE 4.2 support.

Is there a good alternative that uses only SSE instructions up to SSE 4.1?

Peter Cordes
moo
  • You'd have to rewrite it a little, but why not `_mm_cmpeq_epi8`? Well... if you don't have SSE4, you could even go back to a plain optimized C implementation. – Adriano Repetti Apr 10 '14 at 21:22
  • `pmovmskb` and `bsf` are useful for this – harold Apr 10 '14 at 21:23
  • Could you perhaps provide me some sample code? Here is my current code: `__m128i special = _mm_set_epi8('\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n'); __m128i data = _mm_loadu_si128 (curr); int32_t index = _mm_cmpistri(special, data, _SIDD_CMP_EQUAL_ANY);` – moo Apr 10 '14 at 21:30

1 Answer

OK, actual code it is. This hasn't been tested; it's just to give you the idea.

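// Body of a function returning the 0-based offset of the first '\n'.
// Needs <emmintrin.h> (SSE2 only) and, for ffs, <strings.h> (POSIX).
// Assumes at least 16 readable bytes at ptr; a reported match can lie at or
// past length, so the caller should compare the result against length.
__m128i lf = _mm_set1_epi8('\n');
// unaligned part
__m128i data = _mm_loadu_si128((const __m128i *)ptr);
int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(data, lf));
if (mask != 0)
    return ffs(mask) - 1;   // ffs is 1-based, so subtract 1
// offset of the first 16-byte-aligned address after ptr
int index = 16 - ((size_t)ptr & 15);
// aligned part, possibly overlaps unaligned part but that's ok
for (; index < length; index += 16) {
    data = _mm_load_si128((const __m128i *)(ptr + index));
    mask = _mm_movemask_epi8(_mm_cmpeq_epi8(data, lf));
    if (mask != 0)
        return index + ffs(mask) - 1;
}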

For MSVC, ffs can be defined in terms of _BitScanForward. Note that _BitScanForward yields a 0-based bit index, whereas ffs is 1-based.
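For reference, here is a self-contained sketch of the same compare/movemask/bit-scan idea as a complete function. It is not the snippet above verbatim: to keep it simple it uses only unaligned loads on full 16-byte chunks and finishes with a scalar loop, so it never reads past the end of the buffer. The names `find_newline` and `lowest_set_bit` are made up for this example, and the non-MSVC branch assumes a GCC/Clang-style `__builtin_ctz`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; no SSE4 needed */
#include <stddef.h>

#if defined(_MSC_VER)
#include <intrin.h>
/* 0-based index of the lowest set bit; mask must be nonzero. */
static unsigned lowest_set_bit(unsigned mask)
{
    unsigned long idx;
    _BitScanForward(&idx, mask);
    return (unsigned)idx;
}
#else
/* Assumes GCC or Clang; __builtin_ctz is undefined for mask == 0. */
static unsigned lowest_set_bit(unsigned mask)
{
    return (unsigned)__builtin_ctz(mask);
}
#endif

/* Hypothetical helper: returns the 0-based offset of the first '\n'
 * in buf[0..len), or len if there is none. */
static size_t find_newline(const char *buf, size_t len)
{
    const __m128i lf = _mm_set1_epi8('\n');
    size_t i = 0;

    /* Full 16-byte chunks: compare every byte with '\n' and collect the
     * comparison results into a 16-bit mask (bit k set = byte k matched). */
    for (; i + 16 <= len; i += 16) {
        __m128i data = _mm_loadu_si128((const __m128i *)(buf + i));
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(data, lf));
        if (mask != 0)
            return i + lowest_set_bit(mask);
    }

    /* Scalar tail for the remaining (fewer than 16) bytes. */
    for (; i < len; i++) {
        if (buf[i] == '\n')
            return i;
    }
    return len;
}
```

Returning `len` when nothing is found is a choice made for this sketch; `_mm_cmpistri` similarly reports 16 when a chunk contains no match, so calling code that checks for "index equals chunk size" should translate fairly directly.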

harold