Most efficient way to convert vector of float to vector of uint32?

Question

This is a follow-up question to this one. Now I would like to convert in the opposite direction, float --> unsigned int. What is the most efficient vector sequence that accurately reproduces the following scalar operation?

float x = ...
unsigned int res = (unsigned int)x;

http://stackoverflow.com/questions/78619/what-is-the-fastest-way-to-convert-float-to-int-on-x86 — Anycorn, Feb 06 '12 at 08:34
Not the same question! I want to convert to an **unsigned** int — zr., Feb 06 '12 at 08:36
What do you mean by "vector sequence"? SSE on x86? Intrinsics? — Gunther Piez, Feb 06 '12 at 09:01

Paul R · Accepted Answer · 2012-02-06T09:42:20.203

This is based on an example from the old but useful Apple AltiVec-SSE migration documentation, which is unfortunately no longer available at http://developer.apple.com:

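#include <emmintrin.h> // SSE2 intrinsics

inline __m128i _mm_ctu_ps(__m128 f) // f is modified below, so it cannot be const
{
    const __m128 two31 = _mm_set1_ps(0x1.0p31f);
    const __m128 two32 = _mm_add_ps(two31, two31);
    const __m128 zero = _mm_setzero_ps();

    // check for overflow before conversion to int
    const __m128 overflow = _mm_cmpge_ps(f, two31);  // all-ones mask where f >= 2**31
    const __m128 overflow2 = _mm_cmpge_ps(f, two32); // all-ones mask where f >= 2**32
    const __m128 subval = _mm_and_ps(overflow, two31);
    // _mm_castps_si128 reinterprets the bits portably; mask << 31 gives 0x80000000
    const __m128i addval = _mm_slli_epi32(_mm_castps_si128(overflow), 31);
    __m128i result;

    // bias the value to signed space if it is >= 2**31
    f = _mm_sub_ps(f, subval);

    // clip at zero (negative inputs and NaN become 0)
    f = _mm_max_ps(f, zero);

    // convert to signed int; out-of-range lanes yield 0x80000000
    // note: this rounds using the current rounding mode (round to nearest by default);
    // use _mm_cvttps_epi32 instead to truncate like the C cast
    result = _mm_cvtps_epi32(f);

    // unbias
    result = _mm_add_epi32(result, addval);

    // patch up the overflow case (f >= 2**32 becomes 0xFFFFFFFF)
    result = _mm_or_si128(result, _mm_castps_si128(overflow2));

    return result;
}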
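
One thing to watch: the scalar cast in the question truncates toward zero, while `_mm_cvtps_epi32` rounds according to the current rounding mode. Below is a minimal sketch of a truncating variant built on the same bias/unbias idea, assuming SSE2 is available. The names `cvtt_ps_epu32_sketch` and `convert_array_sketch` are hypothetical and not from the Apple documentation, and the array helper assumes the element count is a multiple of 4.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: truncating float -> uint32 conversion for 4 lanes.
   Negative inputs and NaN give 0; inputs >= 2^32 give 0xFFFFFFFF. */
static inline __m128i cvtt_ps_epu32_sketch(__m128 f)
{
    const __m128 two31 = _mm_set1_ps(2147483648.0f); /* 2^31 */
    const __m128 two32 = _mm_set1_ps(4294967296.0f); /* 2^32 */

    /* all-ones masks for lanes that do not fit in a signed / unsigned int */
    const __m128 ge31 = _mm_cmpge_ps(f, two31);
    const __m128 ge32 = _mm_cmpge_ps(f, two32);

    /* subtract 2^31 only in lanes >= 2^31 (exact for floats in that range) */
    f = _mm_sub_ps(f, _mm_and_ps(ge31, two31));

    /* clamp negative values and NaN to zero */
    f = _mm_max_ps(f, _mm_setzero_ps());

    /* truncating conversion; out-of-range lanes yield 0x80000000 */
    __m128i r = _mm_cvttps_epi32(f);

    /* add 0x80000000 back in the lanes that were biased */
    r = _mm_add_epi32(r, _mm_slli_epi32(_mm_castps_si128(ge31), 31));

    /* force lanes >= 2^32 to 0xFFFFFFFF */
    return _mm_or_si128(r, _mm_castps_si128(ge32));
}

/* Hypothetical helper: convert n floats, n assumed to be a multiple of 4.
   Unaligned loads and stores are used so the buffers need no alignment. */
static void convert_array_sketch(const float *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);
        _mm_storeu_si128((__m128i *)(dst + i), cvtt_ps_epu32_sketch(v));
    }
}
```

For in-range inputs this should agree with `(unsigned int)x`. For negative values below -1, NaN, and values of 2**32 or more, the C cast is undefined, so clamping to 0 and 0xFFFFFFFF there is a design choice of this sketch rather than something the language requires.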
