Can ICU perform collation comparisons on UTF-16LE data on big endian machines directly?

Question

I have the following code:

UCharIterator iter1;
UCharIterator iter2;
UErrorCode status = U_ZERO_ERROR;

if (ENC_UTF16_BE == m_encoding)
{
    // UTF-16BE input: iterate over the bytes in place (lengths are in bytes).
    uiter_setUTF16BE(&iter1, reinterpret_cast<const char*>(in_string1), in_length1);
    uiter_setUTF16BE(&iter2, reinterpret_cast<const char*>(in_string2), in_length2);

    return ucol_strcollIter(m_collator, &iter1, &iter2, &status);
}
else if (ENC_UTF8 == m_encoding)
{
    // UTF-8 input: also iterated in place, with no intermediate copy.
    uiter_setUTF8(&iter1, reinterpret_cast<const char*>(in_string1), in_length1);
    uiter_setUTF8(&iter2, reinterpret_cast<const char*>(in_string2), in_length2);

    return ucol_strcollIter(m_collator, &iter1, &iter2, &status);
}
else
{
    // General case: each UnicodeString copies and converts its input
    // (this constructor interprets the bytes in the default codepage).
    UnicodeString s1(reinterpret_cast<const char*>(in_string1), in_length1);
    UnicodeString s2(reinterpret_cast<const char*>(in_string2), in_length2);

    return ucol_strcoll(m_collator, s1.getBuffer(), s1.length(), s2.getBuffer(), s2.length());
}

Now, this follows the 'happy path' where the encoding of the data matches ICU's internal encoding, which is UTF-16 in the platform's native byte order (so UTF-16LE on little-endian systems).

But if this were compiled on a big-endian system and the encoding were UTF-16LE, we would be forced into the 'general' case, which involves creating a UnicodeString object, along with the implied copy and conversion.
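For reference, the copy/convert fallback for UTF-16LE input can be sketched without any ICU calls. The helper name `decodeUtf16LE` is hypothetical and not part of ICU; the sketch assumes the input length is given in bytes and simply ignores a trailing odd byte.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an ICU function): decode UTF-16LE bytes into
// native-endian UTF-16 code units. Each code unit is assembled arithmetically
// from its two bytes, so the result is the same on little- and big-endian hosts.
std::u16string decodeUtf16LE(const unsigned char* bytes, std::size_t byteLength)
{
    std::u16string out;
    out.reserve(byteLength / 2);

    // Low byte comes first in UTF-16LE; a trailing odd byte is ignored.
    for (std::size_t i = 0; i + 1 < byteLength; i += 2)
    {
        out.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return out;
}
```

Assuming an ICU build in which UChar is char16_t, the resulting buffer and its length could then be handed to ucol_strcoll. This is exactly the extra allocation and copy that the question is trying to avoid.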

It seems like there should be a uiter_setUTF16LE function for this case, but there isn't one. Is this an artifact of ICU always being UTF-16LE internally in the far past? Is there another way of doing this, or am I forced to copy/convert?

It looks like I could implement my own 'subclass' of UCharIterator to do this. It seems unfortunate that I would need to do this for something which seems like a relatively common case. If no one replies with a better idea, I'll answer this myself with that. — Bwmat, May 30 '14 at 16:55

score 0 · Accepted Answer · answered Aug 01 '14 at 02:00

It looks like I could implement my own 'subclass' of UCharIterator to do this. It seems unfortunate that I would need to do this for something which seems like a relatively common case.
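A minimal sketch of the decoding core such a custom iterator might wrap is given here. It is not wired into ICU: `Utf16LECursor`, `makeCursor` and `kDone` are hypothetical names, with `kDone` standing in for ICU's U_SENTINEL. A real UCharIterator would additionally need its function pointers (getIndex, move, getState, setState and so on) filled in with callbacks built around logic like this.

```cpp
#include <cstdint>

// Stand-in for ICU's U_SENTINEL (-1), returned when there is no code unit.
constexpr std::int32_t kDone = -1;

// Hypothetical cursor over UTF-16LE bytes; not an ICU type.
// It reads the caller's buffer in place and never copies it.
struct Utf16LECursor
{
    const unsigned char* bytes; // UTF-16LE data owned by the caller
    std::int32_t length;        // number of 16-bit code units
    std::int32_t index;         // current position, in code units

    // Assemble code unit i from its low and high byte, independent of host
    // byte order.
    std::int32_t unitAt(std::int32_t i) const
    {
        return bytes[2 * i] | (bytes[2 * i + 1] << 8);
    }

    bool hasNext() const { return index < length; }
    bool hasPrevious() const { return index > 0; }

    // Code unit at the current position, without moving.
    std::int32_t current() const { return hasNext() ? unitAt(index) : kDone; }

    // Return the current code unit, then advance.
    std::int32_t next() { return hasNext() ? unitAt(index++) : kDone; }

    // Step back one code unit, then return it.
    std::int32_t previous() { return hasPrevious() ? unitAt(--index) : kDone; }
};

// Build a cursor from a byte buffer; byteLength is in bytes, as with
// uiter_setUTF16BE. A trailing odd byte is ignored.
Utf16LECursor makeCursor(const void* data, std::int32_t byteLength)
{
    return Utf16LECursor{static_cast<const unsigned char*>(data), byteLength / 2, 0};
}
```

Each UCharIterator callback receives the iterator itself, so one plausible design is to store a pointer to the byte buffer in the iterator's context field and have the callbacks perform the same two-byte reads. That wiring is an assumption here and has not been tested against ICU.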

Bwmat
