I have the following code:
UCharIterator iter1;
UCharIterator iter2;
UErrorCode status = U_ZERO_ERROR;
if (ENC_UTF16_BE == m_encoding)
{
    uiter_setUTF16BE(&iter1, reinterpret_cast<const char*>(in_string1), in_length1);
    uiter_setUTF16BE(&iter2, reinterpret_cast<const char*>(in_string2), in_length2);
    return ucol_strcollIter(m_collator, &iter1, &iter2, &status);
}
else if (ENC_UTF8 == m_encoding)
{
    uiter_setUTF8(&iter1, reinterpret_cast<const char*>(in_string1), in_length1);
    uiter_setUTF8(&iter2, reinterpret_cast<const char*>(in_string2), in_length2);
    return ucol_strcollIter(m_collator, &iter1, &iter2, &status);
}
else
{
    UnicodeString s1(reinterpret_cast<const char*>(in_string1), in_length1);
    UnicodeString s2(reinterpret_cast<const char*>(in_string2), in_length2);
    return ucol_strcoll(m_collator, s1.getBuffer(), s1.length(), s2.getBuffer(), s2.length());
}
This covers the 'happy path', where the encoding of the data matches ICU's internal encoding, which is UTF-16 in the platform's byte order (UTF-16LE on little-endian systems).
However, if this were compiled on a big-endian system and the data were UTF-16LE, we would be forced into the 'general' case, which creates a UnicodeString object and incurs the conversion that implies.
It seems like there should be a uiter_setUTF16LE function for this case, but there isn't one. Is this an artifact of ICU always having been UTF-16LE internally in the distant past? Is there another way of doing this, or am I forced to copy and convert?
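For reference, the copy/convert fallback I would like to avoid could look roughly like the sketch below. Note that decodeUtf16Le is a hypothetical helper of my own, not an ICU function, and the sketch assumes the input length is given in bytes. It assembles each code unit from its two bytes, so it behaves the same on little-endian and big-endian hosts.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ICU): copies UTF-16LE bytes into
// native-endian UTF-16 code units, independent of the host byte order.
std::u16string decodeUtf16Le(const unsigned char* bytes, std::size_t byteLength)
{
    std::u16string units;
    units.reserve(byteLength / 2);

    // Low byte comes first in UTF-16LE. A trailing odd byte is ignored.
    for (std::size_t i = 0; i + 1 < byteLength; i += 2)
    {
        units.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return units;
}
```

The resulting buffer could then presumably be handed to ucol_strcoll, assuming UChar is a 16-bit type compatible with char16_t (which I believe is the default in recent ICU versions). It still costs one full copy per string, which is exactly what an iterator would avoid.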