Due to program requirements (fast access to individual characters), I am using uint32_t[] for characters. The array simply stores code points, not UTF-32 code units: I don't think a UTF-32 code unit and a Unicode code point are the same thing, so I keep the two concepts separate.
The code points are taken from the next32PostInc function.
I need to encode these code points into a UTF-8 chunk using libICU, and it's hard to find a character-level, accumulative encoder. I see a way using UnicodeString::append(), but it needs a double conversion (code points to UTF-16, then UTF-16 to UTF-8). The ucnv_convert functions seem to do the job, but only with UTF-32 code units, and I am really not sure it is safe to use them with code points. Currently I am looking for something that is the inverse of the next32PostInc function. How can I do that? If my understanding of code points and code units is wrong, please correct me.
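
To make the request concrete, here is a hand-rolled sketch of the behavior I am after: append one code point at a time to a growing UTF-8 buffer. This is not an ICU call. The name appendUtf8 and the use of std::string as the buffer are assumptions made only for illustration, and I would prefer an ICU facility that does the same thing.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of ICU): appends one Unicode code point
// to `out` as UTF-8. Returns false and appends nothing when `cp` is a
// surrogate (U+D800..U+DFFF) or lies above U+10FFFF.
bool appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;  // surrogates are not valid scalar values
        }
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        return false;  // beyond the Unicode code space
    }
    return true;
}
```

Calling this in a loop over the uint32_t array would build the UTF-8 chunk in a single pass, with no intermediate UTF-16 string.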