Are combining diacritics counted as letters? As far as I know, they can only combine with other letters in well-formed Unicode.
The ICU function that determines whether a Unicode codepoint is a letter takes only one codepoint, so for any given codepoint it can't know whether that codepoint has been combined with a diacritic or, if it is itself a diacritic, what it has been combined with. I'm trying to implement something akin to a Unicode-aware regex, using a construct like
while(is_letter(codepoint))
However, I'm quite concerned about what will happen if codepoint is actually a diacritic, which would be combined with a previous codepoint and possibly with other combining marks.
Is this safe to do? Or will I have to explicitly find and ignore diacritics and other combining marks?
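To make the concern concrete, here is a minimal sketch in Python, with the standard unicodedata module standing in for the ICU calls. The names is_letter, is_mark, scan_letters and scan_letters_and_marks are hypothetical helpers for illustration, and it assumes "letter" means general category L*. Under that assumption, a combining diacritic such as U+0301 falls in category Mn rather than L, so a letters-only loop stops in the middle of a decomposed character.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # Hypothetical stand-in for the single-codepoint check:
    # true only for general categories Lu, Ll, Lt, Lm, Lo.
    return unicodedata.category(ch).startswith("L")


def is_mark(ch: str) -> bool:
    # Combining marks have general categories Mn, Mc, Me.
    return unicodedata.category(ch).startswith("M")


def scan_letters(text: str, start: int = 0) -> int:
    # Equivalent of `while(is_letter(codepoint))`:
    # returns the index of the first non-letter codepoint.
    i = start
    while i < len(text) and is_letter(text[i]):
        i += 1
    return i


def scan_letters_and_marks(text: str, start: int = 0) -> int:
    # Variant that also accepts combining marks once at least
    # one codepoint has been consumed.
    i = start
    while i < len(text) and (
        is_letter(text[i]) or (i > start and is_mark(text[i]))
    ):
        i += 1
    return i


decomposed = "e\u0301te"   # 'e' + COMBINING ACUTE ACCENT, then "te"
precomposed = "\u00e9te"   # precomposed 'é', then "te"

print(scan_letters(decomposed))            # 1: stops at the combining mark
print(scan_letters(precomposed))           # 3: all three codepoints are letters
print(scan_letters_and_marks(decomposed))  # 4: the mark is consumed with its base
```

So the two spellings of the same text give different results unless marks are handled explicitly or the input is normalized to a composed form first.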
Edit: What I really need to do is iterate over characters, not codepoints.
This question is a victim of the XY problem. I need to ask a separate question about my actual problem.
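For reference, a rough sketch of what "iterating characters" could look like, again in Python with the standard unicodedata module. The function iter_characters is a hypothetical helper, and the grouping rule (a base codepoint plus any following codepoints of category M*) is a simplifying assumption. It only approximates the grapheme cluster rules of Unicode Standard Annex #29, which also cover cases such as Hangul syllables and emoji sequences. As far as I can tell, ICU's BreakIterator character instance is the proper tool for this.

```python
import unicodedata


def iter_characters(text: str):
    # Yield each base codepoint together with the combining
    # marks (general category Mn, Mc, Me) that follow it.
    cluster = ""
    for ch in text:
        if cluster and unicodedata.category(ch).startswith("M"):
            cluster += ch          # mark attaches to the current base
        else:
            if cluster:
                yield cluster      # previous character is complete
            cluster = ch           # start a new character
    if cluster:
        yield cluster


# 'e' + acute, 'a' + diaeresis + dot below, plain 'b'
text = "e\u0301a\u0308\u0323b"
chars = list(iter_characters(text))
print(len(text), len(chars))  # 6 codepoints, 3 characters
```

A letter test could then be applied to the first codepoint of each yielded character instead of to every codepoint.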