This is from Twitter's documentation (https://developer.twitter.com/en/docs/basics/counting-characters.html):
"Twitter counts the length of a Tweet using the Normalization Form C (NFC) version of the text ... Twitter also counts the number of codepoints in the text rather than UTF-8 bytes."
That works for Western languages, but not for a Korean message I tested.
(I tried to include the Korean example here, but Stack Overflow flags it as spam and won't let me post it.)
After FormC normalization, that message has a length of 160. In Twitter's web client it is exactly at the limit: adding even one more character goes over.
Applying FormD to the same text instead gives a length of over 300.
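The gap between the FormC and FormD numbers is expected for Korean: a precomposed Hangul syllable is a single code point in NFC, but FormD decomposes it into two or three conjoining jamo. A minimal sketch of this, using a hypothetical sample string (not the message from the question) and a helper `CountCodepoints` introduced here for illustration:

```csharp
using System;
using System.Linq;
using System.Text;

static class CodepointDemo
{
    // Counts Unicode code points rather than UTF-16 chars: a surrogate pair
    // is one code point, so only its high half is counted.
    public static int CountCodepoints(string s) =>
        s.Count(c => !char.IsLowSurrogate(c));

    static void Main()
    {
        // Hypothetical sample text, not the message from the question.
        string sample = "한국어";
        string nfc = sample.Normalize(NormalizationForm.FormC);
        string nfd = sample.Normalize(NormalizationForm.FormD);

        // Each precomposed syllable is 1 code point in NFC; in NFD it
        // becomes 2 or 3 jamo (한 = 3, 국 = 3, 어 = 2).
        Console.WriteLine($"NFC: {CountCodepoints(nfc)}"); // NFC: 3
        Console.WriteLine($"NFD: {CountCodepoints(nfd)}"); // NFD: 8
    }
}
```

So NFD roughly doubles or triples the count for Hangul text, which matches the "over 300" observation, but neither number by itself explains why 160 is the limit.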
Since Twitter's limit is either 140 or 280 characters, I really don't understand how Twitter arrives at the character count for that message.
So how in the world can I figure out the actual length of a tweet written in a non-Western language?
The code I use to normalize and count, in C#:
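// Needs: using System.Linq; using System.Text;
private static int GetCodepointLength(string inp)
{
    // Normalize to NFC, as Twitter's documentation describes.
    string nfc = inp.Normalize(NormalizationForm.FormC);
    // Count code points, not text elements: StringInfo.LengthInTextElements
    // would count a base character plus its combining marks as one element.
    // A surrogate pair is a single code point, so skip the low half.
    return nfc.Count(c => !char.IsLowSurrogate(c));
}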
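One explanation consistent with these numbers is a weighted count: Twitter's open-source twitter-text library assigns each code point a weight, with most Latin-script and general punctuation code points counting as 1 and everything else (including Hangul and CJK) counting as 2, against a weighted limit of 280. The sketch below assumes the ranges and weights of twitter-text's v3 configuration as I understand them (`Ranges`, `DefaultWeight`, `Scale` are my names for them); verify them against the library's current config. It also ignores URL shortening and emoji-sequence handling, which twitter-text applies on top of this.

```csharp
using System;
using System.Text;

static class TweetLength
{
    // ASSUMPTION: values mirror twitter-text's v3 config as I recall it.
    const int Scale = 100;
    const int DefaultWeight = 200;            // i.e. counts as 2
    public const int MaxWeightedLength = 280;
    static readonly (int Start, int End, int Weight)[] Ranges =
    {
        (0x0000, 0x10FF, 100),   // Latin, Greek, Cyrillic, Hebrew, Arabic, ...
        (0x2000, 0x200D, 100),   // spaces, zero-width chars
        (0x2010, 0x201F, 100),   // dashes, quotation marks
        (0x2032, 0x2037, 100),   // primes
    };

    // Weighted length of NFC-normalized text; URLs and emoji not special-cased.
    public static int WeightedLength(string text)
    {
        string nfc = text.Normalize(NormalizationForm.FormC);
        int total = 0;
        for (int i = 0; i < nfc.Length; i++)
        {
            int cp;
            if (char.IsSurrogatePair(nfc, i))
            {
                // Combine the pair into one code point and skip its low half.
                cp = char.ConvertToUtf32(nfc[i], nfc[i + 1]);
                i++;
            }
            else
            {
                cp = nfc[i];
            }
            total += WeightOf(cp);
        }
        return total / Scale;
    }

    static int WeightOf(int codepoint)
    {
        foreach (var (start, end, weight) in Ranges)
            if (codepoint >= start && codepoint <= end) return weight;
        return DefaultWeight;
    }

    static void Main()
    {
        Console.WriteLine(WeightedLength("hello"));  // 5
        Console.WriteLine(WeightedLength("한국어"));  // 6
    }
}
```

Under this assumption, Hangul syllables (U+AC00 to U+D7A3) fall outside every range and count as 2, while spaces and ASCII punctuation count as 1. A 160-code-point Korean message sitting exactly at the limit would then be consistent with, for instance, 120 syllables plus 40 spaces or punctuation marks (120 × 2 + 40 = 280).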