I need to read UTF-8 characters from a text file and process them, for instance to calculate the frequency of occurrence of a certain character. Ordinary characters are fine. The problem occurs with characters like ü or ğ.
The following is my code, which checks whether a certain character occurs by comparing the ASCII code of the incoming character:
    FILE *fin;
    FILE *fout;
    wchar_t c;
    fin = fopen("input.txt", "r");
    fout = fopen("out.txt", "w");
    int frequency = 0;
    while ((c = fgetwc(fin)) != WEOF)
    {
        if (c == SOME_NUMBER) { frequency++; }
    }
SOME_NUMBER is what I can't figure out for those characters. In fact, those characters print out as 5 different numbers when I try to print them in decimal, whereas for the character 'a', for example, I would write if(c == 97){ frequency++; } since the ASCII code of 'a' is 97.
Is there any way that I could identify those special characters in C?
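For illustration, one locale-independent way to approach this might be to decode the UTF-8 bytes by hand and compare whole code points rather than single bytes. The following is only a minimal sketch under stated assumptions: the function names next_code_point and count_code_point are hypothetical, the input is assumed to be UTF-8, and the decoder is deliberately simplified (it does not reject overlong encodings or surrogates). Under that scheme ü would be matched as 0xFC (U+00FC) and ğ as 0x11F (U+011F).

```c
#include <stdio.h>

/* Hypothetical helper: decode the next UTF-8 sequence from fp into a
   Unicode code point. Returns -1 at end of file and 0xFFFD for a
   malformed sequence. Simplified: overlong forms are not rejected. */
long next_code_point(FILE *fp)
{
    long cp;
    int extra;
    int b = fgetc(fp);

    if (b == EOF)
        return -1;

    if (b < 0x80)                { cp = b;        extra = 0; } /* ASCII */
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; } /* 2 bytes */
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; } /* 3 bytes */
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; } /* 4 bytes */
    else
        return 0xFFFD; /* stray continuation byte or invalid lead byte */

    while (extra-- > 0) {
        b = fgetc(fp);
        if (b == EOF)
            return 0xFFFD;          /* sequence truncated by end of file */
        if ((b & 0xC0) != 0x80) {   /* not a continuation byte */
            ungetc(b, fp);          /* let the next call re-read it */
            return 0xFFFD;
        }
        cp = (cp << 6) | (b & 0x3F);
    }
    return cp;
}

/* Hypothetical helper: count how often the code point `target` occurs
   between the current file position and end of file. */
long count_code_point(FILE *fp, long target)
{
    long count = 0;
    long cp;

    while ((cp = next_code_point(fp)) != -1) {
        if (cp == target)
            count++;
    }
    return count;
}
```

With an open FILE *fin, a call such as count_code_point(fin, 0xFC) would count the ü characters, and count_code_point(fin, 97) would still count 'a' exactly as before.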
P.S. Working with ordinary char (not wchar_t) creates the same problem, but this time printing the decimal equivalent of the incoming character prints 5 different NEGATIVE numbers for those special characters. The problem stands.
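A likely explanation for the negative values is that each of these characters is stored as more than one byte in UTF-8 (ü, for example, is the two bytes 0xC3 0xBC), and bytes of 0x80 and above come out negative when plain char is a signed type. The sketch below, with the hypothetical names byte_value and dump_bytes, prints each byte both ways so the two views can be compared. It assumes 8-bit bytes and is not meant as a complete solution.

```c
#include <stdio.h>

/* Hypothetical helper: return one byte as a value in 0..255, whether or
   not plain char is signed on this platform. */
int byte_value(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: print every byte of s in decimal, first as plain
   char (possibly negative), then as unsigned char. */
void dump_bytes(const char *s)
{
    for (; *s != '\0'; s++)
        printf("%4d  %3d\n", (int)*s, byte_value(*s));
}
```

Calling dump_bytes("\xC3\xBC") prints one line per byte of ü. On a platform where plain char is a signed 8-bit type, the first column shows -61 and -68, while the second column shows 195 and 188.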