Can you help me find the mistake in this code?

Question

When I run this code I get: Process returned -1073741819 (0xFFFF FFFF C000 0005).
I need to compute the frequency of each character in a text. I think the problem is with the array. What do you think?

#include <bits/stdc++.h>

int main() {
   setlocale(LC_ALL, "Russian");
   int freq[256];           
   std::ifstream inFile;   
   char ch;

   inFile.open("abc.txt");

   for (int k = 0; k < 256; k++) {
            freq[k] = 0;
   }

   ch = inFile.get();
   while (ch != EOF) {
      ch = toupper(ch);
      freq[ch]++;
      ch = inFile.get();
   }
   // Print the output table
   std::cout << "Letter frequencies in this file are as follows." << std::endl;
   for (char ch = 'А'; ch <= 'Я'; ch++) {
       std::cout << ch << ": " << freq[ch] << std::endl;
   }
   return 0;
}

`char` is often a signed type, which means character codes above 127 are stored as negative values. `freq[ch]` exhibits undefined behavior for such a character. Try `freq[static_cast<unsigned char>(ch)]`. Alternatively, use `int` as the type of `ch`; note that `inFile.get()` actually returns an `int`, not a `char`. — Igor Tandetnik, Sep 05 '21 at 16:13
@Joe, I can't upvote you enough. So many questions here, especially from new users, could be solved by a debugger — Mawg says reinstate Monica, Sep 05 '21 at 16:16
`C000 0005` is an access violation, similar to a segmentation fault on Linux. You'll need to look for invalid memory accesses. — nanofarad, Sep 05 '21 at 16:18
Check to see if the characters you read are signed chars or unsigned chars using a debugger. — netlemon, Sep 05 '21 at 16:24
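
The signed-`char` point raised in the comments above can be looked at in isolation. This is only a minimal sketch, assuming a single-byte encoding such as Windows-1251 in which bytes above 127 carry the Cyrillic letters; the helper name `toIndex` is made up for illustration and does not come from the question.

```cpp
// Hypothetical helper: map a byte held in a `char` to a valid index in [0, 255].
// The conversion has to go through unsigned char. Where char is signed,
// a byte such as 0xC0 is stored as -64; converting that straight to int
// keeps it negative, and converting it straight to unsigned int produces
// a very large value, so both would index outside a 256-element array.
inline int toIndex(char ch) {
    return static_cast<unsigned char>(ch);
}
```

With this in place, `freq[toIndex(ch)]++` stays inside `int freq[256]` for every possible byte, whether or not `char` is signed on the platform.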

score -1 · Answer 1 · 2021-09-05T18:48:11.983

Your problem is this:

setlocale(LC_ALL, "Russian");
int freq[256];

In this case you are using UTF-8, if I am not mistaken, which would cause a segfault: char ch; holds 1 byte, while a single UTF-8 character can take more than 1 byte.

EDIT 1: Per this answer, wchar_t is 4 bytes on Linux and 2 bytes (16 bits) on Windows. I use Linux, so I had the former in mind. Rather than repeat it all here, read this.

You can use int.

EDIT 2: TIL that MS has its own encoding for Cyrillic, which uses 8 bits per character, so you might be getting negative indexes. You are therefore better off using unsigned char or int.

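char ch;   // std::ifstream::get(char&) only accepts a plain char

while (inFile.get(ch)) {
    // Convert through unsigned char so bytes above 127 map to 128..255
    unsigned char uc = static_cast<unsigned char>(ch);
    freq[toupper(uc)]++;
}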

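This way you never process EOF as if it were a character. You can use int instead, but then you have to call get() with no arguments and compare against EOF yourself:

int ch;

// get() returns the byte as a value in 0..255, or EOF at end of file
while ((ch = inFile.get()) != EOF) {
    ch = toupper(ch);
    freq[ch]++;
}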

edited Sep 05 '21 at 18:48

answered Sep 05 '21 at 16:23

2

I think it's because char can be signed meaning we would access a negative index on the freq array in this: `freq[ch]++;` and `std::cout << ch << ": " << freq[ch] << std::endl;` – drescherjm Sep 05 '21 at 16:30
3

This is not UTF8. Windows-1251 perhaps, or CP866 (I forgot which one is the default and which is enabled with this locale). – HolyBlackCat Sep 05 '21 at 16:31
1

Now I get something like: ENCODING CHANGED. The saved document contained characters which were illegal in the selected encoding. The file's encoding has been changed to UTF-8 to prevent you from losing data. – Sep 05 '21 at 16:42
If I check text in English it works. But can I do it with Cyrillic? – Sep 05 '21 at 16:54
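
If the file really has been saved as UTF-8, as the editor message quoted above suggests, each Cyrillic letter occupies two bytes, so a 256-entry byte table no longer corresponds to letters. One possible approach, shown here only as a rough sketch, is to decode code points and count them in a map. The function name `countUtf8` is hypothetical, and the decoder deliberately handles only 1-byte and 2-byte sequences, which is enough for ASCII and the Cyrillic block but is not a full UTF-8 decoder or validator.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Count code points in UTF-8 text. Only 1-byte (ASCII) and 2-byte sequences
// are decoded; bytes belonging to longer or malformed sequences are skipped.
std::map<char32_t, int> countUtf8(const std::string& text) {
    std::map<char32_t, int> freq;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char b0 = static_cast<unsigned char>(text[i]);
        if (b0 < 0x80) {
            ++freq[b0];                                   // plain ASCII byte
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < text.size()) {
            unsigned char b1 = static_cast<unsigned char>(text[i + 1]);
            if ((b1 & 0xC0) == 0x80) {                    // valid continuation byte
                char32_t cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
                ++freq[cp];
                ++i;                                      // consume the second byte
            }
        }
    }
    return freq;
}
```

The capital letters А to Я are the code points U+0410 to U+042F, so the table can be printed by looping over that range and looking each value up in the returned map. Printing the letters themselves to a Windows console is a separate problem that this sketch does not address.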
