1

When I run this code I get Process returned -1073741819 (0xFFFF FFFF C000 0005).
I need to compute the frequency of each character in a text. I think the problem is with the array. What do you think?

#include <bits/stdc++.h>

int main() {
   setlocale(LC_ALL, "Russian");
   int freq[256];           
   std::ifstream inFile;   
   char ch;

   inFile.open("abc.txt");

   for (int k = 0; k < 256; k++) {
            freq[k] = 0;
   }

   ch = inFile.get();
   while (ch != EOF) {
      ch = toupper(ch);
      freq[ch]++;
      ch = inFile.get();
   }
   // Print the output table
   std::cout << "Letter frequencies in this file are as follows." << std::endl;
   for (char ch = 'А'; ch <= 'Я'; ch++) {
       std::cout << ch << ": " << freq[ch] << std::endl;
   }
   return 0;
}
Mawg says reinstate Monica
  • 5
    Are you using a compiler with a debugger? –  Sep 05 '21 at 16:11
  • 7
    `char` is often a signed type, which means character codes above 127 become negative values. `freq[ch]` exhibits undefined behavior for such a character. Try `freq[static_cast<unsigned char>(ch)]`. Alternatively, use `int` as the type of `ch`; note that `inFile.get()` actually returns an `int`, not a `char`. – Igor Tandetnik Sep 05 '21 at 16:13
  • 2
    @Joe, I can't upvote you enough. So many questions here, especially from new users, could be solved by a debugger. – Mawg says reinstate Monica Sep 05 '21 at 16:16
  • 2
    `C000 0005` is an access violation, similar to a segmentation fault on Linux. You'll need to look for invalid memory accesses. – nanofarad Sep 05 '21 at 16:18
  • 1
    Check to see if the characters you read are signed chars or unsigned chars using a debugger. – netlemon Sep 05 '21 at 16:24
  • 1
    `int freq[256];` is also uninitialized. – drescherjm Sep 05 '21 at 16:34

1 Answer

-1

Your problem is this:

setlocale(LC_ALL, "Russian");
int freq[256];   

If I am not mistaken, you are using UTF-8 here, which could cause the segfault: char ch; holds 1 byte, whereas a UTF-8 character can take more than 1 byte (Cyrillic letters take 2).

EDIT 1: Per this answer, wchar_t is 4 bytes on Linux and 2 bytes (16 bits) on Windows. I use Linux, so I had the former in mind. Rather than repeat it all here, read this.

You can use int.

EDIT 2: TIL that MS has its own encoding for Cyrillic, which uses 8 bits per character, so with a signed char you might be getting negative indexes. You are therefore better off using unsigned char or int for the index.

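char c;

while (inFile.get(c)) {   // get(char&) fails at end of file, which ends the loop
    // Convert through unsigned char so the index is always in 0..255.
    unsigned char ch = static_cast<unsigned char>(c);
    freq[toupper(ch)]++;
}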

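This way you avoid processing EOF as if it were a character. You can also use int, in which case the value returned by get() is compared against EOF:

int ch;

while ((ch = inFile.get()) != EOF) {   // get() returns 0..255, or EOF at end of file
    ch = toupper(ch);
    freq[ch]++;
}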
  • 2
    I think it's because char can be signed meaning we would access a negative index on the freq array in this: `freq[ch]++;` and `std::cout << ch << ": " << freq[ch] << std::endl;` – drescherjm Sep 05 '21 at 16:30
  • 3
    This is not UTF8. Windows-1251 perhaps, or CP866 (I forgot which one is the default and which is enabled with this locale). – HolyBlackCat Sep 05 '21 at 16:31
  • 1
    Now I get something like: ENCODING CHANGED. The saved document contained characters which were illegal in the selected encoding. The file's encoding has been changed to UTF-8 to prevent you from losing data. –  Sep 05 '21 at 16:42
  • If I check text in English it works. But can I do it with Cyrillic? –  Sep 05 '21 at 16:54