2

I'm currently in the middle of writing a simple program in C that takes a command line argument from the user that is a substitution cipher key. Currently, my program is checking for all the possible incorrect arguments the user might enter.

Running this through the debugger results in my program running perfectly every single time, no matter what the input is. However, running my program without the debugger, while also making the argument have at least one lower-case letter (eg. ./substitution VCHPRZGJNTLSKFBDQWAXEuymoi), results in the 'sum' variable having an incorrect value about 50% of the time.

I'll quickly explain what the 'sum' variable is for just so it's easier to understand. 'sum' starts out at 2015, because that's the value of adding up all 26 upper case letters according to the ASCII table. The for loop in my function then subtracts from 2015 each character from the cipher key. Finally, the last if statement checks for whether sum = 0, meaning that only unique characters were entered.

When running this without a debugger, 'sum' is sometimes 0 and sometimes -32 * the number of lower-case letters. For example, VCHPRZGJNTLSKFBDQWAXEuymoi has 4 lower-case letters, and 'sum' is sometimes 0 and sometimes -128.

I've got no idea where this problem is coming from, and I would greatly appreciate any help.

Here is my code:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

string sub(string plaintext,string cipher);

int main(int argc, string argv[])
{
    string cipher = argv[1];
    // checks for whether an argument was provided
    if (argc != 2)
    {
        printf("Usage: ./substitution KEY\n");
        return 1;
    }
    // checks for whether the cipher is 26 characters long
    else if (strlen(cipher) != 26)
    {
        printf("Key must contain 26 characters\n");
        return 1;
    }
    
    // makes sure the cipher contains only non-alphabetical characters
    // also subtracts from 2015 (sum of all unique, upper-case letters) with each iteration to test for whether every character in the argument is unique
    int sum = 2015;
    for (int i = 0; i < strlen(cipher); i++)
    {
        if ((int) cipher[i] > 96 && (int) cipher < 123)
        {
            cipher[i] = (char) ((int) cipher[i] - 32);
        }
        
        if ((int) cipher[i] < 65 || ((int) cipher[i] > 90 && (int) cipher[i] < 97) || (int) cipher[i] > 122)
        {
            printf("Key must only contain alphabetical characters.\n");
            return 1;
            break;
        }
        sum -= ((int) cipher[i]);
    }
    
    // DEBUG: prints 'sum'
    printf("%i\n", sum);
    
    // THIS IS THE PROBLEM STATEMENT
    // determines whether every character in the argument is unique
    if (sum != 0)
    {   
        printf("Key must contain all unique characters.\n");
        return 1;
    }
    
    // gets the plaintext from the user and prints out the cipher text using the 'sub' function
    string plaintext = get_string("plaintext: ");
    printf("ciphertext: %s\n",sub(plaintext,cipher));
}
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
bmorozov
  • 41
  • 4
  • 2
    "Key must only contain alphabetical characters."? Use `isalpha()` and make your code a lot simpler. – Andrew Henle Jan 22 '21 at 21:22
  • 4
    Re “Finally, the last if statement checks for whether sum = 0, meaning that only unique characters were entered”: That inference is incorrect. `AAAAAAAAAAAAAZZZZZZZZZZZZZ` has the same sum as `ABCDEFGHIJKLMNOPQRSTUVWXYZ`, so comparing their sums does not distinguish them. Rule of thumb: A statistic with one degree of freedom, such as a sum, cannot distinguish sets with 26 degrees of freedom, such as a string of 26 letters. – Eric Postpischil Jan 22 '21 at 21:22
  • 2
    Do not use constants like `96`, `123`, `65`, `90`, or `32` in source code. Use `'a'`, `'z'`, `'A'`, `'Z'`, and `' '` (and change `<` to `<=` and such as necessary). – Eric Postpischil Jan 22 '21 at 21:25
  • 2
    How can you know what `sum` is when the program is run without a debugger? You say sometimes it is −32 times the number of lowercase letters, but, when `sum` is not zero, the code in the question prints “Key must contain all unique characters.” and returns (which terminates the program). It never reaches the `printf` that prints `sum`. And, since this happens when it is not running in the debugger, you cannot see the value in the debugger. So how did you see a value that was not zero? This suggests you were not running the code shown in the question. Show the actual code, **exactly**. – Eric Postpischil Jan 22 '21 at 21:34
  • Hey! Thank you for the responses. I'm entirely new to C (and programming in general), so that should explain some quirks in my code. I did accidentally move the print statement that prints out sum. It should be located before the last if statement (fixed). Also, thank you for pointing out the error in my logic with regards to using a sum for this program at all. However, I am still interested in why this bug would occur in the first place. – bmorozov Jan 22 '21 at 22:09
  • 3
    `if (... && (int) cipher < 123` should read `if (... && (int) cipher**[i]** < 123` – Olaf Dietsche Jan 22 '21 at 22:38
  • Excuse me, is `string argv[]` actually valid? – Mike Nakis Jan 22 '21 at 23:02
  • @OlafDietsche Thank you! I'm still not sure why this worked, but I'll look into it on my own time. – bmorozov Jan 22 '21 at 23:09
  • @MikeNakis Yes, that works for me. Is there a reason why it shouldn't? – bmorozov Jan 22 '21 at 23:09
  • Oh, "works for me" means nothing. I am just asking those who know more than me. – Mike Nakis Jan 22 '21 at 23:12
  • 1
    @MikeNakis Since this is tagged [cs50](https://stackoverflow.com/tags/cs50/info), I'd guess this is a `typedef` in cs50.h. – Olaf Dietsche Jan 23 '21 at 10:23
  • @OlafDietsche the C file includes `string.h`, so I would assume it is the `string` typedef found in `string.h`. I find it hard to believe that `string` is simply typedefed as `char*`, because that would add virtually no value. Then again, further down `strlen()` is used on such a string, and if I remember correctly C does not allow function overloading, so the `strlen()` in question must be the standard one, therefore `string` must in fact be a typedef of `char*`. – Mike Nakis Jan 23 '21 at 12:42
  • So, `string argv[]` seems to be valid. – Mike Nakis Jan 23 '21 at 12:43
  • @MikeNakis: There is no `string` typedef in the standard ``. The code in the question contains `#include `. There is a copy of `cs50.h` [here](https://github.com/cs50/libcs50/blob/develop/src/cs50.h), inside the libcs50 project of [this GitHub page](https://github.com/cs50). It contains `typedef char *string;`. – Eric Postpischil Jan 23 '21 at 13:04

2 Answers2

5

Your OS is probably using ASLR for preventing exploitation of memory corruption vulnerabilities. This might be the reason why the value of the sum variable has an incorrect value about 50% of the time. As spotted in this comment a pointer is compared to an integer in if ((int) cipher[i] > 96 && (int) cipher < 123), which might be the problem. However, ASLR seems to get disabled by your debugger and the addresses get not randomized. Therefore, the address pointed by the variable cipher is not randomized and you always get the same result.

If you are using gdb you can enable ASLR with the command set disable-randomization off.

Edit As suggested by Eric Postpischil in his comments [1, 2] you should use 'a' instead of constants like 97 and the (int) casts should be entirely removed. The (int) casts are unnecessary, since the char elements of the array will be promoted to int.

You should also see the compiler warnings.

For

if ((int) cipher[i] > 96 && (int) cipher < 123)

I get the following warning:

warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
if ((int) cipher[i] > 96 && (int) cipher < 123) {

On gcc -Wpointer-to-int-cast should be enabled by default. Otherwise, you can also also try the -Wall option.

Antonio
  • 181
  • 4
  • 1
    OP should probably also be advised to remove the `(int)` casts entirely. They are unnecessary, since the `char` array elements will be promoted to `int`, and they hid this error. – Eric Postpischil Jan 22 '21 at 23:29
2

Unrelated, but for this kind of comparison, there's ctype.h

This changes the loop to

for (int i = 0; i < strlen(cipher); i++)
{
    if (islower(cipher[i]))
    {
        cipher[i] = toupper(cipher[i]);
    }

    if (!isalpha(cipher[i]))
    {
        printf("Key must only contain alphabetical characters.\n");
        return 1;
        /* break; */
    }

    sum -= (int) cipher[i];
}

This would reduce code length, make the code more readable, avoid constants, and avoid some of the pitfalls. Another point, the break isn't needed here, because you already return from the function. So the break is never reached.

Olaf Dietsche
  • 72,253
  • 8
  • 102
  • 198