Why do I get negative values for my ciphertext?

Question

This is a problem set from CS50x. It's a Caesar cipher. It needs to shift all alphabetic letters (uppercase and lowercase) by the key which is input during run-time. It should preserve case, symbols, and numbers.

I thought the code was correct, but I kept getting negative ASCII values for larger key values. On paper/mathematically it shouldn't give negative values. There's something that I am not understanding.

// Caesar cipher//
#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

//cipher key//
int K[];

int main(int argc, string argv[])
{

//checking for only one command-line argument//
    if(argc == 2)
    {

        //checking if key is digit//
          for(int i = 0, n=strlen(argv[1]); i < n; i++)
          {
              K[i] = (argv[1][i]-'0');
              if((K[i] < 0) || (K[i] > 9 ))
            {
                printf("Usage: ./caesar key\n");
                return 1;
                break;
            }
          }


          string p = get_string("plaintext: ");
          printf("ciphertext: ");
          char c[strlen(p)];
          for(int i = 0, n = strlen(p); i < n; i++)
          {

              if(((p[i] > 64) && (p[i] < 91)) || ((p[i] > 96) && (p[i] < 123)))
              {

                 int k = atoi(argv[1]);

                 c[i] = (p[i]+(k % 26));



                 if (c[i] > 122)
                 {
                     c[i] =  (c[i] % 122) + 96;
                 }

                 else if ((c[i] > 90) && (c[i] < 97))
                 {
                     c[i] = (c[i] % 90) + 64;
                 }


                 else
                 {
                     c[i] =  c[i];
                 }
              }
              else
              {
               c[i] = p[i];
              }
             printf("%c",c[i]);
          }

          printf("\n");

    }

    else
    {
      printf("Usage: ./caesar key\n");
      return 1;
    }


}

For example a key of 100 and a plaintext of "z" should give me a value of 118 (v), but I get -112 (which doesn't exist in ASCII).

This declaration int K[]; declares an incomplete type. So the program has undefined behavior. — Vlad from Moscow, Aug 09 '19 at 13:46
`c` is a signed char, so can have negative values...but looks more like the problem is your maths — Chris Turner, Aug 09 '19 at 13:48
@Ajanth Kumarakuruparan And do not use magic numbers as 64 or 91. — Vlad from Moscow, Aug 09 '19 at 13:51
@VladfromMoscow, let's say I edit the code and gave the array K[] a size of K[100]. I still get the the same negative values as before for that input. So I guess that's not the cause of the problem. — Ajanth Kumarakuruparan, Aug 09 '19 at 13:58
@ChrisTurner the math looks correct on paper to me. Can you identify the problem in the math ? — Ajanth Kumarakuruparan, Aug 09 '19 at 13:59
@VladfromMoscow: It constitutes a tentative declaration which will be assumed to have one element. Per C 2018 6.9.2 2 “If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.” — Eric Postpischil, Aug 09 '19 at 14:04
(a) How do you know you get −112? The code shown in the question does not print the numeric value of the character. (b) Is the code shown in the question **exactly** the same code that gives −112? (c) With a key of 1 and a plaintext of “z”, why do you expect “v”? One character beyond “z”, with wrapping, would be “a”. — Eric Postpischil, Aug 09 '19 at 14:29
@AjanthKumarakuruparan, the math is rubbish. The `% 122` and `% 90` operations are completely wrong for addressing the issue that I suppose you must intend them for. Moreover, the code does not effectively protect against integer overflow, which is presumably the source of your observed negative values. There appear to be other issues, too. — John Bollinger, Aug 09 '19 at 14:29
@EricPostpischil (a) It is the exact same code i just changed %c to %d to debug the problem. And then I get -112. (c) Yes I am very sorry I made a mistake I meant a key of 100. — Ajanth Kumarakuruparan, Aug 09 '19 at 15:03
@JohnBollinger: The `% 122` and `% 90` operations are inefficient and unaesthetic, but they do not function incorrectly for their intended purpose. Given a value of `c[i]` that had been made into, say, 92, `c[i] = (c[i] % 90) + 64;` would set it to 66, as desired. — Eric Postpischil, Aug 09 '19 at 15:19

John Bollinger · Answer 1 · 2019-08-09T15:09:00.600

The math is all wrong, and your negative values are arising from assigning out-of-range values to array elements of type char.

Supposing that you want to rotate a capital Latin letter stored in p[i] by a user-specified key value k in the range 0 ... 25, and you are willing to assume that the capital Latin letters are encoded as consecutive, lexical-order values (as they are in ASCII, but not in some other encodings), the computation would have this form:

c[i] = (((p[i] - 'A') + k) % 26) + 'A';

Initially subtracting the value of 'A' produces a result in the range 0 ... 25 (subject to the assumptions given above). A rotation within that range is effected by adding the key value and reducing the sum modulo 26. Finally, adding 'A' brings the rotated result back into range of capital Latin letters. This approach or something equivalent to it is what you need for a sum-and-modulus style rotation, and the math in the program in the question is not equivalent.

Note well that, again subject to the specified assumptions, no intermediate result can even be outside the range of type char (which isn't directly relevant in this case), much less is any such value actually assigned to a char variable (which is relevant and important).

The computation for lowercase letters is analogous (under analogous assumptions), but not identical.

Although the math used is cumbersome and unnecessary, it is not wrong. Simply inserting `unsigned` in one place will make the program perform as intended. — Eric Postpischil, Aug 09 '19 at 15:22
@EricPostpischil Yes, I agree the math is very cumbersome and could be better but I couldn't find it to be logically wrong. — Ajanth Kumarakuruparan, Aug 10 '19 at 07:39
Yes, @AjanthKumarakuruparan, I have declined to write a complete solution for you, since it's *your* homework. As I said, the computation for lowercase is analogous to what I've already presented. And also of course, keys larger than 25 can be used by first reducing them modulo 26 -- or even directly, provided that they are not too near the `INT_MAX`. — John Bollinger, Aug 12 '19 at 10:42

score 1 · Accepted Answer · answered Aug 09 '19 at 15:17

The error is in the fact that char is signed in your C implementation and in this line:

c[i] = (p[i]+(k % 26));

When p[i] + k % 26 (no parentheses are needed) exceeds 127, the result does not fit in a char in your implementation, and it is converted to char in an implementation-defined way, likely by wrapping modulo 256 (equivalently, using the low eight bits). Thus, with a key of 100 and the character 'z' with value 122, the result is 122 + 100 % 26 = 122 + 22 + 144, which gets stored in c[i] as −112.

This can be easily fixed by changing:

char c[strlen(p)];

to:

unsigned char c[strlen(p)];

Thank you very much, that was exactly the answer I was expecting because even thou the math was very cumbersome I couldn't find it to be incorrect. — Ajanth Kumarakuruparan, Aug 10 '19 at 07:35

Why do I get negative values for my ciphertext?

2 Answers2