0

I'm dusting off of my C skills for an upcoming class and I came across this weird output with printf after building a string using getchar. Specifically, any string I try to output gets the same sequence of characters appended to each letter. foo becomes "f?8@{?o?8@{?o?8@{?" compiling with cc, and f¿:¿o¿:¿0¿:¿ with Apple LLVM 5.0 (Xcode). Here is the sample code that illustrates the issue:

char * input_buffer = malloc( sizeof( char ) );

char c;
while ( ( c = getchar() ) != '\n' ) {
    strcat(input_buffer, &c);
}

// problem output
printf( "\n%s\n", input_buffer );
// foo -> f¿:¿o¿:¿0¿:¿

// weird side effect is the 4 is required to get a proper len
printf("\ncharacters: %lu\n", strlen( input_buffer ) / 4 );

I've searched everywhere but I'm not seeing this anywhere else, but then this seems like a bit of an edge case. Is this is some kind of an encoding issue that I am not taking into account?

Eli Bendersky
  • 263,248
  • 89
  • 350
  • 412
Philip Regan
  • 5,005
  • 2
  • 25
  • 39
  • 2
    you are allocating single character by malloc – Dipto Jan 18 '14 at 20:13
  • Minor: Use `printf("%zu", strlen( input_buffer ) / 4 );`. The type returned by `strlen()` is `size_t` and a binary operation (`/`) of `size_t` and `int` will result in a `size_t`. `'z'` is the matching format specification modifier for the unsigned integer type `size_t`. – chux - Reinstate Monica Jan 18 '14 at 20:26
  • Why the downvote? Downvoting without explanation isn't terribly helpful. – Philip Regan Jan 18 '14 at 22:03

3 Answers3

3

You cannot call strcat(input_buffer, &c);.

Each of the arguments passed to strcat must be a valid null-terminated string of characters.

The chances of the next byte after &c being 0 are pretty slim.

The chances of the first byte pointed by input_buffer being 0 aren't very high either.

In other words, strcat reads "junk" until it encounters a 0 character, in both arguments.

Change:

while ( ( c = getchar() ) != '\n' ) {
    strcat(input_buffer, &c);
}

To:

for (int i=0; 1; i++)
{
    c = getchar();
    if (c == '\r' || c == '\n')
    {
        input_buffer[i] = 0;
        break;
    }
    input_buffer[i] = c;
}
barak manos
  • 29,648
  • 10
  • 62
  • 114
  • 3
    You don't need to check for `'\r'`. `stdin` is a text stream, so a line ending (whether it's a UNIX-style LF or a Windows-style CR-LF sequence) is translated to a single `'\n'` newline character. Also, you can omit the `true` in the `for` loop: `for (int i = 0; ; i++)`. (`true` isn't defined anyway unless you've defined it yourself, or you have `#include `, or you're using a C++ compiler to compile C code.) – Keith Thompson Jan 18 '14 at 20:23
  • Are you sure about the '\r'? Had that problem recently (although I was using `getch` from `conio.h`). – barak manos Jan 18 '14 at 20:25
  • @barakmanos: Yes, I'm sure. I don't know how `getch` from `` behaves; I'm talking about standard C I/O functions. – Keith Thompson Jan 18 '14 at 20:30
  • @chux: Hi. What's wrong with it being inside the loop, just before the `break`? – barak manos Jan 18 '14 at 20:31
  • For `strcat` would it make sense to make `c` into a null-terminated string (which seems wasteful to me)? If not, then what to use to append `c` to `input buffer`? – Philip Regan Jan 18 '14 at 22:03
  • You won't be able to "getchar into a null-terminated string". Please see the solution suggested in my answer above. – barak manos Jan 18 '14 at 22:06
2
  • You are allocating space to input_buffer for only one char.
  • strcat(input_buffer, &c); is wrong. You are concatenating character (it is not null terminated) with a string.
  • getchar returns int type but you declared c is of type char.
haccks
  • 104,019
  • 25
  • 176
  • 264
  • 2
    `strcat` does take the address of a `char` as its second argument. The problem is that the `char` needs to be the first character of a string. – Keith Thompson Jan 18 '14 at 20:24
  • According to documentation `getchar()` returns `unsigned char` casted to `int` or `EOF`. So there is nothing really wrong with that except that `EOF` may get out of `char` range. – ony Jan 18 '14 at 20:39
  • 1
    @ony: And that's the problem. By assigning the result of `getchar()` to a `char` object, you lose the ability to detect an end-of-file or error condition. – Keith Thompson Jan 18 '14 at 20:43
  • For `strcat` would it make sense to make `c` into a null-terminated string (which seems wasteful to me)? If not, then what to use to append `c` to `input buffer`? – Philip Regan Jan 18 '14 at 22:02
1
char * input_buffer = malloc( sizeof( char ) );

sizeof (char) is 1 by definition. This allocates space for a single character, and makes input_buffer point to it.

You're also not checking whether the allocation succeeded. malloc returns a null pointer on failure; you should always check for that.

And the allocated char object that input_buffer points to contains garbage.

char c;
while ( ( c = getchar() ) != '\n' ) {
    strcat(input_buffer, &c);
}

getchar() returns an int, not a char. You can assign the result to a char object, but by doing so you lose the ability to detect and end-of-file or error condition. getchar() returns EOF when there are no more characters to be read; you should always check for that, and doing so requires storing the result in an int. (EOF is an integer value that's unequal to any valid character.)

    strcat(input_buffer, &c);

input_buffer points to a single uninitialized char. You can treat it as an array consisting of a single char element. The first argument to strcat must already contain a valid null-terminated string, and it must have enough space to hold that string plus whatever you're appending to it.

c is a single char object, containing whatever character you just read with getchar(). The second argument tostrcatis achar*, so you've got the right type -- but thatchar*` must point to a valid null-terminated string.

strcat will first scan the array pointed to by input_buffer to find the terminating '\0' character so it knows where to start appending -- and it will probably scan into memory that's not part of any object you've declared or allocated, possibly crashing your program. If that doesn't blow up, it will then copy characters starting at c, and going past it into memory that you don't own. You have multiple forms of undefined behavior.

You don't need to use strcat to append a single character to a string; you can just assign it.

Here's a simple example:

char input_buffer[100];
int i = 0; /* index into input_buffer */
int c;
while ((c = getchar()) != '\n' && c != EOF) {
    input_buffer[i] = c;
    i ++;
}
input_buffer[i] = '\0'; /* ensure that it's properly null-terminated */

I allocated a fixed-size buffer rather than using malloc, just for simplicity.

Also for simplicity, I've omitted any check that the input doesn't go past the end of the input buffer. If it does, the program may crash if you're lucky; if you're not lucky, it may just appear to work while clobbering memory that doesn't belong to you. It will work ok if the input line isn't too long. In any real-world program, you'll want to check for this.

BTW, what's being done here is more easily done using fgets() -- but it's good to learn how things work on a slightly lower level.

Keith Thompson
  • 254,901
  • 44
  • 429
  • 631
  • All of the examples I found used `char arr[]` for the string, but that requires my knowing how long the longest string would be, and for this exercise I don't. So, why not just use `char *` and what would be used to append to the `char *`? I know at a certain level they are the same thing, but I would think not having to manage the bounds of the array would be better. – Philip Regan Jan 18 '14 at 22:01
  • @PhilipRegan: With a `char*`, you still have to know the maximum length; the difference is that you can determine it at run time. You can use `realloc` to expand the array (actually to replace the array by a larger one) as needed. You'd append single characters the same way. (BTW, arrays and pointers are not the same thing on any level; see section 6 of the [comp.lang.c FAQ](http://www-c.faq.com).) – Keith Thompson Jan 18 '14 at 22:04
  • "...arrays and pointers are not the same thing on any level..." Sorry, I'm not explaining myself very well here. Thanks for the answer. Very helpful. – Philip Regan Jan 18 '14 at 22:45