You are struggling with two misunderstandings. First, scanf() does not allocate or resize the storage you pass to it in any way (leaving aside, for purposes of discussion, the non-standard "%a", later renamed "%m", specifiers). Second, you are forgetting to provide length + 1 characters of storage to ensure room for the nul-terminating character.
In your statement "For instance if they write "James" I'd need to malloc((sizeof(char)*5)": no, you would need malloc (6) to provide room for James\0. Note also that sizeof (char) is defined as 1 and should be omitted.
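As a minimal sketch of the length + 1 rule, allocating storage for a copy of a string might look like the following. The name copy_name() is a hypothetical helper used only for illustration, it is not part of your code:

```c
#include <stdlib.h>
#include <string.h>

/* copy_name() is a hypothetical helper for illustration only.
 * It allocates strlen (src) + 1 bytes: one per character plus
 * one for the nul-terminating character.
 */
char *copy_name (const char *src)
{
    size_t len = strlen (src);          /* 5 for "James" */
    char *dst = malloc (len + 1);       /* 6 bytes, no sizeof (char) needed */

    if (!dst)                           /* validate every allocation */
        return NULL;

    memcpy (dst, src, len + 1);         /* copy the characters and the '\0' */

    return dst;                         /* caller is responsible for free() */
}
```

Calling copy_name ("James") requests 6 bytes, and the caller calls free() on the returned pointer when done with it.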
As to how to read a string, you generally want to avoid scanf(), and even when using scanf(), unless you are reading whitespace-separated words, you don't want to use the "%s" conversion specifier, which stops reading as soon as it encounters whitespace, making it impossible to read "James Bond". Further, you have the issue of what is left unread in stdin after your call to scanf().
When reading using "%s", the '\n' character is left in stdin unread. This is a pitfall that will bite you on your next attempted read if you use an input function that does not ignore leading whitespace (that is, any character-oriented or line-oriented input function). These pitfalls, along with a host of others associated with scanf() use, are why new C programmers are encouraged to use fgets() to read user input.
With a sufficiently sized buffer (and if not, with a simple loop), fgets() will consume an entire line of input each time it is called, ensuring there is nothing left unread in that line. The only caveat is that fgets() reads and includes the trailing '\n' in the buffer it fills. You simply trim the trailing newline with a call to strcspn() (which can also provide you with the length of the string at the same time).
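A minimal sketch of that read-and-trim step is shown here. The helper name read_line() and its parameters are assumptions made for illustration, and the stream is passed in rather than hard-coded to stdin:

```c
#include <stdio.h>
#include <string.h>

/* read_line() is a hypothetical helper for illustration only.
 * Reads one line from fp into buf (of size bytes), trims the
 * trailing '\n' if present, and stores the resulting length in *len.
 * Returns 1 on success, 0 on EOF or read error.
 */
int read_line (char *buf, size_t size, FILE *fp, size_t *len)
{
    if (!fgets (buf, (int)size, fp))    /* validate every read */
        return 0;

    *len = strcspn (buf, "\n");         /* number of characters before '\n' */
    buf[*len] = 0;                      /* overwrite '\n' with '\0' */

    return 1;
}
```

If the line did not fit and no '\n' is in the buffer, strcspn() returns the full string length, so the assignment simply rewrites the existing nul-terminating character.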
As mentioned above, one approach to solving the "I don't know how many characters I have?" problem is to use a fixed-size buffer (character array) and then repeatedly call fgets() until the '\n' is found in the array. That way you can allocate final storage for the line based on the number of characters read into the fixed-size buffer. It doesn't matter if your fixed-size buffer is 10 and you have 100 characters to read, you simply call fgets() in a loop until the number of characters you read is less than a full fixed-size buffer's worth.
Now ideally, you would size your temporary fixed-size buffer so that your input fits the first time, eliminating the need to loop and reallocate, but if the cat steps on the keyboard, you are covered.
Let's look at an example, similar in function to the CS50 get_string() function. It allows the caller to provide the prompt for the user, then reads and allocates storage for the result, returning a pointer to the allocated block containing the string, which the caller is then responsible for calling free() on when done with it.
#define MAXC 1024       /* if you need a constant, #define one (or more) */

char *getstr (const char *prompt)
{
    char tmp[MAXC], *s = NULL;                  /* fixed size buf, ptr to allocate */
    size_t n = 0, used = 0;                     /* length and total length */

    if (prompt)                                 /* prompt if not NULL */
        fputs (prompt, stdout);

    while (1) {                                 /* loop continually */
        if (!fgets (tmp, sizeof tmp, stdin))    /* read into tmp */
            return s;
        tmp[(n = strcspn (tmp, "\n"))] = 0;     /* trim \n, save length */
        if (!n)                                 /* if empty-str, break */
            break;
        void *tmpptr = realloc (s, used + n + 1);   /* always realloc to temp pointer */
        if (!tmpptr) {                          /* validate every allocation */
            perror ("realloc-getstr()");
            return s;
        }
        s = tmpptr;                             /* assign new block to s */
        memcpy (s + used, tmp, n + 1);          /* copy tmp to s with \0 */
        used += n;                              /* update total length */
        if (n + 1 < sizeof tmp)                 /* if tmp not full, break */
            break;
    }

    return s;       /* return allocated string, caller responsible for calling free */
}
Above, a fixed-size buffer of MAXC characters is used to read input from the user. A continual loop calls fgets() to read the input into the buffer tmp. The return of strcspn() is used as the index into tmp: it gives the number of characters before the '\n' character (the length of the input without the '\n'), and the string is nul-terminated at that length, overwriting the '\n' character with the nul-terminating character '\0' (which is just plain old ASCII 0). The length is saved in n. If the line is empty after the removal of the '\n', there is nothing more to do and the function returns whatever is in s at that time.
If characters are present, a temporary pointer is used to realloc() storage for the new characters (+1). After validating that realloc() succeeded, the new characters are copied to the end of the storage and the total length of characters in the buffer is saved in used, which is used as an offset from the beginning of the string. That repeats until you run out of characters to read, and the allocated block containing the string is returned (if no characters were input, NULL is returned).
(note: you may also want to pass a pointer to size_t as a parameter that can be updated to the final length before return to avoid having to calculate the length of the returned string again -- that is left to you)
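One way the length parameter in the note could be sketched is shown here. The name getstr_len(), the FILE* parameter fp, and the len out-parameter are all assumptions added for illustration, and this is only one possible arrangement:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC 1024

/* getstr_len() is a hypothetical variant of getstr() for illustration.
 * Assumptions not in the original: the stream is passed as fp, and the
 * final length is written through len (if len is not NULL).
 */
char *getstr_len (const char *prompt, FILE *fp, size_t *len)
{
    char tmp[MAXC], *s = NULL;                  /* fixed size buf, ptr to allocate */
    size_t n = 0, used = 0;                     /* length and total length */

    if (prompt)                                 /* prompt if not NULL */
        fputs (prompt, stdout);

    while (fgets (tmp, sizeof tmp, fp)) {       /* read into tmp */
        tmp[(n = strcspn (tmp, "\n"))] = 0;     /* trim \n, save length */
        if (!n)                                 /* if empty-str, break */
            break;
        void *tmpptr = realloc (s, used + n + 1);
        if (!tmpptr) {                          /* validate every allocation */
            perror ("realloc-getstr_len()");
            break;                              /* keep what was stored so far */
        }
        s = tmpptr;                             /* assign new block to s */
        memcpy (s + used, tmp, n + 1);          /* copy tmp to s with \0 */
        used += n;                              /* update total length */
        if (n + 1 < sizeof tmp)                 /* if tmp not full, break */
            break;
    }

    if (len)                                    /* report final length to caller */
        *len = used;

    return s;       /* caller is responsible for calling free */
}
```

With this arrangement the caller gets the length for free, e.g. size_t len; char *s = getstr_len ("enter str: ", stdin, &len);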
Before looking at an example, let's add debug output to the function so it tells us how many characters were allocated in total. Just add the printf() below before the return, e.g.
    }
    printf (" allocated: %zu\n", used?used+1:used);     /* (debug output of alloc size) */

    return s;       /* return allocated string, caller responsible for calling free */
}
A short example that loops reading input until Enter is pressed on an empty line causing the program to exit after freeing all memory:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* insert getstr() function HERE */

int main (void) {

    for (;;) {
        char *s = getstr ("enter str: ");
        if (!s)
            break;
        puts (s);
        putchar ('\n');
        free (s);
    }
}
Example Use/Output
With MAXC at 1024, there isn't a chance of needing to loop unless the cat steps on the keyboard, so all input is read into tmp and then storage is allocated to exactly hold each input:
$ ./bin/fgetsstr
enter str: a
allocated: 2
a
enter str: ab
allocated: 3
ab
enter str: abc
allocated: 4
abc
enter str: 123456789
allocated: 10
123456789
enter str:
allocated: 0
Setting MAXC at 2 or 10 is fine as well. The only thing that changes is the number of times you loop reallocating storage and copying the contents of the temporary buffer to your final storage. E.g. with MAXC at 10, the user wouldn't know the difference in:
$ ./bin/fgetsstr
enter str: 12345678
allocated: 9
12345678
enter str: 123456789
allocated: 10
123456789
enter str: 1234567890
allocated: 11
1234567890
enter str: 12345678901234567890
allocated: 21
12345678901234567890
enter str:
allocated: 0
Above you have forced the while (1) loop to execute at least twice for each string of 9 characters or more (a 10-character buffer holds 9 characters plus the nul-terminating character). So you want to set MAXC to some reasonable size to avoid looping, and a 1K buffer is fine considering you will have at minimum a 1M function stack on most x86 or x86_64 computers. You may want to reduce the size if you are programming for a micro-controller with limited storage.
While you could allocate for tmp as well, there really is no need, and using a fixed-size buffer is about as simple as it gets for sticking with standard C. If you have POSIX available, then getline() already provides auto-allocation for any size input you have. That is another good alternative to fgets(), but POSIX is not standard C (though it is widely available).
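For comparison, a rough sketch of the getline() approach follows. It assumes a POSIX.1-2008 system, and the name getstr_posix() and the FILE* parameter are hypothetical choices made for illustration:

```c
#define _POSIX_C_SOURCE 200809L     /* expose getline() in <stdio.h> */

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>              /* ssize_t */

/* getstr_posix() is a hypothetical name for illustration only.
 * Assumes POSIX getline() is available; the stream is passed as fp.
 * Returns an allocated string without the trailing '\n', or NULL
 * on EOF, error, or an empty line.
 */
char *getstr_posix (FILE *fp)
{
    char *line = NULL;              /* NULL and 0 let getline() allocate */
    size_t cap = 0;
    ssize_t nchr = getline (&line, &cap, fp);

    if (nchr == -1) {               /* EOF or error, buffer must still be freed */
        free (line);
        return NULL;
    }

    if (nchr && line[nchr - 1] == '\n')     /* trim the trailing '\n' */
        line[--nchr] = 0;

    if (!nchr) {                    /* empty line, nothing to return */
        free (line);
        return NULL;
    }

    return line;                    /* caller is responsible for free() */
}
```

Here getline() handles all of the reallocation internally, so no fixed-size temporary buffer or loop is needed.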
Another good alternative is simply looping with getchar(), reading a character at a time until the '\n' or EOF is reached. Here you just allocate some initial size for s, say 2 or 8, keep track of the number of characters used, double the size of the allocation when used == allocated, and keep going. You want to allocate in blocks of storage, as you would not want to realloc() for every character added (we will omit the discussion of why that is less true today with an mmap-ed malloc() than it was in the past).
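A minimal sketch of the character-at-a-time approach might look like this. The name getstr_chr(), the FILE* parameter, and the initial size of 8 are assumptions for illustration. It uses fgetc() on the passed stream (getchar() is the same read on stdin), and it grows one character early so there is always room for the nul-terminating character:

```c
#include <stdio.h>
#include <stdlib.h>

/* getstr_chr() is a hypothetical name for illustration only.
 * Assumptions not in the original: the stream is passed as fp and
 * the initial allocation is 8 bytes.
 * Returns an allocated string, or NULL if no characters were read.
 */
char *getstr_chr (FILE *fp)
{
    size_t allocated = 8, used = 0;         /* bytes allocated, chars stored */
    char *s = malloc (allocated);
    int c;                                  /* int, so EOF can be detected */

    if (!s)                                 /* validate every allocation */
        return NULL;

    while ((c = fgetc (fp)) != '\n' && c != EOF) {
        if (used + 1 == allocated) {        /* only room left for the '\0' */
            void *tmp = realloc (s, 2 * allocated);     /* double the block */
            if (!tmp)                       /* on failure, keep what was stored */
                break;
            s = tmp;
            allocated *= 2;
        }
        s[used++] = (char)c;                /* store character, update count */
    }

    if (!used) {                            /* empty line or EOF, nothing read */
        free (s);
        return NULL;
    }

    s[used] = 0;                            /* nul-terminate */

    return s;                               /* caller is responsible for free() */
}
```

Doubling means a 20-character line triggers only two calls to realloc() (8 to 16 to 32) rather than one per character.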
Look things over and let me know if you have further questions.