-3

My understanding of fscanf:
grabs a line from a file and based on format, stores it to a string.

That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).

Some assumptions:
1. fp is a valid FILE pointer.
2. The file has 1 line in it that reads "Something"

A pointer with allocated memory

char* temp = malloc(sizeof(char) * 1); // points to some small part in mem.
int resp = fscanf(fp,"%s", temp); 
printf("Trying to print: %s\n",temp); // prints "Something" (that's what's in the file)

An array with predefined length (it's different from the pointer!)

char temp[100]; // this buffer MUST be big enough, or we get segmentation fault
int resp = fscanf(fp,"%s", temp); 
printf("Trying to print: %s\n",temp); // prints "Something" (that's what's in the file)

A null pointer

char* temp; // null pointer
int resp = fscanf(fp,"%s", temp); 
printf("Trying to print: %s\n",temp); // Crashes, segmentation fault

So a few questions have arisen!

  1. How can a pointer with malloc of 1 contain longer texts?
  2. Since the pointer's content doesn't seem to matter, why does a null pointer crash? I would expect the allocated pointer to crash as well, since it points to a small piece of memory.
  3. Why does the pointer work, but an array (char temp[1];) crashes?

Edit:

I'm well aware that you need to pass a big enough buffer to contain the data from the line, I was wondering why it was still working and not crashing in other situations.

iBug
Patrick
  • "*Why does a pointer with malloc of 1 can contain longer texts*" it's not the pointer that contains the text, but the memory the pointer's value addresses (points to). – alk Jul 29 '17 at 15:13
  • C has no bounds checking, neither for pointers nor for arrays. Writing out of bounds (which is what happens in the first example) leads to *undefined behavior*, and that makes your whole program *ill-formed* and invalid. – Some programmer dude Jul 29 '17 at 15:15
  • The null pointer points nowhere, and you cannot write to nowhere (mostly), hence the crash. A pointer to too little memory lets you legally write some data, and the rest is written into the neighbour's garden ... likely to crash as well, or sort of ... sometimes, you never know ... we call it undefined, undefined behaviour. – alk Jul 29 '17 at 15:16
  • Also, if you declare a local variable but don't initialize it, it will *stay* uninitialized. The compiler or the runtime environment will not initialize the variable, its contents will be *indeterminate* and will seem random. That means the `temp` variable in the last example is most likely *not* null, but instead points to some seemingly random location. – Some programmer dude Jul 29 '17 at 15:16
  • Lastly, dereferencing a null pointer *also* leads to undefined behavior. And on most platforms, writing to address zero (which is what a null pointer is on most systems) leads to a crash. – Some programmer dude Jul 29 '17 at 15:17
  • A good textbook will explain all these basics. You seem to have watched one of the typical video "tutorials" on YouTube or skipped a lot of lessons in your C book. Don't! Read a good C textbook, don't skip chapters, and learn the lessons. Also read the documentation of the functions you use. Your bunch of questions already starts with a wrong premise in the very first paragraph. – too honest for this site Jul 29 '17 at 15:25
  • *"fscanf grabs a line from a file and based on format, stores it to a string."* -- NO. It reads from file only as much data that matches the format passed as its second argument and stores the values identified in the read data into the variables whose addresses are passed as arguments starting with the third. – axiac Jul 29 '17 at 15:26
  • Thank you all for taking the time to comment, I obviously knew something was "wrong" and merely tried to identify it. @Olaf sorry, no youtube or book involved. I'm used to languages that have no books, or that their books are highly opinionated (javascript?), You'll rarely see someone refer anyone to a book in JS, I guess in C it's different. – Patrick Jul 29 '17 at 15:43
  • @Patrick: C is not a language to learn by trial and error. It is standardised (which you would have noticed reading the tag-wiki) as ISO9899 (current version: 2011). If you want to get by without a textbook, just read the standard. But as usual, standards are tough reading if you are not used to their structure and language (nevertheless, the library documentation is mostly what e.g. the POSIX/Linux man pages show). – too honest for this site Jul 29 '17 at 15:55
  • One of your opening statements is _"grabs a line from a file and based on format, stores it to a string"_, and that is wrong. The `scanf()` family of functions do not work on, or care about, lines of input. They work on characters and, for the most part, treat all white space (blanks, tabs, newlines) as interchangeable and largely ignorable. So, you need to reset your thinking there. Note, in particular, that [`scanf()` leaves the newline in the input buffer](https://stackoverflow.com/questions/5240789/scanf-leaves-the-new-line-char-in-buffer). – Jonathan Leffler Jul 29 '17 at 22:01
  • Given: `char* temp = malloc(sizeof(char) * 1); // points to some small part in mem` — you cannot read any string into the allocated space; you need at least 2 characters, one for the data and one for the terminal null. You could use a `%c` format. Any use of this `temp` with `scanf()` is broken — undefined behaviour. – Jonathan Leffler Jul 29 '17 at 22:04
  • Your second example is OK, though it would be better if you used `"%99s"` since the array only has 100 characters and `scanf()` doesn't count the null byte at the end. Your third example starts: `char* temp; // null pointer` — this is wholly erroneous. `temp` is a local variable; it has no initializer; it has no determinate value and in general will not be a null pointer (though it may accidentally be a null pointer sometimes). – Jonathan Leffler Jul 29 '17 at 22:06
  • Note that POSIX defines [`scanf()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/scanf.html) to support `%ms`, in which case you could use `char *temp; if (fscanf(fp, "%ms", &temp) == 1) { …use and free temp… } else { …temp has no reliable value and cannot be used… }`. Note that `temp` must _not_ point to anything crucial before this call to `fscanf()` because you can't tell whether it'll be overwritten or not. Note that macOS (Sierra, and also OS X before it) does not support the `m` modifier. – Jonathan Leffler Jul 29 '17 at 22:09

4 Answers

2

My understanding of fscanf:

grabs a line from a file and based on format, stores it to a string.

No, that contains some serious and important misconceptions. fscanf() reads from a file as directed by the specified format, so as to assign values to some or all of the objects pointed-to by its third and subsequent arguments. It does not necessarily read a whole line, but on the other hand, it may read more than one.

In your particular usage,

int resp = fscanf(fp,"%s", temp);

, it attempts to skip any leading whitespace, including but not limited to empty and blank lines, then read characters into the pointed-to character array, up to the first whitespace character or the end of the file. Under no circumstance will it consume the line terminator of the line from which it populates the array contents, but it will not even get that far if there is other whitespace on the line following at least one non-whitespace character (though that is not the case in the particular sample input you describe).
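The whitespace handling described above can be checked with a small sketch. It is not part of the original answer: the helper name `first_word` is hypothetical, `tmpfile()` stands in for the real file, and the `%99s` width is an added safety measure for the 100-byte buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes `contents` to a temporary file, rewinds it,
   and reads one whitespace-delimited token into `out` (at least 100 bytes).
   `*next` receives the first character fscanf left unread, or EOF.
   Returns fscanf's result: 1 on success, EOF if no token was found. */
static int first_word(const char *contents, char out[100], int *next)
{
    FILE *fp = tmpfile();
    assert(fp != NULL);            /* sketch only: no real error handling */
    fputs(contents, fp);
    rewind(fp);

    /* %s skips leading whitespace (including newlines), then stores
       characters up to, but not including, the next whitespace. */
    int resp = fscanf(fp, "%99s", out);
    *next = fgetc(fp);
    fclose(fp);
    return resp;
}
```

Called with the contents "\n  Something\nmore", this sketch should store "Something" and report the newline as the next unread character; with contents that are only whitespace it should return EOF and store nothing.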

That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).

Strings are not an actual data type in C. Arrays of chars are, but such arrays are not "strings" in the C sense unless they contain at least one null character. Furthermore, in that case, C string functions for the most part operate only on the portions of such arrays up to and including the first null, so it is those portions that are best characterized as "strings".

There is more than one way to obtain storage for character sequences that can be considered strings, but there is only one way to pass them around: by means of a pointer to their first character. Whether you obtain storage by declaring a character array, by a string literal, or by allocating memory for it, the contents are accessed only via pointers. Even when you declare a char array and access elements by applying the index operator, [], to the name of the array variable, you are actually still using a pointer to access the contents.
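As an illustration of that point (a sketch with hypothetical helper names, not code from the question), the same function can receive storage obtained in all three ways, because in each case it is handed a pointer to the first character:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts characters up to the terminating null.
   It cannot tell where the storage came from; it only sees a pointer. */
static size_t count_chars(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* s[n] is defined as *(s + n) */
        n++;
    return n;
}

/* Returns 1 if array, literal, and allocated storage all behave alike. */
static int same_for_all_storage(void)
{
    char array[] = "Something";            /* declared array, 10 bytes */
    const char *literal = "Something";     /* string literal */
    char *heap = malloc(sizeof array);     /* allocated memory */
    assert(heap != NULL);                  /* sketch only */
    memcpy(heap, array, sizeof array);     /* copies the null as well */

    int ok = count_chars(array) == 9       /* array decays to &array[0] */
          && count_chars(literal) == 9
          && count_chars(heap) == 9
          && array[3] == *(array + 3);     /* indexing is pointer arithmetic */
    free(heap);
    return ok;
}
```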

  1. Why does a pointer with malloc of 1 can contain longer texts?

A pointer does not contain anything but itself. It is the space it points to that contains anything else, such as text. If you allocate only one byte, then the allocated space can contain only one byte. If you overrun that one byte by attempting to write a longer character sequence where the pointer points, then you invoke undefined behavior. In particular, C does not guarantee that an error will be generated, or that the program will fail to behave as you expect, but all manner of havoc can ensue, without limit.

  2. Since the pointer content doesn't seem to matter, why does a null pointer crash, I would expect the allocated pointer to crash as well, since it points to a small piece of memory.

Attempting to dereference an invalid pointer, including, but not limited to, a null pointer, also produces undefined behavior. A crash is well within the realm of possible behaviors. C does not guarantee a crash in that case, although some implementations do reliably produce one.

  3. Why does the pointer work, but an array(char temp[1];) crashes?

You do not demonstrate your 1-character array alternative, but again, overrunning the bounds of the object -- in this case an array -- produces undefined behavior. It is undefined so it is not justified to suppose that the behavior would be the same as for overrunning the bounds of an allocated object, or even that either one of those behaviors would be consistent.

John Bollinger
1

That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).

For passing a C-"string" to scanf() & friends there is just one way: Pass it the address of enough valid memory.

If you don't, the code invokes the infamous Undefined Behaviour, which means anything can happen, from a crash to seemingly running fine.
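One possible way to guarantee "enough valid memory" is to tie the field width in the format to the size of the buffer. The following is only a sketch under assumptions: `read_word` and its `cap` parameter are hypothetical names, and building the format string with `snprintf` is one option among several.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one whitespace-delimited word of at most
   cap - 1 characters into freshly allocated storage of cap bytes.
   Returns NULL if cap < 2, allocation fails, or no word was read.
   The caller must free() the result. */
static char *read_word(FILE *fp, size_t cap)
{
    assert(fp != NULL);
    if (cap < 2)
        return NULL;               /* need room for 1 char plus the null */

    char fmt[32];
    /* Build e.g. "%99s" so fscanf never stores more than cap bytes. */
    snprintf(fmt, sizeof fmt, "%%%zus", cap - 1);

    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    if (fscanf(fp, fmt, buf) != 1) {   /* always check the return value */
        free(buf);
        return NULL;
    }
    return buf;
}
```

With a width in place, an over-long word is split across successive reads instead of being written past the end of the buffer.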

alk
  • While this summarizes the issue, @dasblinkenlight's answer hit it on the head with the difference as to WHY this is happening. Obviously your answer is right, and it's what I already did before even asking the question here, I was just experimenting with different values and saw an unexpected behavior so I thought I'd ask. I should edit my question to reflect this. – Patrick Jul 29 '17 at 15:46
  • @Patrick: Your question is in fact 1+3 questions: 1 indirect, 3 direct questions. I answered the 1st, the indirect one. It is posed (implicitly) by stating something ("*There are three different ...*") out of the blue. Doing so provokes an objection (at least from me). Which I made by silently (and kindly? ;-) taking your assumption as a question: "Are there three different ..."? "No, there aren't, but just one!" :-) – alk Jul 29 '17 at 16:07
  • Kindly indeed :) I'm sure you can appreciate why it seems like 3 different ways, as it seems they result in 3 different behaviors, however as it seems in C, behavior observed has little to no relation to what was actually coded. Undefined behavior can seem perfectly fine. – Patrick Jul 29 '17 at 16:10
0

Why does a pointer with malloc of 1 can contain longer texts?

In theory, it can't without causing undefined behavior. In practice, however, when you allocate a single byte, the allocator gives you a small chunk of memory of the smallest size it supports, which is usually sufficient for 8..10 characters without causing a crash. The additional memory serves as a "padding" that prevents a crash (but it is still undefined behavior).

Since the pointer content doesn't seem to matter, why does a null pointer crash, I would expect the allocated pointer to crash as well, since it points to a small piece of memory.

A null pointer, on the other hand, is not sufficient even for an empty string, because you need space for the null terminator. Hence, writing through it is guaranteed UB, which manifests itself as a crash on most platforms.

Why does the pointer work, but an array(char temp[1]) crashes?

Because arrays are allocated without any extra "padding" memory after them. Note that a crash is not guaranteed, because the array may be followed by unused bytes of memory, which your string could corrupt without any consequences.

Sergey Kalinichenko
  • Thank you so much for this clear answer(and answering what was asked!), To be fair I fully understood why my 2nd and 3rd examples aren't(or are) working, but the malloc 1 pointer honestly confused me! – Patrick Jul 29 '17 at 15:39
0

Because null pointers aren't allocated with memory.

When you request for a small piece of memory, it is allocated from a block of memory called "heap". The heap is always allocated and freed in units of blocks or pages, which will always be a little larger than a few bytes, usually several KBs.

So when you allocate memory with new or by defining an array (small), you get a piece of memory in the heap. The actually available space is larger and can (often) go over the amount you requested, so it's practically safe to write (and read) more than requested. But theoretically, it's UB and should make the program crash.

When you create a null pointer, it points to 0, an invalid address that can't be read from or written to. So it's guaranteed that the program will crash, often by a segmentation fault.

Small arrays may crash more often than new and malloc because they aren't always allocated from the heap, and may come without any extra space after them, so it's more dangerous to write over the limit. However, they often precede unused (unallocated) memory areas, so sometimes your program may not crash, but gets corrupted data instead.

iBug
  • Thanks for your answer, this seems to fail when you initialize a string of length 2 (`char temp[2]`) - Am I right to assume that this is assigned from the stack, and not the heap? so memory is allocated in different chunks from those two areas? – Patrick Jul 29 '17 at 15:50
  • I'm not sure, but I have another guess: memory alignment. When you init a `char [2]`, it may fall into a small gap between other 16-bit data, for example a `short` and a `long`. Then the data after the array gets changed and the string is broken. – iBug Jul 29 '17 at 15:54
  • "The heap is always allocated and freed in units of blocks or pages, which will always be a little larger than a few bytes, usually several KBs." - Please provide a reference to the standard. C does not have `new`. Static variables are typically not allocated on the heap. etc. Null pointers don't point anywhere. Their value is not necessarily all-bits-zero; address `0x0` can be a valid address. "Crash" is not guaranteed, nor is SEGFAULTing. etc. Your text is full of unclear, implementation-specific and even wrong information. – too honest for this site Jul 29 '17 at 16:02