2

As you can see, I only allocated 1 byte as sizeof(char) inside the loop, and still sscanf() reads the whole block up until the blank into string_of_letters. How is this possible?

What is the definition of sscanf()?

For example: str = "rony is a man" but at string_of_letters position i I see "rony".

char **string_of_letters;
int i;
char *read = str;

string_of_letters = (char**)malloc(3 * sizeof(char*));
for (i = 0; i < 3; i++) {
    string_of_letters[i] = (char*)malloc(sizeof(char));
    sscanf(read,"%[^, ]", &(*string_of_letters[i]));
    printf("%s\n", string_of_letters[i]);
}
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
NyaSol
  • 537
  • 1
  • 5
  • 21
  • 2
    Without reading the details of your question too much, if everything you say is true, you allocated enough space for 1 character, but read a complete string into it (more than 1 character), you most likely just overwrote the memory that followed that single character. This *may* happen to work because you cannot really allocate just 1 character in todays memory architecture, but may also have adverse effects. – Lasse V. Karlsen Aug 26 '18 at 18:46
  • 1
    https://en.wikipedia.org/wiki/Undefined_behavior – NPE Aug 26 '18 at 18:47
  • 2
    I like to read definition of functions at [POSIX site](http://pubs.opengroup.org/onlinepubs/9699919799/). Read about [`sscanf()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html) there. – pmg Aug 26 '18 at 18:49
  • 2
    Just as side note `&(*string_of_letters[i])` is the same as simpler `string_of_letters[i]` – robal Aug 26 '18 at 19:08
  • You need to allocate at least 2 characters inside the loop (because `%[…]` scan sets create a null-terminated string), but then you could use `%1[^, ]` as the conversion to get one character at a time. Note that you need to test the return value of `sscanf()` to check that you got what you expected. You also need to increment `read` so as not to read the same character over and over. In more general cases, you'd use `%n` to be told where the scan stopped (see [Using `sscanf()` in a loop](https://stackoverflow.com/questions/3975236)). Scan sets do not skip white space (nor does `%c` or `%n`). – Jonathan Leffler Aug 26 '18 at 21:06
  • If you simply need to read characters from the string, using `sscanf()` is overkill. `*string_of_letters[i] = read[i];` would do the job succinctly and safely, though you use `printf("%c\n", *string_of_letters[i]);` to print a single character. – Jonathan Leffler Aug 26 '18 at 21:09

2 Answers2

6

C does not impose runtime memory bounds checking, so the fact that you only allocated one byte is of no consequence to the function of sscanf: it will happily attempt to store the entire string to the memory location pointed to by the pointer you provide. If the buffer is not big enough though, the result is undefined behavior, the exact consequences of which depend on too many factors to consider (the compiler used and its version, the operating system, the current state of memory, etc.).

In a small toy program such as yours, it is not surprising that it appears to work properly, as the buffers are small enough and there is not much else going on. In a larger program, however, it is likely that sscanf would write over the end of the buffer passed in and into another buffer, allocated for something else, altering memory you did not want to change, or, if you're lucky for example, into protected memory, causing an access violation.

mnistic
  • 10,866
  • 2
  • 19
  • 33
4

There are many ways to fix up the code fragment shown. This code shows three of them. As noted in comments to the question, you need to allocate at least 2 characters inside the loop (because %[…] scan sets create a null-terminated string), but then you could use %1[^, ] as the conversion to get one character at a time. Note that you need to test the return value of sscanf() to check that you got what you expected. You also need to increment read so as not to read the same character over and over. In more general cases, you'd use %n to be told where the scan stopped (see Using sscanf() in a loop). Scan sets do not skip white space (nor do %c or %n — all other standard conversions do skip leading white space, including newlines).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { LIST_SIZE = 3 };

static void free_array(size_t n, char **arr)
{
    for (size_t i = 0; i < n; i++)
        free(arr[i]);
    free(arr);
}

int main(void)
{
    char str[] = "rony is a man";
    char **string_of_letters;
    char *read = str;

    printf("Variant 1:\n");
    string_of_letters = (char **)malloc(LIST_SIZE * sizeof(char *));
    for (int i = 0; i < LIST_SIZE; i++)
    {
        string_of_letters[i] = (char *)malloc(2 * sizeof(char));
        if (sscanf(&read[i], "%1[^, ]", string_of_letters[i]) != 1)
            printf("Conversion failed on %d\n", i);
        else
            printf("%s\n", string_of_letters[i]);
    }

    free_array(LIST_SIZE, string_of_letters);

    printf("Variant 2:\n");
    string_of_letters = (char **)malloc(LIST_SIZE * sizeof(char *));
    for (int i = 0; i < LIST_SIZE; i++)
    {
        string_of_letters[i] = (char *)malloc(sizeof(char));
        *string_of_letters[i] = read[i];
        printf("%c\n", *string_of_letters[i]);
    }

    free_array(LIST_SIZE, string_of_letters);

    printf("Variant 3:\n");
    strcpy(str, "  r o  n");

    char char_list[LIST_SIZE + 1];      // NB: + 1 provides space for null byte
    int offset = 0;
    for (int i = 0; i < LIST_SIZE; i++)
    {
        int pos;
        printf("Offset = %d: ", offset);
        if (sscanf(&read[offset], " %1[^, ]%n", &char_list[i], &pos) != 1)
        {
            printf("Conversion failed on character index %d\n", i);
            break;
        }
        else
            printf("%c\n", char_list[i]);
        offset += pos;
    }

    return 0;
}

The code shown runs cleanly under Valgrind on a Mac running macOS 10.13.6 High Sierra with Valgrind 3.14.0.GIT (a version extracted from Git, rather than a formally released set of source code).

Output:

Variant 1:
r
o
n
Variant 2:
r
o
n
Variant 3:
Offset = 0: r
Offset = 3: o
Offset = 5: n

As already observed, the code in the question sorta works, more by accident than design. The pointer returned by malloc() is constrained so that it points to a memory location that could be used for any purpose:

C11 §7.22.3 Memory management functions

¶1 … The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (…). …

That means that successive allocations of a single char will not be contiguous because of the alignment requirements of other types. Typically, you'll find that the minimum space allocated is 8 or 16 bytes (on 32-bit or 64-bit platforms), but that's by no means required. This does mean that there is often more space allocated than you requested (especially if you request a single byte). However, access to that extra space leads to undefined behaviour. Your run of your example code shows that sometimes 'undefined behaviour' behaves more or less as expected.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278