1

I am going to simplify my situation in order to focus on the actual problem.

Let's say I am writing a cover function for printf called print_data. The user calls print_data and passes in a single format string e.g. "%.1f" along with a void * representing data:

void print_data(const char *format, void *data);

My job is to take these arguments and somehow pass them on to printf.

My problem is that printf expects a value and not a pointer (except for strings). I have no way of determining the type of data that the user passed in, except for manually reading the format string myself and casting the data accordingly (e.g. if "f" was passed, cast to float).

A "magic" solution would be the ability to somehow de-reference the void*, but this of course is not possible.

I unfortunately cannot restructure the design as the problem isn't this simple and requires that I receive a void* and a format string.

My question is almost the same as printf by given pointer and format string. Issue with floats, except that question does not appear to have been resolved.

Any thoughts on how I could accomplish this?

chqrlie
Gary Allen
  • 3
    You will have to parse the format string to determine what type it expects and then cast and dereference the pointer appropriately. – John Bode Jul 14 '20 at 10:47
  • I mentioned that I could do this, but this solution for me is a bit long-winded. I understand it may be the only way - but ideally I am looking for a "magic" solution ;) – Gary Allen Jul 14 '20 at 10:49
  • 4
    All the code in [that question](https://stackoverflow.com/questions/7352869/printf-by-given-pointer-and-format-string-issue-with-floats) is just wrong. `I am looking for a "magic" solution` Do you have `_Generic`? Do you have access to the type of `data` on the caller side? Why would you design such an interface in the first place instead of using `v*printf`? There is no magic, and even if you do, it's going to be harder to maintain than just writing a parser. – KamilCuk Jul 14 '20 at 10:50
  • @KamilCuk very much so... – Gary Allen Jul 14 '20 at 10:50
  • 2
    @GaryAllen: No magic in this case. C just isn’t that high-level. You’ll have to do it the hard way. – John Bode Jul 14 '20 at 10:51
  • You are presumably aware that `void*` parameter simply contains an *address* in memory. `printf` works by casting the parameter according to the format string, which is why it will fail at runtime if you get the formatting wrong. Also, modern compilers tend to analyze the format string at compile time and warn you if you screw up, which [can even work with your own printf-style functions](https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-flatten-function-attribute), if you use the same formatting specifiers as `printf`. – vgru Jul 14 '20 at 11:47
  • What does `data` point to for `"%s"` in your function? Does it point to the first character of the string, or does it point to a `char *`? (Similarly, for `"%p"`, is `data` the actual value to be printed or does it point to a `void *`?) – Ian Abbott Jul 14 '20 at 12:24
  • Presumably, you cannot simply do `#define print_data printf`? Or define `print_data` as a [variadic function which will call `vprintf`](https://godbolt.org/z/71Y8of)? – vgru Jul 14 '20 at 12:25
  • Please revisit the condition that you must receive a `void*` before doing anything else. Because, if there is any way of avoiding the `void*`, then `vprintf()` is the solution to use. With the correct `__attribute__(())` incantations, you'll be able to create a variadic `print_data()` that actually checks the datatypes of its arguments against the format string at the calling site. – cmaster - reinstate monica Jul 14 '20 at 12:28
  • @Groo Yes I am aware. As mentioned, I have simplified the example. The data is stored in a structure, which contains a void* to my data. I then LATER on format this data (I actually use sprintf but that's not important). I want my user to make use of the structure, I want them to be able to pass in any primitive data, and I don't want them to have to specify the type themselves. I hope you see my dilemma – Gary Allen Jul 14 '20 at 12:39
  • @IanAbbott data points to the primitive data from inside a struct, e.g. it points to an integer if the user has made it point to an integer (see my above answer). Let's ignore the case of a string for now – Gary Allen Jul 14 '20 at 12:41
  • @cmaster-reinstatemonica Read my reply to Groo please. I'm not sure there is a way around storing the data as a void* inside my struct, but I would be open to suggestions. Also, __attribute__ is not MSVC – Gary Allen Jul 14 '20 at 12:42
  • Yes, the need to store this stuff is indeed a dilemma. Is it possible to format the string when the data is stored within the `struct`? In that case, I would go for a `vsnprintf()` wrapper (ensuring sufficient allocation of memory in a second call on failure). If the data needs to be set only after the format string is fixed, you may try to encapsulate the modification of the data and possibly regenerate the formatted string. But yes, I feel your pain... – cmaster - reinstate monica Jul 14 '20 at 12:56
  • As to the availability of `__attribute__(())`, does MSVC offer any other means to declare that a function behaves similarly to `printf()`? I don't know anything about that compiler, but I would hope that it offers *something* in that direction. I would at least try to google for it. – cmaster - reinstate monica Jul 14 '20 at 12:59
  • Referenced code uses `pprint ("float: %f\n",&fval); pprint ("double: %f\n",&dval);`. Are you _always_ going to use `"%lf"` with `double`? – chux - Reinstate Monica Jul 14 '20 at 16:39
  • @chux-ReinstateMonica Yes. I just ended up solving this and have specified in my docs that users need to use "lf" – Gary Allen Jul 14 '20 at 16:55
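The variadic alternative raised in these comments can be sketched as follows, assuming the `void *` requirement could be dropped. This is only an illustration: `format_alloc` and `PRINTF_LIKE` are hypothetical names, and the macro assumes a GCC-compatible compiler for format checking, expanding to nothing elsewhere (e.g. MSVC).

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: GCC-compatible compilers support the format attribute;
   other compilers get an empty macro and no call-site checking. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, arg_idx) \
    __attribute__((format(printf, fmt_idx, arg_idx)))
#else
#define PRINTF_LIKE(fmt_idx, arg_idx)
#endif

/* Hypothetical helper: formats into a freshly allocated string.
   Returns NULL on failure; the caller must free() the result. */
char *format_alloc(const char *format, ...) PRINTF_LIKE(1, 2);

char *format_alloc(const char *format, ...) {
    va_list ap, ap2;
    va_start(ap, format);
    va_copy(ap2, ap);

    /* First call: measure the required length (C99 semantics). */
    int len = vsnprintf(NULL, 0, format, ap);
    va_end(ap);

    char *buf = NULL;
    if (len >= 0) {
        buf = malloc((size_t)len + 1);
        if (buf)
            /* Second call: write into a buffer of the right size. */
            vsnprintf(buf, (size_t)len + 1, format, ap2);
    }
    va_end(ap2);
    return buf;
}
```

A call such as `format_alloc("%.1f", value)` formats the value at the moment it is stored, so no untyped pointer has to be kept around for later.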

4 Answers

4

You must parse the format string in your function and call printf with the appropriate value type. To read the value, you can cast the void pointer to the appropriate type as determined by the conversion specifier.

Here is a quick example:

#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// The format is only known at run time, so compilers may warn about
// the non-literal format strings passed to printf below.

int print_data(const char *format, void *data) {
    const char *p = format;
    enum {
        FMT_none = 0,
        FMT_c   = 1,
        FMT_i   = 2,
        FMT_u   = 3,
        FMT_f   = 4,
        FMT_pc  = 5,
        FMT_pv  = 6,
        PREF_l  = (1 << 3),
        PREF_ll = (1 << 4),
        PREF_h  = (1 << 5),
        PREF_hh = (1 << 6),
        PREF_j  = (1 << 7),
        PREF_z  = (1 << 8),
        PREF_t  = (1 << 9),
        PREF_L  = (1 << 10),
    };
    int fmt = FMT_none;
    
    for (;;) {
        int cur_fmt = FMT_none;
        int prefix = 0;
        p = strchr(p, '%');
        if (!p)
            break;
        p++;  // skip the '%'
        // skip the flag characters, width and precision
        // note that invalid combinations will not be detected
        // such as %..d or %.+d
        p += strspn(p, " -#+0123456789.");
        // parse the length modifier if present
        switch (*p) {
        case 'l':
            p++;
            prefix = PREF_l;
            if (*p == 'l') {
                p++;
                prefix = PREF_ll;
            }
            break;
        case 'h':
            p++;
            prefix = PREF_h;
            if (*p == 'h') {
                p++;
                prefix = PREF_hh;
            }
            break;
        case 'j':
            p++;
            prefix = PREF_j;
            break;
        case 'z':
            p++;
            prefix = PREF_z;
            break;
        case 't':
            p++;
            prefix = PREF_t;
            break;
        case 'L':
            p++;
            prefix = PREF_L;
            break;
        }
        switch (*p++) {
        case '%':
            if (p[-2] != '%')
                return -1;
            continue;
        case 'c':
            cur_fmt = FMT_c;
            break;
        case 'd':
        case 'i':
            cur_fmt = FMT_i;
            break;
        case 'o':
        case 'u':
        case 'x': case 'X':
            cur_fmt = FMT_u;
            break;
        case 'a': case 'A':
        case 'e': case 'E':
        case 'f': case 'F':
        case 'g': case 'G':
            cur_fmt = FMT_f;
            break;
        case 's':
            cur_fmt = FMT_pc;
            break;
        case 'p':
            cur_fmt = FMT_pv;
            break;
        default:
            return -1;
        }
        if (fmt != FMT_none)
            return -1; // more than one format
        fmt = cur_fmt | prefix;
    }
    switch (fmt) {
    case FMT_none:
        return printf(format);
    case FMT_c:
        return printf(format, *(char *)data);
    case FMT_c | PREF_l:
        // the (wint_t) cast is redundant, omitted
        return printf(format, *(wchar_t *)data);
    case FMT_i:
        return printf(format, *(int *)data);
    case FMT_i | PREF_l:
        return printf(format, *(long *)data);
    case FMT_i | PREF_ll:
        return printf(format, *(long long *)data);
    case FMT_i | PREF_h:
        return printf(format, *(short *)data);
    case FMT_i | PREF_hh:
        return printf(format, *(signed char *)data);
    case FMT_i | PREF_j:
        return printf(format, *(intmax_t *)data);
    case FMT_i | PREF_z:
    case FMT_u | PREF_z:
        return printf(format, *(size_t *)data);
    case FMT_i | PREF_t:
    case FMT_u | PREF_t:
        return printf(format, *(ptrdiff_t *)data);
    case FMT_u:
        return printf(format, *(unsigned *)data);
    case FMT_u | PREF_l:
        return printf(format, *(unsigned long *)data);
    case FMT_u | PREF_ll:
        return printf(format, *(unsigned long long *)data);
    case FMT_u | PREF_h:
        return printf(format, *(unsigned short *)data);
    case FMT_u | PREF_hh:
        return printf(format, *(unsigned char *)data);
    case FMT_u | PREF_j:
        return printf(format, *(uintmax_t *)data);
    case FMT_f:
        // the cast (double) is redundant, but useful to prevent warnings
        return printf(format, (double)*(float *)data);
    case FMT_f | PREF_l:
        return printf(format, *(double *)data);
    case FMT_f | PREF_L:
        return printf(format, *(long double *)data);
    case FMT_pc:
        return printf(format, *(char **)data);
    case FMT_pc | PREF_l:
        return printf(format, *(wchar_t **)data);
    case FMT_pv:
        return printf(format, *(void **)data);
    default:
        return -1;
    }
}

Notes:

  • the floating point formats behave like scanf(): use %f if data points to a float and %lf if it points to a double. The l will be ignored by printf as float values are converted to double when passed to vararg functions.

  • this function expects a pointer to char for a %c format although printf expects an int that will be converted to unsigned char.

  • this function expects a pointer to wchar_t for a %lc format although printf expects a wint_t.

  • conversion specifiers %zd and %tu are allowed by the C Standard, but the standard does not define names for the corresponding types. Passing the type with the other signedness is not strictly correct for negative values, but is unlikely to pose a problem.
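The scanf-like convention from the first note can be illustrated with a reduced dispatcher. This is a minimal sketch rather than the full function above: `format_float_ptr` is a hypothetical helper that handles a single `%f` or `%lf` conversion and writes into a caller-supplied buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical reduced dispatcher: handles only "%f" and "%lf" and
   picks the pointed-to type from the 'l' modifier, mirroring scanf():
   "%f" -> float *, "%lf" -> double *. */
int format_float_ptr(char *out, size_t size,
                     const char *format, const void *data) {
    const char *p = strchr(format, '%');
    if (!p)
        return -1;
    p++;
    p += strspn(p, " -#+0123456789.");   /* flags, width, precision */
    if (*p == 'l') {
        if (p[1] != 'f')
            return -1;
        return snprintf(out, size, format, *(const double *)data);
    }
    if (*p == 'f')
        /* float is promoted to double when passed through "..." */
        return snprintf(out, size, format, (double)*(const float *)data);
    return -1;                           /* anything else: unsupported */
}
```

With this convention, pairing `"%f"` with a pointer to a `double` would read the object as a `float`, which is why callers have to use `"%lf"` for `double` data.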

chqrlie
  • 1
    The problem with using a `union` type like that is that it is likely to have larger alignment size than some of the objects the original `void *` can point to. So rather than using `pval`, it would be better to cast and dereference `data` for each case, e.g. : `case FMT_i:` `return printf(format, *(int *)data);`. – Ian Abbott Jul 14 '20 at 12:04
  • For the `FMT_f` case, the extra cast to `(double)` is superfluous since the `float` value will be converted to `double` by the _default argument promotions_, but perhaps you have added the cast for readability. – Ian Abbott Jul 14 '20 at 12:27
  • Thanks for the help. I am looking into this option now - unfortunately it's not what I wanted to do as mentioned, but it seems like it's my only choice – Gary Allen Jul 14 '20 at 12:43
  • After finding a `%`, you should check for `%%` before skipping anything else. – Ian Abbott Jul 14 '20 at 12:50
  • `%hhc` is illegal according to the C specification, and `%lc` expects a `wint_t` rather than a `long int`. So you probably need cases for `FMT_c` and `FMT_c | PREF_l`. Then the question is what pointer type should `data` be converted to for `FMT_c` before it is dereferenced? I suppose the most natural is `char *` but it is a bit of a compromise since `printf`'s `%c` expects an `int` value and converts it to `unsigned char`. – Ian Abbott Jul 14 '20 at 14:13
  • I agree, no need to expect a pointer to `wint_t` for anything since that is only the type that `wchar_t` is converted to by the default argument promotions. – Ian Abbott Jul 14 '20 at 14:29
  • 1
    You could deal with those shortcomings by adding `PREF_j`, `PREF_t` and `PREF_z`, increasing the number of `case` labels to be handled, but eliminating all the stuff involving the `size` variable. – Ian Abbott Jul 14 '20 at 14:43
  • @IanAbbott: yes, that's a simple solution. It might generate some more code but not necessarily less efficient. – chqrlie Jul 14 '20 at 14:49
  • 1
    Concerning `case FMT_i | PREF_z:`, you may find [How to use “zd” specifier with `printf()`?](https://stackoverflow.com/q/32916575/2410359) interesting. – chux - Reinstate Monica Jul 14 '20 at 16:24
  • 1
    @chux-ReinstateMonica: the `union` was in reference to the initial code posted in my answer, but it was not completely correct. `%zd` is indeed interesting. `FMT_pc: | PREF_l` is unsupported but easy to add. – chqrlie Jul 14 '20 at 22:54
2

Any thoughts on how I could accomplish this?

Parse the format - and there are many possibilities given that the format not only has a specifier "acdefginopsuxAEFGX%" but modifiers "hlhhlljztL". This rapidly makes for dozens if not 100+ valid raw combinations.

Simplification exists: specifiers "aefgAEFG" all take a double or wider, and "ouxX" all take an unsigned or wider. I think the number of valid specifier/modifier combinations then comes down to (2(fp) + 8(i) + 8(u) + 2(c) + 2(s) + 1(p)).

Specifier "%n" may need to be disallowed here - does not make much sense.

An example of format parsing: How to check that two format strings are compatible?

Unclear what OP wants to do with width and precision as in "%*.*Lf" which needs more than 1 argument.
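To make the extra-argument issue concrete, here is a small sketch; `format_star` is a hypothetical helper, not part of any proposed interface. Each `*` consumes an `int` from the argument list before the value itself, so a single `void *` cannot carry everything this format needs.

```c
#include <stdio.h>

/* Hypothetical helper: "%*.*Lf" takes three arguments, two ints for
   width and precision followed by the long double value. */
int format_star(char *out, size_t size,
                int width, int precision, long double value) {
    return snprintf(out, size, "%*.*Lf", width, precision, value);
}
```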


An alternative to writing a cover function for printf that takes a format and a void * would be to print with no format at all: pass just the object to a macro and use _Generic to select the print function. This is explored in Formatted print without the need to specify type matching specifiers using _Generic.
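A minimal sketch of that idea, assuming a C11 compiler and that the object's static type is visible at the call site (which a stored `void *` does not provide). The names `PRINT_VALUE` and `print_*` are hypothetical and cover only a few types:

```c
#include <stdio.h>

/* Hypothetical per-type printers; each returns printf's result. */
int print_int(int v)           { return printf("%d", v); }
int print_long(long v)         { return printf("%ld", v); }
int print_unsigned(unsigned v) { return printf("%u", v); }
int print_double(double v)     { return printf("%g", v); }
int print_str(const char *v)   { return printf("%s", v); }

/* _Generic selects a function from the static type of x (C11). */
#define PRINT_VALUE(x) _Generic((x),      \
        int:          print_int,          \
        long:         print_long,         \
        unsigned:     print_unsigned,     \
        float:        print_double,       \
        double:       print_double,       \
        char *:       print_str,          \
        const char *: print_str)(x)
```

`PRINT_VALUE(42)` and `PRINT_VALUE(1.5f)` then pick the matching printer at compile time; a type without an association is rejected by the compiler instead of failing at run time.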

chux - Reinstate Monica
0

As the answers to linked question point out, you will have to write all the different casts. In your case it's doable since there's just a single void*, so the set of casts is finite.

MSalters
  • I really would like to avoid doing this :( but I understand if I have to – Gary Allen Jul 14 '20 at 10:50
  • No real alternative. You need one call to `printf(const char*, int)`, one call to `printf(const char*, float)`, etcetera. Those calls will contain a `*(int*)data` or `*(float*)data` cast respectively. The compiler will then figure out where the argument goes, which is ABI-dependent. – MSalters Jul 14 '20 at 10:56
0

I ended up making use of a dependency-free printf library available at https://github.com/mpaland/printf.

Once downloaded, it was basically as simple as changing all instances of va_arg which accepted a type such as "int" to instead accept a pointer of that type, and then dereferencing that value.

For example,

... = (int)va_arg(va, int)

was changed to

... = (int)*va_arg(va, int*)

The only other code I had to change was regarding doubles and floats, where I had to specifically check if 'lf' or 'f' had been passed. Luckily, because the library was well written and so easy to understand, I noticed that a flag had already been set for me (i.e. FLAGS_LONG) in the case of 'lf'.

I then simply had to check if that flag was set and if so I interpreted the value as a double, else I interpreted it as a float.
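The float/double check can be sketched in isolation. This is not the library's code: `HYP_FLAGS_LONG`, `load_fp_arg` and `fetch_fp` are hypothetical stand-ins for the library's `FLAGS_LONG` flag and its argument-fetching step.

```c
#include <stdarg.h>

/* Hypothetical stand-in for the library's "l" length-modifier flag. */
#define HYP_FLAGS_LONG (1U << 0)

/* Fetch one floating-point argument that was passed by pointer.
   With the flag set ("%lf") the pointee is read as a double,
   otherwise ("%f") as a float, which is widened to double. */
static double load_fp_arg(va_list *va, unsigned flags) {
    if (flags & HYP_FLAGS_LONG)
        return *va_arg(*va, double *);
    return (double)*va_arg(*va, float *);
}

/* Hypothetical driver: expects exactly one pointer after flags. */
double fetch_fp(unsigned flags, ...) {
    va_list va;
    va_start(va, flags);
    double v = load_fp_arg(&va, flags);
    va_end(va);
    return v;
}
```

The integer conversions follow the same pattern without the flag check: fetch a pointer with `va_arg` and dereference it.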

Hope this helps anyone trying to implement something similar.

Gary Allen