4

Note: when I say "static string" here I mean memory that cannot be handled by realloc.

Hi, I have written a procedure that takes a char * argument and I would like to create a duplicate IF the memory is not relocatable/resizable via realloc. As is, the procedure is a 'heavy' string processor, so being ignorant and duplicating the string whether or not it is static will surely cause some memory overhead/processing issues in the future.

I have tried to use exception handlers to modify a static string, the application just exits without any notice. I step back, look at C and say: "I'm not impressed." That would be an exception if I have ever heard of one.

I tried to use exception handlers to call realloc on a static variable... Glib reports that it can't find some private information to a structure (I'm sure) I don't know about and obviously calls abort on the program, which means it's not an exception that can be caught with longjmp/setjmp OR C++ try/catch.

I'm pretty sure there must be a way to do this reasonably. For instance dynamic memory most likely is not located anywhere near the static memory so if there is a way to divulge this information from the address... we might just have a bingo..

I'm not sure if there are any macros in the C/C++ Preprocessors that can identify the source and type of a macro argument, but it would be pretty dumb if it didn't. Macro Assemblers are pretty smart about things like that. Judging by the lack of robust error handling, I would not be a bit surprised if it did not.

  • 5
    "*I have tried to use exception handlers to modify a static string, the application just exits without any notice. I step back, look at C and say: "I'm not impressed." That would be an exception if I have ever heard of one.*" - Well, considering there is no notion of an exception that you can handle in C (i.e., no such thing as an exception handler) I think it's pretty reasonable. How would you expect to deal with modifying read only memory? – Ed S. Apr 11 '12 at 00:07
  • 4
    To be clear from the outset: there is no way to do this in a platform-independent manner. – Oliver Charlesworth Apr 11 '12 at 00:07
  • 3
    Is this a question or a rant? – geekosaur Apr 11 '12 at 00:08
  • Wow! Never got so many GREAT answers at once! –  Apr 11 '12 at 00:24
  • 1
    @TristonJ.Taylor: From your comments, I get the impression you're coding in C. Do you want answers for C or C++? – Mooing Duck Apr 11 '12 at 00:35
  • I'd like to use C but if a C++ solution can be had... My files are already in *.cpp :D –  Apr 11 '12 at 00:44
  • @EdS. Back in the day I would have called the Windows kernel using machine assembly to set the desired flags, and/or might have written a list manager like the one described in my string compiler in my answer to this question, which uses parameter counts to sanity check itself with the API. –  Apr 11 '12 at 08:49

5 Answers

5

C does not provide a portable way to tell statically allocated memory blocks from dynamically allocated ones. You can build your own struct with a string pointer and a flag indicating the type of memory occupied by the object. In C++ you can make it a class with two different constructors, one per memory type, to make your life easier.
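A minimal sketch of the struct-plus-flag idea, written so it compiles as C or C++. The names `str_ref`, `str_kind`, `str_from_static`, `str_from_heap` and `str_append` are hypothetical and only for illustration; the caller is trusted to pick the right constructor, since nothing in the language can check it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tag: records how the memory behind `data` was obtained. */
enum str_kind { STR_STATIC, STR_HEAP };

struct str_ref {
    char *data;          /* the string itself */
    enum str_kind kind;  /* STR_HEAP only if data came from malloc/realloc */
};

/* Wrap a string that must never be passed to realloc or free. */
static struct str_ref str_from_static(const char *s) {
    struct str_ref r = { (char *)s, STR_STATIC };
    return r;
}

/* Wrap a string that was obtained from malloc, realloc or similar. */
static struct str_ref str_from_heap(char *s) {
    struct str_ref r = { s, STR_HEAP };
    return r;
}

/* Append suffix. Heap strings are grown with realloc; static strings are
 * copied into a fresh heap block first. Returns 0 on success, -1 if the
 * allocation fails (in which case *r is left unchanged). */
static int str_append(struct str_ref *r, const char *suffix) {
    assert(r && r->data && suffix);
    size_t old_len = strlen(r->data);
    size_t add_len = strlen(suffix);
    char *buf;

    if (r->kind == STR_HEAP) {
        buf = (char *)realloc(r->data, old_len + add_len + 1);
        if (!buf) return -1;
    } else {
        buf = (char *)malloc(old_len + add_len + 1);
        if (!buf) return -1;
        memcpy(buf, r->data, old_len);
    }
    memcpy(buf + old_len, suffix, add_len + 1);  /* copies the terminator too */
    r->data = buf;
    r->kind = STR_HEAP;  /* from here on the string is resizable */
    return 0;
}
```

After the first append the result is always heap memory, so the duplicate is made at most once per string and only when it is actually needed.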

As far as aborting your program goes, trying to free or re-allocate memory that has not been allocated dynamically is undefined behavior, so aborting is a fair game.

Sergey Kalinichenko
  • Thank you very much for your input! I agree with Oli: Best way to do it in C. –  Apr 11 '12 at 00:25
  • although I would forgo the struct and just use a typedef. No point in wasting memory when preprocessor type verification will do. –  Apr 11 '12 at 00:26
  • @TristonJ.Taylor: This cannot be solved with typedefs (at least not in C, where functions cannot be overloaded). – Oliver Charlesworth Apr 11 '12 at 00:31
  • It would certainly force the caller to cast an invalid pointer. That's something I can ward them off of in documentation. It's a single function that should expect a type and return the same type. So wrapper functions for free, malloc, and realloc will need to be implemented. The cherry on top is a macro that can cast the type back to a char * –  Apr 11 '12 at 00:40
  • excluding the wrappers... all of that is essentially free. –  Apr 11 '12 at 00:43
4

You may be able to detect ranges of memory and do some pointer comparisons. I've done this in some garbage collection code, where I need to know whether a pointer is in the stack, heap, or elsewhere.

If you control all allocation, you can simply keep min and max bounds based on every dynamic pointer that ever came out of malloc, calloc or realloc. A pointer lower than min or greater than max is probably not in the heap, and this min and max delimited region is unlikely to intersect with any static area, ever. If you know that a pointer is either static or it came from malloc, and that pointer is outside of the "bounding box" of malloced storage, then it must be static.

There are some "museum" machines where that sort of stuff doesn't work and the C standard doesn't give a meaning to comparisons of pointers to different objects using the relational operators, other than exact equality or inequality.
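A rough sketch of the bounds-tracking idea, assuming every dynamic block in the program goes through the wrappers. The names `tracked_malloc`, `tracked_realloc` and `outside_tracked_heap` are hypothetical. Pointers are converted to `uintptr_t` before comparing, which sidesteps the relational-operator problem but still relies on implementation-defined behavior, and the result is only a heuristic: an address inside the bounds is not necessarily heap memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Lowest and one-past-highest address handed out so far (hypothetical globals). */
static uintptr_t heap_lo = UINTPTR_MAX;
static uintptr_t heap_hi = 0;

/* Widen the bounds so they cover the block [p, p + n). */
static void note_block(void *p, size_t n) {
    uintptr_t lo = (uintptr_t)p;
    uintptr_t hi = lo + n;
    if (lo < heap_lo) heap_lo = lo;
    if (hi > heap_hi) heap_hi = hi;
}

/* malloc wrapper that records the returned block. */
static void *tracked_malloc(size_t n) {
    void *p = malloc(n);
    if (p) note_block(p, n);
    return p;
}

/* realloc wrapper that records the (possibly moved) block. */
static void *tracked_realloc(void *old, size_t n) {
    void *p = realloc(old, n);
    if (p) note_block(p, n);
    return p;
}

/* Returns 1 if p lies outside the bounds of everything the wrappers have
 * returned, so it cannot be one of our dynamic blocks. Returns 0 if p is
 * inside the bounds, which does NOT prove that it is heap memory. */
static int outside_tracked_heap(const void *p) {
    uintptr_t a = (uintptr_t)p;
    return a < heap_lo || a >= heap_hi;
}
```

Only the "outside" answer can be trusted. The bounds never shrink, and anything mapped between two tracked blocks will be reported as inside them.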

Kaz
  • This is my favorite answer! I can tell you have mad-man experience with C. museum pieces! Classy! As much as I hate to admit it: The experts have confirmed! Documentation + Type Qualification is the only way to go here (today). –  Apr 11 '12 at 00:29
  • Thank you sir, you did provide an adequate answer to the question. –  Apr 11 '12 at 00:34
  • 1
    Another thing I have done recently in the interpreter for the TXR language is to implement several kinds of strings. These can all be used interchangeably, but the garbage collector knows not to step on the ones that are static. They are identified by a two-bit type code stuck right into the pointer. The macro `lit("foo bar")` takes the `wchar_t *` pointer produced by the string literal and changes the two least significant bits to the type code for a literal. Dynamic strings on the other hand, are pointers to a more complicated heap allocated structure. They can be used interchangeably. – Kaz Apr 11 '12 at 00:36
  • This approach is NOT VALID. Even if you assume pointer comparison between separate objects works, It's very possible for a string literal or other string with static storage duration to appear between two `malloc`-obtained strings. This will happen whenever the static string lies in a shared library and the two `malloc`-obtained strings happened to be allocated on opposite sides (e.g. via `mmap`) of the region where the shared library was loaded. OP's question is just fundamentally misguided, but presenting this broken and dangerous answer is not good even if it gets you some rep... – R.. GitHub STOP HELPING ICE Apr 11 '12 at 03:54
  • Indeed, the approach has to be fine tuned for shared lib use where you're mmaped smack in the middle of virtual memory. – Kaz Apr 11 '12 at 04:01
  • @R.. I'm obviously the resident F.N.G. but I understood what was stated quite clearly, which was "it can work in theory". I fully appreciate everyone's participation on the topic and many good ideas I have learned here will stick with me for a long time coming. What's probably most important is that this information will be available for others to contribute to and learn from. Thanks for your input! It's generally a pleasure to participate in a collaborative discussion on pressing topics of interest. –  Apr 11 '12 at 07:12
2

Any solution you would get would be platform specific, so you might want to specify the platform you are running on.

As for why a library should call abort when you pass it unexpected parameters, that tends to be safer than continuing execution. It's more annoying, certainly, but at that point the library knows that the code calling into it is in a state that cannot be recovered from.

MSN
  • I don't see how returning null after being given an invalid pointer is a condition that can't be recovered from. It makes sense if we were mangling the stack or doing some code injection, but not in a user accessible memory i/o 'kludge'. I suppose I should be thankful it fails noisily. It's one less error report that I'll have to generate. But it sucks that it makes 'app-go-boom' –  Apr 11 '12 at 00:37
  • Yup, return null, set error code EINVALID_MPTR, and you now have a way to detect static or dynamic memory FAITHFULLY. It seems like it's some kind of 'code gone wild' security feature however. Only security people are that rude! –  Apr 11 '12 at 01:47
  • lol Sound familiar? "We're sorry, your call cannot be completed as dialed..." Crosslink the two subjects: "Your service will now be irrevocably terminated." –  Apr 11 '12 at 02:07
  • What happened to try your call again? That's just rude! –  Apr 11 '12 at 02:08
1

I have written a procedure that takes a char * argument and I would like to create a duplicate IF the memory is not relocatable/resizable via realloc.

Fundamentally, the problem is that you want to do memory management based on information that isn't available in the scope you're operating in. Obviously you know if the string is on the stack or heap when you create it, but that information is lost by the time you're inside your function. Trying to fix that is going to be nearly impossible and definitely outside of the Standard.

I have tried to use exception handlers to modify a static string, the application just exits without any notice. I step back, look at C and say: "I'm not impressed." That would be an exception if I have ever heard of one.

As already mentioned, C doesn't have exceptions. C++ could do this, but the C++ Standards Committee believes that having C functions behave differently in C++ would be a nightmare.

I'm pretty sure there must be a way to do this reasonably.

You could have your application replace the default stack with one you created (and, as such, know the range of addresses in) using ucontext.h or Windows Fibers, and check if the address is inside that range. However, (1) this puts a huge burden on any application using your library (of course, if you wrote the only application using your library, then you may be willing to accept that burden); and (2) it doesn't detect memory that can't be realloced for other reasons (allocated using static, allocated using a custom allocator, allocated using SysAlloc or HeapAlloc on Windows, allocated using new in C++, etc.).

Instead, I would recommend having your function take a function pointer that would point at a function used to reallocate the memory. If the function pointer is NULL, then you duplicate the memory. Otherwise, you call the function.
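One way this might look, as a sketch only: the typedef `realloc_fn` and the function `append_with` are hypothetical names, and the reallocator is assumed to have the same shape as `realloc`, so that `realloc` itself can be passed for ordinary heap strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type with the same signature as realloc. */
typedef void *(*realloc_fn)(void *ptr, size_t size);

/* Append suffix to str and return the result, or NULL if allocation fails.
 * grow == NULL: str is treated as non-resizable and is left untouched;
 *               the result is a fresh malloc'd copy.
 * grow != NULL: str is resized through grow, so the old pointer must not
 *               be used afterwards.
 * Either way the caller owns the returned block. */
static char *append_with(char *str, const char *suffix, realloc_fn grow) {
    assert(str && suffix);
    size_t old_len = strlen(str);
    size_t add_len = strlen(suffix);
    char *buf;

    if (grow) {
        buf = (char *)grow(str, old_len + add_len + 1);
        if (!buf) return NULL;
    } else {
        buf = (char *)malloc(old_len + add_len + 1);
        if (!buf) return NULL;
        memcpy(buf, str, old_len);
    }
    memcpy(buf + old_len, suffix, add_len + 1);  /* includes the terminator */
    return buf;
}
```

A caller holding a literal or an array would pass `NULL`, and a caller holding a `malloc`-obtained string would pass `realloc`. The knowledge about the memory stays with the code that created it, which is the only place it is reliably available.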

Max Lybbert
  • 1
    I like your style. Thank you for contributing a great answer to my question. –  Apr 11 '12 at 01:05
0

Original poster here. I neglected to mention that I have a working solution to the problem, though it is not as robust as I would have hoped for. Please do not be upset, I appreciate everyone participating in this Request For Comments and Answers. The 'procedure' in question is variadic in nature and expects no more than 63 anonymous char * arguments.

What it is: a multiple string concatenator. It can handle many arguments but I advise the developer against passing more than 20 or so. The developer never calls the procedure directly. Instead a macro known as 'the procedure name' passes the arguments along with a trailing null pointer, so I know when I have met the end of statistics gathering.

If the function receives only two arguments (the string and the trailing null pointer), I create a copy of the first argument and return that pointer. This is the string literal case. But really all it is doing is masking strdup.

Failing the single valid argument test, we proceed to realloc and memcpy, using record info from a static database of 64 records containing each pointer and its strlen, each time adding the size of the memcopy to a secondary pointer (memcpy destination) that began as a copy of the return value from realloc.

I've written a second macro with an appendage of 'd' to indicate that the first argument is not dynamic, therefore a dynamic argument is required, and that macro uses the following code to inject a dynamic argument into the actual procedure call as the first argument:

strdup("")

It is a valid memory block that can be reallocated. Its strlen returns 0 so when the loop adds the size of it to the records, it affects nothing. The null terminator will be overwritten by memcpy. It works pretty damned well I should say. However being new to C in only the past few weeks, I didn't understand that you can't 'fool proof' this stuff. People follow directions or wind up in DLL hell I suppose.

The code works great without all of these extra shenanigans, do-hickies and whistles, but without a way to reciprocate a single block of memory, the procedure is lost on loop processing, because of all the dynamic pointer mgmt. involved. Therefore the first argument must always be dynamic. I read somewhere someone had suggested using a c-static variable holding the pointer in the function, but then you can't use the procedure to do other things in other functions, such as would be needed in a recursive descent parser that decided to compile strings as it went along.

If you would like to see the code just ask!

Happy Coding!


mkstr.cpp
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

struct mkstr_record {
    size_t size;
    void *location;
};

// use the mkstr macro (in mkstr.h) to call this procedure.
// The first argument to mkstr MUST BE dynamically allocated. i.e.: by malloc(),
// or strdup(), unless that argument is the sole argument to mkstr. Calling mkstr()
// with a single argument is functionally equivalent to calling strdup() on the same
// address.
char *mkstr_(char *source, ...) {

    va_list args;

    size_t length = 0, item = 0;

    mkstr_record list[64]; /*

    maximum of 64 input vectors. this goes beyond reason!

    the result of this procedure is a string that CAN be
    concatenated by THIS procedure, or further more reallocated!

    We could probably count the arguments and initialize properly,
    but this function shouldn't be used to concatenate more than 20
    vectors per call. Unless you are just "asking for it".

    In any case, develop a workaround. Thank yourself later.

    */// Argument Range Will Not Be Validated. Caller Beware!!!

    va_start(args, source);

    char *thisArg = source;

        while (thisArg) {

            // don't validate list bounds here.
            // an if statement here is too costly
            // for the meager benefit it can provide.

            length += list[item].size = strlen(thisArg);
            list[item].location = thisArg;
            thisArg = va_arg(args, char *);
            item++;

        }

    va_end(args);

    if (item == 1) return strdup(source);   // single argument: fail-safe

    length++;   // final zero terminator index.

    char *str = (char *) realloc(source, length);

    if (!str) return str;   // don't care. memory error. check your work.

    thisArg = (str + list[0].size);

    size_t count = item;

    for (item = 1; item < count; item++) {
        memcpy(thisArg, list[item].location, list[item].size);
        thisArg += list[item].size;
    }

    *(thisArg) = '\0';  // terminate the string.

    return str;

}


mkstr.h
#ifndef MKSTR_H_
#define MKSTR_H_

extern char *mkstr_(char *string, ...);

// This macro ensures that the final argument to "mkstr" is null.
// arguments: const char *, ...
// limitation: 63 variable arguments max.
// stipulation: caller must free returned pointer.

#define mkstr(str, args...) mkstr_(str, ##args, NULL)
#define mkstrd(str, args...) mkstr_(strdup(str), ##args, NULL)

/* calling mkstr with more than 64 arguments should produce a segmentation fault
 * this is not a bug. it is intentional operation. The price of saving an in loop
 * error check comes at the cost of writing code that looks good and works great.
 *
 * If you need a babysitter, find a new function [period]
*/


#endif /* MKSTR_H_ */

Don't forget to mention me in the credits. She's fine and dandy.

  • In retrospect of experience gained. It might be better to provide a general string concatenator, that `strdup("")-realloc-memcpy` concatenates all proceeding arguments. And one that is designed specifically for in-loop processing with a positively identified reciprocal. I think this approach will serve to paint a clearer picture of what seems to me to be awfully ambiguous to the unbeknownst. –  Apr 11 '12 at 07:02
  • falling through with `counters` to produce `counts`, and using `address locations with counts` is pretty advanced coding in my book. but then again, I'm just now working with `C`. –  Apr 11 '12 at 10:12