3

After a quick scan of related questions on SO, I have deduced that there's no function that would check the amount of memory that malloc has allocated to a pointer. I'm trying to replicate some of std::string basic functionality (mainly dynamic size) using simple char*'s in C and don't want to call realloc all the time. I guess I'll need to keep track of how much memory has been allocated. In order to do that, I'm considering creating a typedef that will contain the string itself and an integer with the amount of memory currently allocated, something like this:

typedef struct {
    char * str;
    int mem;
} my_string_t;

Is that an optimal solution, or perhaps you can suggest something that will bear better results? Thanks in advance for your help.

mingos
  • 2
    1) Don't use an `int`. That's usually signed and you never want a string of length -1. Use `size_t`, the type intended to store array sizes. 2) Better yet, use a C string library. They'll usually be faster and easier to work with, and you'll end up writing one yourself anyway if you have to do extensive string work in C. – Chris Lutz Jan 08 '10 at 22:52
  • Thanks for the comment about size_t - I'll use that instead. As for string.h, I'll be using it intensively, but I'll need some extra functionality as well. Luckily, it's not for extensive work, just a few cases where I don't want to limit the memory size, just in case someone decides to create a 20kB string :D – mingos Jan 08 '10 at 22:58

6 Answers

5

You will want to allocate the space for both the length and the string in the same block of memory. This may be what you intended with your struct, but you have reserved space for only a pointer to the string.

There must be space allocated to contain the characters of the string.

For example:

typedef struct
{
    int num_chars;
    char string[];
} my_string_t;

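#include <limits.h>  /* INT_MAX */
#include <stdlib.h>  /* malloc */
#include <string.h>  /* strlen, strcpy */

my_string_t * alloc_my_string(const char *src)
{
    my_string_t * p = NULL;
    size_t N_chars = strlen(src) + 1;  /* characters plus the terminating NUL */

    /* num_chars is an int, so refuse strings whose size does not fit. */
    if (N_chars > INT_MAX)
        return NULL;

    /* One block holds both the header and the characters. */
    p = malloc(sizeof(my_string_t) + N_chars);
    if (p)
    {
         p->num_chars = (int)N_chars;
         strcpy(p->string, src);
    }
    return p;
}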

In my example, to access the pointer to your string, you address the string member of the my_string_t:

my_string_t * p = alloc_my_string("hello free store.");
printf("String of %d bytes is '%s'\n", p->num_chars, p->string);

Keep in mind that you obtain the pointer to the string as a consequence of allocating space to store the characters. The resource you allocate is the storage for the characters; the pointer is only a reference to that storage.

In my example, the memory allocated is laid out sequentially as follows:

+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
| 00 | 00 | 00 | 12 | 'h'| 'e'| 'l'| 'l'| 'o'| 20 | 'f'| 'r'| 'e'| 'e'| 20 | 's'| 't'| 'o'| 'r'| 'e'| '.'| 00 |
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
^^                   ^
||                   |
p|                   |
 p->num_chars        p->string

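Notice that the value of p->string is not itself stored in the allocated memory. It is simply the address four bytes from the beginning of the block, immediately after the (presumed 32-bit, four-byte) integer. The diagram shows the integer's bytes most significant first; a little-endian machine stores them in the opposite order.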

Your compiler may require that you declare the flexible C array as:

typedef struct
{
    int num_chars;
    char string[0];
} my_string_t;

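but the version without the zero is the standard C99 flexible array member. The zero-length form is a compiler extension (GCC accepts it, for example) and is not part of any ISO C standard.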

You can accomplish the equivalent thing with no array member as follows:

typedef struct
{
    int num_chars;
} mystr2;

char * str_of_mystr2(mystr2 * ms)
{
    return (char *)(ms + 1);
}

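mystr2 * alloc_mystr2(const char *src)
{
    mystr2 * p = NULL;
    size_t N_chars = strlen(src) + 1;

    /* Allocate only when the size fits in the int member. */
    if (N_chars <= INT_MAX)
    {
        p = malloc(sizeof(mystr2) + N_chars);
        if (p)
        {
            p->num_chars = (int)N_chars;
            strcpy(str_of_mystr2(p), src);
        }
    }
    return p;
}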

mystr2 * p = alloc_mystr2("hello free store.");
printf("String of %d bytes is '%s'\n", p->num_chars, str_of_mystr2(p));

In this second example, the value equivalent to p->string is calculated by str_of_mystr2(). It returns the address immediately after the struct, which is approximately the same value as in the first example; the two can differ if your compiler settings add padding at the end of the struct.

Some would suggest tracking the length in a size_t. I disagree, and I would look up some old Dr. Dobb's article for the reasoning. Supporting values greater than INT_MAX is of doubtful value to your program's correctness. With an int, you can write assert(p->num_chars >= 0); and have that test something. With an unsigned type, the equivalent test would be something like assert(p->num_chars < UINT_MAX / 2); instead. As long as you write code that checks run-time data, a signed type can be useful.

On the other hand, if you are writing a library which handles strings in excess of UINT_MAX / 2 characters, I salute you.

Heath Hunnicutt
  • So, all the data contained within the struct should be within the same block of memory? Is that a necessity of some sort, or just a reasonable optimisation? – mingos Jan 08 '10 at 23:09
  • 1
    It would be acceptable to allocate the block containing the data in a separate step from the block containing the 'metadata', but I just want to be clear that in your example you would have to call malloc() twice: once for your struct, and once for the data pointed to by the "str" member of your struct. – Heath Hunnicutt Jan 08 '10 at 23:14
  • OK, after some fun with the compiler, I found out that the flexible array member will not be an option... I'm trying to keep my code strict ISO C (-pedantic-errors) and this just doesn't work :( – mingos Jan 09 '10 at 00:29
  • Use `size_t` for the type of the quantity variable. Quantities are not negative, so `int` will reduce your maximum capacity by a factor of 2. – Thomas Matthews Jan 09 '10 at 00:33
  • 1
    I updated my answer for your situation. I thought that using [] was C99-compliant, but rather than mess with it, you can use the direct approach shown above. – Heath Hunnicutt Jan 09 '10 at 01:09
  • 1
    Or for C89 compilers, `string[1]` would be fine, and then you either live with overallocating slightly or will have to use the `offsetof(my_string_t, string)` instead of `sizeof(my_string_t)`. – jamesdlin Jan 09 '10 at 03:35
2

This is the obvious solution. And while you are at it, you might want to have a struct member that maintains the amount of allocated memory actually in use. This will avoid having to call strlen() all the time, and would enable you to support non null-terminated strings, as the C++ std::string class does.
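To make the idea concrete, here is a minimal sketch of a string that tracks both the bytes in use and the bytes allocated, and grows geometrically so that realloc is only called occasionally. The names dyn_string_t, dyn_string_append and dyn_string_free, the starting capacity of 16 and the doubling policy are all assumptions made for illustration, not something from the question or the C library. It sticks to C89 syntax because the asker builds with -pedantic-errors.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a growable string that remembers its own sizes. */
typedef struct {
    char  *str;  /* heap buffer, NUL-terminated once allocated */
    size_t len;  /* bytes in use, not counting the terminator */
    size_t cap;  /* bytes allocated for str */
} dyn_string_t;

/* Append n bytes from src. Returns 0 on success, -1 on failure.
   On failure the string is left unchanged. */
int dyn_string_append(dyn_string_t *s, const char *src, size_t n)
{
    size_t needed;
    size_t new_cap;
    char *tmp;

    assert(s->len <= s->cap);

    /* Reject sizes whose sum (plus the terminator) would wrap around. */
    if (n >= (size_t)-1 - s->len)
        return -1;
    needed = s->len + n + 1;

    if (needed > s->cap) {
        /* Assumed policy: start at 16 bytes and double until it fits. */
        new_cap = s->cap ? s->cap : 16;
        while (new_cap < needed) {
            if (new_cap > (size_t)-1 / 2) {
                new_cap = needed;  /* doubling would overflow */
                break;
            }
            new_cap *= 2;
        }
        tmp = realloc(s->str, new_cap);
        if (tmp == NULL)
            return -1;
        s->str = tmp;
        s->cap = new_cap;
    }

    memcpy(s->str + s->len, src, n);
    s->len += n;
    s->str[s->len] = '\0';
    return 0;
}

/* Release the buffer and reset the bookkeeping. */
void dyn_string_free(dyn_string_t *s)
{
    free(s->str);
    s->str = NULL;
    s->len = 0;
    s->cap = 0;
}
```

Start from a zeroed struct (str NULL, len 0, cap 0) and call dyn_string_append(&s, "hello", 5). Because the length is stored, strlen() is never needed, and doubling keeps the number of realloc calls logarithmic in the final size.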

  • This sounds well on the way to writing one's own string library. Which I'm doing, incidentally, but I'm doing it because I sometimes enjoy writing low-level heavily-optimized code, and want it to go to a useful project. I wouldn't write a new string library for every application. – Chris Lutz Jan 08 '10 at 22:56
  • 1
    I assumed writing a string library was what the OP was asking about. –  Jan 08 '10 at 22:57
  • @Neil Butterworth: Hey, that's a useful idea! Thank you! @Chris Lutz: Yes, I'm writing a mini library, mainly for fun, but I've a specific project that might benefit from it at some point. – mingos Jan 08 '10 at 22:59
1

That is how it was done in the Pleistocene, and that's how you should do it today. You are dead on the money that malloc does not offer any portable, supported mechanism to query the size of an allocated block.

bmargulies
1

A more common way is to wrap malloc (and realloc) and keep a list of sizes and pointers. That way you don't need to change any string functions.
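A rough sketch of what such wrappers might look like, assuming a simple singly linked list as the "list of sizes and pointers". Every name here (tracked_malloc, tracked_realloc, tracked_size, tracked_free) is hypothetical. The lookup is linear and nothing is thread-safe, so treat it as an illustration rather than something to drop into production.

```c
#include <assert.h>
#include <stdlib.h>

/* One bookkeeping record per live allocation (hypothetical design). */
typedef struct track_node {
    void *ptr;
    size_t size;
    struct track_node *next;
} track_node;

static track_node *track_head = NULL;

static track_node *track_find(const void *ptr)
{
    track_node *n;
    for (n = track_head; n != NULL; n = n->next)
        if (n->ptr == ptr)
            return n;
    return NULL;
}

/* Allocate a block and record its requested size. */
void *tracked_malloc(size_t size)
{
    track_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->ptr = malloc(size);
    if (n->ptr == NULL) {
        free(n);
        return NULL;
    }
    n->size = size;
    n->next = track_head;
    track_head = n;
    return n->ptr;
}

/* Resize a tracked block. A size of 0 is rejected to sidestep
   realloc's implementation-defined behaviour for that case. */
void *tracked_realloc(void *ptr, size_t size)
{
    track_node *n;
    void *tmp;

    if (ptr == NULL)
        return tracked_malloc(size);
    if (size == 0)
        return NULL;
    n = track_find(ptr);
    if (n == NULL)
        return NULL;  /* not allocated through these wrappers */
    tmp = realloc(ptr, size);
    if (tmp == NULL)
        return NULL;  /* the old block is still valid and tracked */
    n->ptr = tmp;
    n->size = size;
    return tmp;
}

/* Requested size of a tracked block, or 0 if the pointer is unknown. */
size_t tracked_size(const void *ptr)
{
    track_node *n = track_find(ptr);
    return n ? n->size : 0;
}

/* Free a tracked block and drop its record; unknown pointers are ignored. */
void tracked_free(void *ptr)
{
    track_node **link;
    for (link = &track_head; *link != NULL; link = &(*link)->next) {
        if ((*link)->ptr == ptr) {
            track_node *dead = *link;
            *link = dead->next;
            free(dead->ptr);
            free(dead);
            return;
        }
    }
}
```

With this in place a plain char * obtained from tracked_malloc(32) can still be passed to the string.h functions, and tracked_size(p) reports 32 until the block is resized or freed. The trade-off is an extra allocation and a list walk per call.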

Martin Beckett
1

Write wrapper functions. If you are using malloc, you should do that anyway.

For an example, look in "Writing Solid Code".

pm100
1

I think you could use malloc_usable_size.
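For reference, a small sketch of how that call could be used. malloc_usable_size is a non-standard extension, so this will not build under strict ISO C, and the header it lives in varies by platform. The block_capacity helper and the usable_size macro are hypothetical names, and the macOS and Windows branches (malloc_size and _msize) are the nearest equivalents on those platforms rather than the same function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Pick the platform's "how big is this block" extension. */
#if defined(__APPLE__)
#include <malloc/malloc.h>
#define usable_size(p) malloc_size(p)
#elif defined(_WIN32)
#include <malloc.h>
#define usable_size(p) _msize(p)
#else
#include <malloc.h>  /* assumed: glibc or musl, which declare it here */
#define usable_size(p) malloc_usable_size(p)
#endif

/* Hypothetical helper: bytes actually usable in a malloc'd block.
   The result may be larger than the size originally requested. */
size_t block_capacity(void *p)
{
    return p ? usable_size(p) : 0;
}
```

The value returned is the allocator's real block size, which is at least what was requested and often more, so it cannot recover the original request. It is best kept for debugging and diagnostics.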

3lectrologos
  • It isn't cited in the standards section. It seems to be a FreeBSD or Linux extension (possibly also on Mac OS X), but a pretty deprecated one at best ("You shouldn't use this for the only reason you might want to use this. Use it to debug, yeah..."). For reference, NetBSD doesn't appreciate it: http://mail-index.netbsd.org/tech-kern/2007/07/20/0003.html. And it isn't on OpenBSD either. Windows has its own extension. –  Jan 09 '10 at 02:06