malloc()
et al. are required by the C standard to provide sufficiently aligned pointers for any data type, so to reduce the allocation overheads, you need to implement your own allocator (or use some existing one).
One possibility would be to have a pool chain for each possible allocation size. Within each pool, you can use a bit map to track which items are allocated and which free. The overhead is just over one bit per item, but you may end up having lots of pool chains; this tends to make free()
slow, because it has to search for the correct pool.
A better possibility, I think, is to create a pool chain for small allocations. Within each pool, the small chunks form a linked list, with a single unsigned char
keeping track of the length and allocation status. This yields unaligned pointers, but at a overhead of just over one char.
For example:
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <errno.h>
#include <stdio.h>
#define SMALL_POOL_SIZE 131072
#define SMALL_LIMIT ((UCHAR_MAX + 1U) / 2U)
struct small_pool {
struct small_pool *next;
unsigned int size;
unsigned int used;
unsigned char data[];
};
static struct small_pool *list = NULL;
Within the data[]
member, the first char is 1 to SMALL_LIMIT-1 for a free chunk of that size, or SMALL_LIMIT+1 or larger for an used chunk of 1 char or more. Zero and SMALL_LIMIT indicate errors. A space-aggressive allocator could be implemented for example as
void *small_alloc(const size_t size)
{
struct small_pool *pool;
if (size < 1 || size >= SMALL_LIMIT) {
errno = EINVAL;
return NULL;
}
pool = list;
while (pool != NULL) {
/* Unused space in the pool? */
if (pool->used + size < pool->size) {
unsigned char *const ptr = pool->data + pool->used;
/* Grab it. */
pool->used += size + 1U;
/* Create new slot. */
(*ptr) = size + SMALL_LIMIT;
/* Done. */
return ptr + 1U;
}
/* Check the free slots in the pool. */
{
unsigned char *const end = pool->data + pool->used;
unsigned char *ptr = pool->data;
unsigned char big_len = SMALL_LIMIT;
unsigned char *big_ptr = NULL;
while (ptr < end)
if (*ptr == 0U || *ptr == SMALL_LIMIT) {
/* Invalid pointer */
errno = EDOM;
return NULL;
} else
if (*ptr > SMALL_LIMIT) {
/* Used slot, skip */
ptr += (*ptr) - SMALL_LIMIT + 1U;
continue;
} else {
if (*ptr < size) {
/* Slot is too small, skip it */
ptr += (*ptr) + 1U;
continue;
} else
if (*ptr == size) {
/* Perfect slot; grab it. */
(*ptr) = size + SMALL_LIMIT;
return ptr + 1U;
} else
/* Remember smallest of the large enough slots */
if (*ptr < big_len) {
big_len = *ptr;
big_ptr = ptr;
}
ptr += (*ptr) + 1U;
}
if (big_ptr != NULL) {
(*big_ptr) = big_len + SMALL_LIMIT;
return big_ptr + 1;
}
}
/* Check the next pool. */
pool = pool->next;
}
/* Need a new pool. */
pool = malloc(SMALL_POOL_SIZE);
if (pool == NULL) {
errno = ENOMEM;
return NULL;
}
/* Initialize pool; use the initial slot for the new allocation. */
pool->used = size + 1;
pool->size = SMALL_POOL_SIZE - sizeof (struct small_pool);
pool->data[0] = size + SMALL_LIMIT;
/* Prepend this pool to the pool chain. */
pool->next = list;
list = pool;
/* Return the slot we used. */
return pool->data + 1;
}
It has a simple strategy: If there is unused trailing space in the pool, use it. Otherwise, scan through the pool to find the unused slots. If there is a perfectly-sized slot, use it; otherwise, use the smallest sufficiently large unused slot.
There are many improvements possible. For example, you could keep full pools in a separate list, to avoid scanning through them. It might also be a good idea to move the pool where a free slot was found in, to the start of the pool list.
The deallocation is more complicated. If we have relatively few deallocations amidst allocations, and don't worry about freeing entire pools, the deallocation could be as simple as
int small_free(void *const item)
{
if (item == NULL)
return 0;
else {
struct small_pool *pool = list;
while (pool != NULL && !((unsigned char *)item > pool->data && (unsigned char *)item < pool->data + pool->used))
pool = pool->next;
if (pool != NULL) {
unsigned char *const ptr = (unsigned char *)item - 1;
if (*ptr > SMALL_LIMIT)
(*ptr) -= SMALL_LIMIT;
return 0;
}
return ENOENT;
}
}
assuming you need the function to return ENOENT
in case the allocation is not actually a small one. If it is important to verify the pointer to be deallocated is valid, say for debugging, then
int small_free(void *const item)
{
if (item == NULL)
return 0;
else {
struct small_pool *pool = list;
while (pool != NULL && !((unsigned char *)item > pool->data && (unsigned char *)item < pool->data + pool->used))
pool = pool->next;
if (pool != NULL) {
unsigned char *const end = pool->data + pool->used;
unsigned char *ptr = pool->data;
while (ptr < end)
if (*ptr == 0U || *ptr == SMALL_LIMIT)
return EDOM;
else
if (*ptr < SMALL_LIMIT) {
size_t len = (*ptr) + 1U;
/* Coalesce consecutive slots, if possible. */
while (len + ptr[len] < SMALL_LIMIT) {
(*ptr) = len + ptr[len];
len = (*ptr) + 1U;
}
ptr += len;
} else {
const size_t len = (*ptr) + 1U - SMALL_LIMIT;
/* Within the current slot.. probably should just
* compare item to ptr+1 instead. */
if ((unsigned char *)item > ptr && (unsigned char *)item < ptr + len) {
*ptr = len - 1U;
return 0;
}
ptr += len;
}
}
return ENOENT;
}
}
Even this latter version does not trim ->used
when the last chunk(s) in a pool are freed, nor does it free completely unused pools either. In other words, the above deallocation is only a rough example.
Speed-wise, the above seem to be at least an order of magnitude slower than GLIBC malloc()
/free()
on my machine. Here is a simple test to check for linear allocation - semirandom deallocation pattern:
/* Make sure this is prime wrt. 727 */
#define POINTERS 1000000
int main(void)
{
void **array;
size_t i;
fprintf(stderr, "Allocating an array of %lu pointers: ", (unsigned long)POINTERS);
fflush(stderr);
array = malloc((size_t)POINTERS * sizeof array[0]);
if (array == NULL) {
fprintf(stderr, "Failed.\n");
return EXIT_FAILURE;
}
fprintf(stderr, "Done.\n\n");
fprintf(stderr, "Allocating pointers in varying sizes.. ");
fflush(stderr);
for (i = 0; i < POINTERS; i++) {
const size_t n = 1 + ((i * 727) % (SMALL_LIMIT - 1));
if (!(array[i] = small_alloc(n))) {
if (errno == EDOM)
fprintf(stderr, "Failed at %lu; corrupted list.\n", (unsigned long)i + 1UL);
else
fprintf(stderr, "Failed at %lu: %s.\n", (unsigned long)i + 1UL, strerror(errno));
return EXIT_FAILURE;
}
}
fprintf(stderr, "Done.\n\n");
fprintf(stderr, "Deallocating pointers in a mixed order.. ");
fflush(stderr);
for (i = 0; i < POINTERS; i++) {
const size_t p = (i * 727) % POINTERS;
if (small_free(array[p])) {
if (errno == EDOM)
fprintf(stderr, "Failed at %lu: corrupted list.\n", (unsigned long)i + 1UL);
else
fprintf(stderr, "Failed at %lu: %s.\n", (unsigned long)i + 1UL, strerror(errno));
return EXIT_FAILURE;
}
}
fprintf(stderr, "Done.\n\n");
fprintf(stderr, "Deallocating the pointer array.. ");
fflush(stderr);
free(array);
fprintf(stderr, "Done.\n\n");
fflush(stderr);
return EXIT_SUCCESS;
}
The true strength of a pool-based allocator, in my opinion, is that you can release entire pools at once. With that in mind, perhaps your workload could use a specialized allocator when building your structures, with a compactor phase (capable of adjusting pointers) run at least at the end, and possibly during the construction too if a sufficient number of deletions have been done. That approach would allow you to release the temporarily needed memory back to the OS. (Without compaction, most pools are likely to have at least one allocation left, making it impossible to release it.)
I don't think this is a good answer at all, but a better answer would require more details, specifically about the data structures stored, and the allocation/deallocation pattern that occurs in practice. In the absence of those, I hope this gives at least some ideas on how to proceed.