Improve performance of my C list push_back function

Question

I have my C list, and i implemented the push_back function:

bool_t push_back_clist(clist_ptr pList, void* item)
{
    if(pList)
    {
        node_ptr pNode = new_node(item, pList->sizeof_item);
        if(!pNode) return FALSE;

        if(!pList->head)
            pList->head = pList->tail = pNode;
        else
        {
            pList->tail->next = pNode;
            pNode->prev = pList->tail;
            pList->tail = pNode;
        }

        pList->size++;
        return TRUE;
    }

    return FALSE;
}

static node_ptr new_node(void* data, size_t sizeof_item)
{
    node_ptr pNode = (node_ptr) malloc(sizeof(node_t));
    if(!pNode) return NULL;

     pNode->data = malloc(sizeof_item);

     if(!pNode->data)
     {
         free(pNode);
         return NULL;
     }

     memcpy(pNode->data, data, sizeof_item);
     pNode->next = pNode->prev = NULL;
     return pNode;
}

It works, but when I compared my push_back_clist function with std::list.push_back method I noticed that my function requires about twice as long. Why? How could i improve the performance of my function? Thanks.

score 5 · Accepted Answer · answered Jun 02 '12 at 23:57

5

You can allocate your data and node in a single go to save on the number of malloc calls.

char* mem = malloc(sizeof(node_t)+sizeof_item);
// Check alloc here...
node_ptr pNode = (node_ptr)mem;
pNode->data = mem+sizeof(node_t);

answered Jun 02 '12 at 23:57

Sergey Kalinichenko

714,442
84
1,110
1,523

This can introduce alignment problems depending on platform. In C++ you get around this by making your list a template class and just have two members, one for the datum and one for your list pointers. I am not sure there is any portable way to do the same in C. – Nemo Jun 03 '12 at 00:04
@Nemo If you make sure `sizeof(node_t)` is a multiple of the alignment size then this should work. – Matt Jun 03 '12 at 00:06
@Nemo With two pointers in the `node_t` the alignment problems are possible, but somewhat unlikely: pointers are allocated on machine word boundaries, so with two pointers the data will be allocated at a word boundary as well. – Sergey Kalinichenko Jun 03 '12 at 00:08
passing through a `char*` is useless. Assigning directly to `pNode` and then `pNode->data = pNode + 1` would do (assuming that the `data` field is `void*`) – Jens Gustedt Jun 03 '12 at 06:57
1

for the problem of alignement, C11 has the new `max_align_t` so you could take care that the allocation is at least `sizeof(max_align_t) + sizeof_item` and then do `pNode->data = (max_align_t*)pNode + 1` – Jens Gustedt Jun 03 '12 at 07:06
I have to work with C89, with VC I can use the `#pragma pack` directive, but were it not so? – gliderkite Jun 03 '12 at 10:02
Anyway, it works. The double allocation was the function bottleneck. – gliderkite Jun 03 '12 at 15:52

Jens Gustedt · Answer 2 · 2012-06-03T12:11:14.373

I don't think that you should go with a single allocation as proposed by dasblinkenlight, since this is an implicit change of the interface, that in addition is difficult to document: items allocated like that could not be used to store different data and to delete the data that was found on them.

For a possible optimization, I think that in your version the control flow might be too complicated and inhibit some optimizations. Try to touch the newly allocated item only once by providing the prev and next field directly to the allocation function. Then have the control flow of that allocation function streamlined:

static node_ptr new_node(void* data, size_t sizeof_item, node_ptr prev, node_ptr next)
{
     void * cdata = malloc(sizeof_item);
     if(!cdata) return NULL;
     memcpy(cdata, data, sizeof_item);

     node_ptr pNode = malloc(sizeof *pNode);
     if(pNode)
       *pNode = (struct node){ .data = cdata, .prev = prev, .next = next, };
     return pNode;
}

That only works as is if you have a C99 compatible compiler. If you don't, reconsider to get one :) or change the use of a compound literal to a sequence of assignments.

some nitpicks:

C has its own Boolean type nowadays (since 13 years), bool, true and false should work well if you include stdbool.h.
don't cast the return of malloc
typedef's of pointer types are considered bad style by many in the C community

Edit: By playing a bit around with different version I found the following to produce the nicest assembler with my setting (x86_64, linux, gcc or clang)

node_ptr new_node1(void* data, size_t sizeof_item, node_ptr next, node_ptr prev)
{
  node_ptr ret = NULL;
  void * nData = malloc(sizeof_item);
  if (!nData) return ret;
  struct node const nNode = {
    /* memcpy is unavoidable since nothing is known about the type of
       the data. But we can save a register by using the return value
       of memcpy. */
    memcpy(nData, data, sizeof_item),
    next,
    prev
  };
  /* Allocate the return value last so it may stay in the same return
     register */
  ret = malloc(sizeof *ret);
  if (!ret) return ret;
  /* Assignment is better than memcpy since this can just use "ret" as
     target address with offsets. */
  *ret = nNode;
  return ret;
}

No offenses, but you demonstrate an excellent theoretical understanding of OOP, together with how much are you off the real practical problems related to the code optimizations. Heap allocations (as implemented by the standard heapin multi-threaded environment) are **so** much heavier than all other things done there (such as conditional branching and etc.), that they all are virtuall insignificant. This is the reason why this list implementation is (almost) exactly twice slower that the standard one. — valdo, Jun 03 '12 at 08:42
@valdo, I guess you mean the C++ standard one, C doesn't have one :) And if you read well what I say, I am not claiming that conditional branching by itself is taking the time, I say that it inhibits optimization. — Jens Gustedt, Jun 03 '12 at 09:15
@JensGustedt Thanks for your answer, I have to work with C89. Why typedef's of pointer types are considered bad style by many in the C community? — gliderkite, Jun 03 '12 at 10:10
I think because this is somehow obfuscating its nature and the gain over having a `typedef` for the `struct` and using a simple `*` is minimal. The way you have it with a clear indication in the name is probably ok for most. — Jens Gustedt, Jun 03 '12 at 11:14

score 0 · Answer 3 · answered Jun 03 '12 at 08:02

0

Like people said here, your list is twice slow because you perform 2 heap operations (allocations) per insert.

Doing a single allocation for node + data is preferable from the performance point of view. Also if your overall performance is significantly impacted by the heap, you may (and should) use an alternative heap. Doing so you may improve the performance by and order of magnitude, 100 times and more, and this is not an overestimation.

answered Jun 03 '12 at 08:02

valdo

12,632
2
37
67

How to use this `alternative heap`? – gliderkite Jun 03 '12 at 10:04
I think that you completely overestimate the possible gain of that. Mondern `malloc` systems are not too bad. As gliderkite has noted in the question he observes a factor of two compared to the c++ implementation, not a factor of hundrets. – Jens Gustedt Jun 03 '12 at 11:19
Factor of 2 exactly equals to the ratio of `malloc` (or similar) operations in the described scenario. So that it enforces the assumption that heap operations are the bottleneck. And I say: take an optimized heap and you'll get a factor of hundreds. – valdo Jun 03 '12 at 12:29
1

@valdo You should explain what you mean with `optimized/alternative heap`, because without examples or references it's only a word. – gliderkite Jun 03 '12 at 12:53
@gliderkite: I'm talking about hi-performance allocators, such as tcmalloc, ptmalloc, Hoard, and etc. I've also developed one myself some time ago (for Windows, but with some effort may probably be used on any platform). – valdo Jun 03 '12 at 13:33

Improve performance of my C list push_back function

3 Answers3