I'm implementing a set of common yet not so trivial (or error-prone) data structures for C (here) and just came up with an idea that got me thinking.

The question, in short, is: what is the best way to implement two structures that use similar algorithms but have different interfaces, without having to copy-paste or rewrite the algorithm? By best, I mean most maintainable and debuggable.

I think it is obvious why you wouldn't want to have two copies of the same algorithm.

Motivation

Say you have a structure (call it map) with a set of associated functions (map_*()). Since the map needs to map anything to anything, we would normally implement it taking a void *key and void *data. However, think of a map of int to int. In this case, you would need to store all the keys and data in another array and give their addresses to the map, which is not so convenient.

Now imagine if there was a similar structure (call it mapc, c for "copies") that during initialization takes sizeof(your_key_type) and sizeof(your_data_type) and given void *key and void *data on insert, it would use memcpy to copy the keys and data in the map instead of just keeping the pointers. An example of usage:

int i;
mapc m;
mapc_init(&m, sizeof(int), sizeof(int));
for (i = 0; i < n; ++i)
{
    int j = rand();  /* whatever */
    mapc_insert(&m, &i, &j);
}

which is quite nice, because I don't need to keep another array of i and j values.
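To make the copying behavior concrete, here is a minimal sketch of what such a mapc could look like. It is not the library's implementation: it uses an unsorted linked list instead of a balanced tree, and the node layout, `mapc_find`, and `mapc_free` are hypothetical names introduced only for this illustration. Only `mapc_init` and `mapc_insert` come from the usage above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: one allocation per node holding the header,
 * then a copy of the key, then a copy of the data. */
typedef struct mapc_node {
    struct mapc_node *next;
    unsigned char bytes[];   /* key_size bytes of key, then data_size bytes of data */
} mapc_node;

typedef struct mapc {
    mapc_node *head;
    size_t key_size;
    size_t data_size;
} mapc;

void mapc_init(mapc *m, size_t key_size, size_t data_size)
{
    m->head = NULL;
    m->key_size = key_size;
    m->data_size = data_size;
}

/* Copies *key and *data into the map. Returns 0 on success, -1 if allocation fails. */
int mapc_insert(mapc *m, const void *key, const void *data)
{
    mapc_node *n = malloc(sizeof *n + m->key_size + m->data_size);
    if (n == NULL)
        return -1;
    memcpy(n->bytes, key, m->key_size);
    memcpy(n->bytes + m->key_size, data, m->data_size);
    n->next = m->head;
    m->head = n;
    return 0;
}

/* Copies the stored data for key into *data_out. Returns 1 if found, 0 otherwise.
 * Keys are compared bytewise here, which is only valid for key types without padding. */
int mapc_find(const mapc *m, const void *key, void *data_out)
{
    for (const mapc_node *n = m->head; n != NULL; n = n->next) {
        if (memcmp(n->bytes, key, m->key_size) == 0) {
            memcpy(data_out, n->bytes + m->key_size, m->data_size);
            return 1;
        }
    }
    return 0;
}

void mapc_free(mapc *m)
{
    while (m->head != NULL) {
        mapc_node *next = m->head->next;
        free(m->head);
        m->head = next;
    }
}
```

Because the key and data live inside the node, the caller's `i` and `j` can go out of scope right after the insert. Copying the data back out with `memcpy` in `mapc_find` avoids any alignment concerns about the packed storage.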

My ideas

In the example above, map and mapc are very closely related. If you think about it, map and set structures and functions are also very similar. I have thought of the following ways to implement their algorithm only once and use it for all of them. Neither of them, however, is quite satisfying to me.

  1. Use macros. Write the function code in a header file, leaving the structure dependent stuff as macros. For each structure, define the proper macros and include the file:

    map_generic.h
    
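    /* Two-level concatenation, so that NAME is expanded before pasting */
    #define CONCAT_(a, b) a##b
    #define CONCAT(a, b) CONCAT_(a, b)
    #define INSERT(x) CONCAT(x, _insert)
    
    int INSERT(NAME)(NAME *m, PARAMS)
    {
        // create node
        ASSIGN_KEY_AND_DATA(node);
        // get m->root
        // add to tree starting from root
        // rebalance from node to root
        // etc
    }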
    
    map.c
    
    #define NAME map
    #define PARAMS void *key, void *data
    #define ASSIGN_KEY_AND_DATA(node) \
    do {\
        node->key = key;\
        node->data = data;\
    } while (0)
    #include "map_generic.h"
    
    mapc.c
    
    #include <string.h>  /* for memcpy */
    
    #define NAME mapc
    #define PARAMS void *key, void *data
    #define ASSIGN_KEY_AND_DATA(node) \
    do {\
        memcpy(node->key, key, m->key_size);\
        memcpy(node->data, data, m->data_size);\
    } while (0)
    
    #include "map_generic.h"
    

    This method is not half bad, but it's not so elegant.

  2. Use function pointers. For each part that is dependent on the structure, pass a function pointer.

    map_generic.c
    
    int map_generic_insert(void *m, void *key, void *data,
        void (*assign_key_and_data)(void *, void *, void *, void *),
        void *(*get_root)(void *))
    {
        // create node
        assign_key_and_data(m, node, key, data);
        root = get_root(m);
        // add to tree starting from root
        // rebalance from node to root
        // etc
    }
    
    map.c
    
    static void assign_key_and_data(void *m, void *node, void *key, void *data)
    {
        map_node *n = node;
        (void)m;  /* unused: map keeps the pointers as they are */
        n->key = key;
        n->data = data;
    }
    
    /* returns void * so the type matches the get_root parameter above */
    static void *get_root(void *m)
    {
        return ((map *)m)->root;
    }
    
    int map_insert(map *m, void *key, void *data)
    {
        return map_generic_insert(m, key, data, assign_key_and_data, get_root);
    }
    
    mapc.c
    
    static void assign_key_and_data(void *m, void *node, void *key, void *data)
    {
        map_node *n = node;
        mapc *mc = m;
        memcpy(n->key, key, mc->key_size);
        memcpy(n->data, data, mc->data_size);
    }
    
    /* returns void * so the type matches the get_root parameter above */
    static void *get_root(void *m)
    {
        return ((mapc *)m)->root;
    }
    
    int mapc_insert(mapc *m, void *key, void *data)
    {
        return map_generic_insert(m, key, data, assign_key_and_data, get_root);
    }
    

    This method requires writing more functions that could have been avoided in the macro method (as you can see, the code here is longer), and it prevents the optimizer from inlining the callbacks, since they are not visible in map_generic.c.
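As a runnable illustration of the second method, the sketch below replaces the tree with a plain linked list and, for brevity, lets both variants share one hypothetical `container` type (the question uses separate map and mapc types). The names `container`, `node`, `assign_fn`, `generic_insert`, `assign_pointers`, and `assign_copies` are assumptions made for this example. The shared function is marked `static inline`, as it might be if it lived in a header; in that arrangement the callbacks are visible at each call site, so a compiler may be able to inline them, though that is not guaranteed.

```c
#include <stdlib.h>
#include <string.h>

/* Shared node and container layout, assumed for this sketch only. */
typedef struct node {
    struct node *next;
    void *key;
    void *data;
} node;

typedef struct {
    node *head;
    size_t key_size;    /* only used by the copying variant */
    size_t data_size;   /* only used by the copying variant */
} container;

typedef int (*assign_fn)(const container *c, node *n, void *key, void *data);

/* The algorithm written once; the structure-dependent step is a callback. */
static inline int generic_insert(container *c, void *key, void *data,
                                 assign_fn assign)
{
    node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return -1;
    if (assign(c, n, key, data) != 0) {
        free(n);
        return -1;
    }
    n->next = c->head;   /* stands in for "add to tree and rebalance" */
    c->head = n;
    return 0;
}

/* map: keep the caller's pointers. */
static int assign_pointers(const container *c, node *n, void *key, void *data)
{
    (void)c;
    n->key = key;
    n->data = data;
    return 0;
}

/* mapc: keep private copies of the pointed-to bytes. */
static int assign_copies(const container *c, node *n, void *key, void *data)
{
    n->key = malloc(c->key_size);
    n->data = malloc(c->data_size);
    if (n->key == NULL || n->data == NULL) {
        free(n->key);
        free(n->data);
        return -1;
    }
    memcpy(n->key, key, c->key_size);
    memcpy(n->data, data, c->data_size);
    return 0;
}

int map_insert(container *m, void *key, void *data)
{
    return generic_insert(m, key, data, assign_pointers);
}

int mapc_insert(container *m, void *key, void *data)
{
    return generic_insert(m, key, data, assign_copies);
}
```

After an insert, changing the caller's variable is visible through the pointer-keeping variant but not through the copying one, which is exactly the difference between the two interfaces.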

So, how would you go about implementing something like this?

Note: I wrote the code in the stack-overflow question form, so excuse me if there are minor errors.

Side question: Does anyone have a better idea for a suffix that says "this structure copies the data instead of the pointer"? I use c for "copies", but there could be a much better word for it in English that I don't know about.


Update:

I have come up with a third solution. In this solution, only one version of the map is written, the one that keeps a copy of data (mapc). This version would use memcpy to copy data. The other map is an interface to this, taking void *key and void *data pointers and sending &key and &data to mapc so that the address they contain would be copied (using memcpy).

This solution has the downside that a normal pointer assignment is done by memcpy, but it completely solves the issue otherwise and is very clean.
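A rough sketch of this third solution follows. The copying container is reduced to a linked list, and `mapc_newest`, `map_newest`, `map_init`, and the `impl` member are hypothetical names used only to show the round trip; a real map would offer lookup by key instead.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal copying container standing in for mapc (layout is hypothetical). */
typedef struct mapc_node {
    struct mapc_node *next;
    unsigned char bytes[];   /* copied key followed by copied data */
} mapc_node;

typedef struct {
    mapc_node *head;
    size_t key_size;
    size_t data_size;
} mapc;

void mapc_init(mapc *m, size_t key_size, size_t data_size)
{
    m->head = NULL;
    m->key_size = key_size;
    m->data_size = data_size;
}

int mapc_insert(mapc *m, const void *key, const void *data)
{
    mapc_node *n = malloc(sizeof *n + m->key_size + m->data_size);
    if (n == NULL)
        return -1;
    memcpy(n->bytes, key, m->key_size);
    memcpy(n->bytes + m->key_size, data, m->data_size);
    n->next = m->head;
    m->head = n;
    return 0;
}

/* Copies out the most recently inserted pair. Returns 0 if the map is empty. */
int mapc_newest(const mapc *m, void *key_out, void *data_out)
{
    if (m->head == NULL)
        return 0;
    memcpy(key_out, m->head->bytes, m->key_size);
    memcpy(data_out, m->head->bytes + m->key_size, m->data_size);
    return 1;
}

/* The pointer-keeping map is only an interface over mapc:
 * the "value" being copied is the pointer itself. */
typedef struct {
    mapc impl;
} map;

void map_init(map *m)
{
    mapc_init(&m->impl, sizeof(void *), sizeof(void *));
}

int map_insert(map *m, void *key, void *data)
{
    /* &key and &data: mapc copies the addresses, not the pointed-to bytes */
    return mapc_insert(&m->impl, &key, &data);
}

int map_newest(const map *m, void **key_out, void **data_out)
{
    return mapc_newest(&m->impl, key_out, data_out);
}
```

One thing the sketch leaves out: if the map orders keys with a user-supplied comparison function, the wrapper would also need an adapter that dereferences the stored pointer before calling it, since mapc would hand the comparison a pointer to the stored pointer.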

Alternatively, one can implement only the map and pair mapc with an extra vectorc: mapc first copies the data into the vector and then gives the address of that copy to a map. This has the side effect that deletion from mapc would either be substantially slower or leave garbage behind (or require other structures to reuse the garbage).


Update 2:

I came to the conclusion that careless users might use my library the way they write C++, copy after copy after copy. Therefore, I am abandoning this idea and accepting only pointers.

Marcin
Shahbaz

3 Answers

You roughly covered both possible solutions.

The preprocessor macros roughly correspond to C++ templates and have the same advantages and disadvantages:

  • They are hard to read.
  • Complex macros are often hard to use (consider type safety of parameters etc.)
  • They are just "generators" of more code, so in the compiled output a lot of duplication is still there.
  • On the other hand, they allow the compiler to optimize a lot of stuff.

The function pointers roughly correspond to C++ polymorphism, and they are IMHO a cleaner and generally easier-to-use solution, but they bring some cost at runtime (in tight loops, a few extra function calls can be expensive).

I generally prefer the function pointers, unless the performance is really critical.
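To illustrate the correspondence with polymorphism, here is a small sketch in which the function pointers are grouped into one constant table per variant, and each object carries a pointer to its table. All names (`store`, `store_ops`, `store_roundtrip`, and so on) are made up for this example, and the container is reduced to a single `int` slot to keep it short.

```c
#include <stddef.h>
#include <string.h>

typedef struct store store;

/* Table of operations; one constant instance per variant. */
typedef struct store_ops {
    void (*put)(store *s, const int *value);
    int  (*get)(const store *s);
} store_ops;

struct store {
    const store_ops *ops;   /* selects the behavior at runtime */
    const int *ptr;         /* used by the pointer-keeping variant */
    int copy;               /* used by the copying variant */
};

static void put_pointer(store *s, const int *value) { s->ptr = value; }
static int  get_pointer(const store *s)             { return *s->ptr; }

static void put_copy(store *s, const int *value)
{
    memcpy(&s->copy, value, sizeof s->copy);
}
static int get_copy(const store *s) { return s->copy; }

static const store_ops pointer_ops = { put_pointer, get_pointer };
static const store_ops copy_ops    = { put_copy, get_copy };

void store_init_pointer(store *s)
{
    s->ops = &pointer_ops;
    s->ptr = NULL;
    s->copy = 0;
}

void store_init_copy(store *s)
{
    s->ops = &copy_ops;
    s->ptr = NULL;
    s->copy = 0;
}

/* Generic code sees only the table, never the concrete variant. */
int store_roundtrip(store *s, const int *value)
{
    s->ops->put(s, value);
    return s->ops->get(s);
}
```

Every call through `s->ops` is an indirect call, which is the runtime cost mentioned above; in exchange, `store_roundtrip` is compiled once and works for both variants.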

mity
  • I was exactly thinking of templates when I was writing the first method, but I hadn't noticed that the second method is polymorphism! – Shahbaz Jun 14 '12 at 14:31
  • Polymorphism in C++ is based on virtual methods. Every class that has virtual methods (directly or indirectly by inheritance) has a vtable. And the vtable actually is (in C terms) just a structure with nothing but function pointers as its members. – mity Jun 14 '12 at 17:23
  • yeah I'm familiar with the internals of C++ ;) – Shahbaz Jun 14 '12 at 22:54

What you're looking for is polymorphism. C++, C# or other object-oriented languages are more suitable for this task, though many people have tried to implement polymorphic behavior in C.

The Code Project has some good articles/tutorials on the subject:

http://www.codeproject.com/Articles/10900/Polymorphism-in-C

http://www.codeproject.com/Articles/108830/Inheritance-and-Polymorphism-in-C

embedded.kyle
  • Polymorphism is similar to my second method as `mity` suggested, but doing it the way they did in those links is too C++ish. I think the problem can be handled more C-like than that. – Shahbaz Jun 14 '12 at 14:38

There's also a third option that you haven't considered: you can create an external script (written in another language) to generate your code from a series of templates. This is similar to the macro method, but you can use a language like Perl or Python to generate the code. Since these languages are more powerful than the C preprocessor, you can avoid some of the potential problems inherent in doing templates via macros. I have used this method in cases where I was tempted to use complex macros, as in your example #1. In the end, it turned out to be less error-prone than using the C preprocessor. The downside is that between writing the generator script and updating the makefiles, it's a little more difficult to get set up initially (but IMO worth it in the end).

bta
  • This is quite interesting and indeed hadn't crossed my mind. Doesn't it get ugly with making the generated code have proper indentation, or taking care of `'\'`s etc? – Shahbaz Jun 14 '12 at 15:25
  • @Shahbaz- Making generated code match the rest of your code style-wise is usually not that important. Since the code is generated as part of the build process, it's not necessarily meant for human consumption (only the compiler). If you want to make it readable, there are tools like `astyle` or `uncrustify` that will automatically re-format a piece of code to match a desired style. As far as backslashes go, you can easily script the process of finding them in the input code and concatenating lines, etc. as appropriate. – bta Jun 14 '12 at 15:37
  • @Shahbaz- For debugging purposes yes, you'd want to use something like `astyle` to make your code more readable (I find it even helps on *human*-generated code). My point was that you would generally only do this on an as-needed basis. You wouldn't commit the generated code to your repository, mix generated code with non-generated code in the same file, or anything like that. – bta Jun 14 '12 at 16:20
  • Right. I will think about it. Probably in my case the macro solution is not too complicated, but your solution definitely scales better. In fact, the script could read the template from a file and generate the source files, instead of generating all of it itself, which makes it easier to write the template. – Shahbaz Jun 14 '12 at 16:25
  • I accepted your answer because it actually gave a new idea, even though I'm not going to use it. It will definitely be something I'd consider in the future if I encounter such a problem again – Shahbaz Jun 18 '12 at 08:37