0

I am currently learning C, as someone, who has quite a bit of experience in more high-level languages than that.

After considering some of your comments and doing a bit more testing and fidelling, I came up with a better solution (scroll down).

Advice and comments are still very welcome though.

That said, I struggle a bit with dynamic array handling in C. I got it down to a point, where I quite like, simply terminating every array with NULL and using realloc where neccessary.

Basically I am in search of advice: Is there any downside, of doing it that way (see example code below), besides the obvious (can't use NULL values as data value) ?

#include "stdlib.h"

int array_length(void* array)
{
    int i;
    for (i = 0; ((void**)array)[i]; i++);
    return i;
}

void array_add(void* array, void* item)
{
    int size = array_length(array);
    realloc(array, sizeof(item) * (size + 2));

    ((void**)array)[size] = item;
    ((void**)array)[size + 1] = NULL;
}

void* array_new(){
    void** array = malloc(sizeof(NULL));
    array[0] = NULL;
    return array;
}

#include "stdio.h"

void print_items(char** array){
    printf("%i - [", array_length(array));
    for (int i = 0; array[i]; i++){
        printf("\"%s\"", array[i]);
        if (array[i+1])
            printf(", ");
    }
    printf("]\n");
}

int main(){
    char** str_list = array_new();

    print_items(str_list);
    array_add(str_list, "Something!");
    print_items(str_list);
    array_add(str_list, "Another Thing.");
    print_items(str_list);
}

Better Version (Update!)

After reading through your answers and debugging a bit more myself, I came up with a new version, which is mainly based on the struct approach mentioned below.

Have a Look, and tell me, what I am doing wrong now. :)

#include "stdlib.h"
#include "assert.h"

typedef void* Any;

typedef struct {
    Any* items;
    int count;
} List;

List* list_new(){
    List* list = malloc(sizeof(List));
    list->items = NULL;
    list->count = 0;
    return list;
}

void list_add(List* list, Any element){
    int count = list->count + 1;
    list->items = realloc(list->items, count * sizeof(element));
    list->items[count-1] = element;
    list->count = count;
}

void list_free(List* list){
    if (list){
        if (list->items) free(list->items);
        free(list);
    }
}

void list_each(List* list, void (*callback)(List*, Any, int)){
    for (int i = 0; i < list->count; i++){
        callback(list, list->items[i], i);
    }
}

// Sample usage

#include "stdio.h"

void debug_item(List* list, Any item, int index){
    if(index > 0) printf(", ");
    printf("\"%s\"", item);
}

void debug(List* list){
    printf("%i : [", list->count);
    list_each(list, debug_item);
    printf("]\n");
}

int main(){
    List* cats = list_new();
    debug(cats);
    list_add(cats, "Baltazar");
    list_add(cats, "Scar");
    list_add(cats, "Garfield");
    debug(cats);
    list_free(cats);
}
Hr. Rabe
  • 47
  • 5
  • 2
    `malloc(sizeof(NULL))` will fail miserably if `sizeof(NULL) < sizeof(void *)`. A safer idiom is `T *p = malloc(sizeof *p)` for some type T. Anyway, this 0-terminated array thing is very ugly, error-prone, and linear-time length querying can become a performance problem for large arrays. The usual practice when implementing a dynamic array is to maintain a base pointer and an explicit size. – The Paramagnetic Croissant Dec 31 '14 at 00:24
  • Hey, thanks for the comment. 1. Why would ```malloc(sizeof(NULL))```fail? Basically I do only need the space to store a integer 0 at the end of the stream, right? No matter the actual datatype. 2. I buy the performance argument, that makes sence, thanks. 3. Can you elaborate a bit more on the *error phrone* bit? What are the errors I could expect? Why is it so ugly? – Hr. Rabe Dec 31 '14 at 00:26
  • (Not the original replier) I think in most implementations of vector, the capacity is the power of 2 >= the size. You may want to realloc in a similar fashion. I can see the error-prone issue, if you do for some reason need to store a NULL. In a large project, this would probably happen due to someone's carelessness. The performance, too, if it matters in your application. Seems like you could also use a stack, and keep null at the bottom, then just find it when you want to read from the bottom, if you are not wanting to store another pointer/var and will not be seeking the bottom often. – dwn Dec 31 '14 at 00:45
  • This will be deadly slow. Common vector implementations in C do a couple of things: first, keep the current allocation size, current occupied size, a pointer to the memory block, and other data in a struct. Also, they initially allocate some small amount of space like for 8 or ten items, then realloc more when needed rather than every time. A common strategy is to grow 1.5X when needed, that way performance on inserts scales well. – Lee Daniel Crocker Dec 31 '14 at 01:24
  • @Hr.Rabe "Basically I do only need the space to store a integer 0 at the end of the stream, right? No matter the actual datatype." - no, that's wrong. In C, arrays are homogenous. If you are 0-terminating an array of `T`s, then you need to terminate it with `(T)0`. "Can you elaborate a bit more on the error phrone bit? What are the errors I could expect?" - for one, you may forget the 0-terminator. (People forget to NUL-terminate their strings all over the place, even experts.) "Why is it so ugly?" - because it breaks symmetry, and you lose part of the domain… – The Paramagnetic Croissant Dec 31 '14 at 02:21
  • @Hr.Rabe … as you can no longer store 0 in the array, even though it's almost always desirable to be able to do so. What if you want an array of integers? Do you want `{ 1, 0, 2 }` to be invalid? Surely not. – The Paramagnetic Croissant Dec 31 '14 at 02:22

2 Answers2

2

Your array_length is O(N). This will lead to quadratic runtimes if you add lots of items in an array, which can be quite slow.

What you can so instead is use a struct to store the size of the array

struct dyn_array {
    size_t array_size; // how many elements fit in the array?
    size_t nelems;     // how much array space are we using now? 
    int *elems; //or void*, you know the drill...
}

One of the benefits of this approach is that you don't need to use the NULL terminator anymore. It also lets you use arrays of ints (or whatever else) instead of just arrays of pointers.

hugomg
  • 68,213
  • 24
  • 160
  • 246
  • This approach could be expanded to keep track of the *allocated size* versus the currently *in use size*, so that it's not necessary to call `realloc` every single time an item is added to the array. – user3386109 Dec 31 '14 at 01:04
  • How would the start value of ```dyn_array.elems``` look like then? As far as I understood, I can only use ```realloc``` on pointers that do actually occupy stack space (read: Have been created via ```malloc``` at some point) anyway? Otherwise at least Clang gives me a nice error of: *pointer being realloc'd was not allocated*... So: How would you initialize such a structed array? – Hr. Rabe Dec 31 '14 at 01:21
  • The usual strategy is to keep allocated size and filled size, then whenever you add an item when full, realloc to 1.5X allocated size. This keeps insert performance in the N*log(N) range as you scale. – Lee Daniel Crocker Dec 31 '14 at 01:26
  • @Hr.Rabe: Use malloc in `array_new` and realloc in `array_add`, just like you re doing now. @Lee: That strategy is actually O(N) amortized, so its better than you say. You ca use different multipliers if you want though (time/space tradeoff) – hugomg Dec 31 '14 at 01:31
  • If I get you right, that implies, that there is basically no meaningful way in C, to start an empty array of with actually no elements in memory. I would (If I understood correctly) always need at least a NULL element, only with the ```struct``` approach I would be able to fill that on the first call to ``àrray_add``. Correct? – Hr. Rabe Dec 31 '14 at 01:42
  • Instead of mallocing a size-0 array you can initialize elems to NULL (the null pointer, not an array contining a null pointer). `realloc` works just fine if you pass NULL to it. – hugomg Dec 31 '14 at 01:52
0

This is what the Croissant probably meant:

#define NULL 0 // in some include file.

This is unusual, but possible. Now assume 0 is a 4-byte int and void**array is an 8-byte pointer:

void* array_new(){
    void** array = malloc(sizeof(NULL)); // this works
    array[0] = NULL;                     // this write 8 bytes
    return array;
}

array[0]=NULLinterprets 0 as an 8-byte pointer and writes it into the allocated 4 bytes and beyond.

Hans Klünder
  • 2,176
  • 12
  • 8
  • So what you are saying is, that ```sizeof(NULL)``` can actually give a size different to what ```NULL``` has been defined to mean? Can this really happen? oO – Hr. Rabe Dec 31 '14 at 00:53
  • Maybe NULL has no meaning. Stroustrup strongly advised not to use anything that looks like "NULL" and is defined in an include file. He says if you need a 0, write 0, and if you need a 0 pointer, write 0. If he would be forced to define NULL, it would be `define NULL 0` which, for me, is quite reasonable. And 0 can always be interpreted as a pointer, which may change the size. Thus in the instance you assign NULL to a variable of pointer type, it changes it's type and size. – Hans Klünder Dec 31 '14 at 00:58
  • @Hans_Klünder: it's not really appropriate to apply Stroustrup's reasoning here because we're talking about C, not C++ and NULL is subtly different between the two languages. – Rufflewind Dec 31 '14 at 01:12
  • @HansKlünder That's a bit misleading. Stroustrup advised against `NULL` in favour of 0 in C++, and C++ 11 now has `nullptr`. Besides Kernighan and Ricthie used `NULL`. – ryanpattison Dec 31 '14 at 01:12
  • 3
    Yes, K&R use NULL, and it's a pointer. stroustrup prefers 0 for C++. I think it's an unnecessary risk to make the integrity of the memory depend on sizeof(NULL) == sizeof(void*), while it's much safer to write void** array = malloc(sizeof(*array)) - this is what the Croissant suggested to do, and I agree. It's a good habit which makes sure that the memory returned by malloc is large enough to store one item of the type being used in that line. – Hans Klünder Dec 31 '14 at 01:19
  • Thanks. :) That line ```void** array = malloc(sizeof(*array))``` was exactly what I was missing. ;) – Hr. Rabe Dec 31 '14 at 01:38
  • @Hr.Rabe "[…] that `sizeof(NULL)` can actually give a size different to what `NULL` has been defined to mean? Can this really happen?" – no, that cannot happen. `sizeof(NULL)` will give the size of whatever `NULL` is defined to. What **can** happen though, is that `sizeof(NULL) != sizeof(void *)`. Probably because `NULL` is the literal 0 integer and not `(void *)`. That's all I said, nothing more, nothing less. – The Paramagnetic Croissant Dec 31 '14 at 02:31