3

I have a struct as follows:

typedef struct Node {
    void* data;
    unsigned long id;
    NodePtr next;
    NodePtr prev;
} Node;

It is meant to be a node in a linked list ADT. I have 2 different constructors depending on what the Node needs to hold in data. One constructor uses:

NodePtr TempNode;
TempNode = malloc( sizeof(Node) );
/* Other stuff */
TempNode->data = newList();
return (TempNode);

And this seems to work just fine for letting me access that list by returning (List->current->data) where current is a Node pointer in the List Struct

However, I want to make a version of the constructor where (data) points to an int. I've read that I can do this by doing the following

void* ptr;
int x = 0;
*((int*)ptr) = x;

But with the way my constructor is set up, that would mean I have to do something like this?

*((int*)TempNode->data) = 1; // data starts at 1

And this doesn't work. I'm very new to C so I don't understand much of the terminology. I read that dereferencing (using the -> notation?) cannot be done with void*'s, but it seems to work fine in my list version of the constructor. How can I rewrite my other constructor to cast this void* to an int?

Thomas Dickey
  • 51,086
  • 7
  • 70
  • 105
Ethan Lie
  • 41
  • 1
  • 2
  • 1
    The pointer *needs* to point at some specific storage that you are allowed to use. It can be automatic, static, threat-local, or dynamic, but you *may not* just use whatever bit of memory it happens to point at! – dmckee --- ex-moderator kitten May 19 '13 at 21:53
  • But doesn't that mean my calls to List->current->data shouldn't return a List for the case where data was set to = newList()? It seems like they do. – Ethan Lie May 19 '13 at 21:55
  • No, because in that case you *set* `data`. In the other case you are just accepting whatever value happened to be in the memory that was handed you for your `Node` (which could be literally *any* value expressible in `sizeof(void*)` bytes). – dmckee --- ex-moderator kitten May 19 '13 at 22:01

3 Answers3

2

I strongly counsel against doing this, but if you really want to use the void * member to hold an integer, you can do:

Node *constructor_int(int n)
{
    Node *tmp = malloc(sizeof(*tmp));
    /* Other stuff */
    tmp->data = (void *)n;
    return(tmp);
}

This involves the minimum number of casts and avoids most problems with relative sizes of types.

The obvious, logical way to do it is to allocate an integer for the data member to point at:

Node *constructor_int(int n)
{
    Node *tmp = malloc(sizeof(*tmp));
    /* Other stuff */
    tmp->data = malloc(sizeof(int));
    *(int *)temp->data = n;
    return(tmp);
}

You just have to remember to free the two memory allocations.

The code should also check that the memory allocations succeeded before using the results.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • +1 I expected to see `intptr_t` come into this answer somewhere, but the more I thought about it the best "do it right" answer is the closing double alloc. One may be tempted to marry the two into a single alloc (sizeof(int) + sizeof(*tmp)) and set the relative offsets, but there are platforms where that will cause alignment errors (not many, but some). Nice answer. – WhozCraig May 19 '13 at 22:04
  • Thank you. Does this mean I can treat NodePtr->data as an int now? and in other areas of code call List->current->data++? – Ethan Lie May 19 '13 at 22:17
  • You have to treat it as a `void *` that can legitimately be cast to `int *` and dereferenced. Hence `(*(int *)ModePtr->data)++` and `(*(int *)List->current->data)++` would be be OK. I'd probably work to avoid needing to do this very often — I'd look at the design of the whole system and see whether there's a better way to do things — possibly even moving away from a single ADT to use one ADT for the integer list and another ADT for the one used with `newList()` (where the abstractness gets called into question). If you really want template types, maybe you should consider C++. – Jonathan Leffler May 19 '13 at 22:23
  • Thank you very much, I think I understand now. I agree with you about the ADT split, that's probably what I would do if I had more time. – Ethan Lie May 19 '13 at 22:32
1

Let's talk about this

When you do something like

NodePtr TempNode;
TempNode = malloc( sizeof(Node) );

you have asked the library to reseve you some dynamic storage which is big enough for a Node. The initial values of that memory are undefined, so right now the pointer TempNode->data could point anywhere and probably does not point at memory reserved for your use.

When you do

TempNode->data = newList();

you give the pointer a (presumably, as long as newList() does something legal and sensible) valid value;

If instead you do

*((int*)TempNode->data) = 1;

you instruct the compiler to

  1. treat TempNode->Data as a pointer-to-int,
  2. de-reference it and
  3. set the value of what every memory is at the other end to 1 (notice that at no point have you set data itself, just whatever it points at)

But you don't know what it points to! De-referencing it is undefined behavior and strictly a bad thing.

You are always responsible for ensuring that you do not de-reference a pointer unless it points to memory that you are entitled to use.

dmckee --- ex-moderator kitten
  • 98,632
  • 24
  • 142
  • 234
  • Oh, I think I understand what you're saying, but I'm still not sure what the best course of action is. How can I make data point to an area I can use? – Ethan Lie May 19 '13 at 22:11
  • Local variable, global variable, dynamic allocations (the return values from the `alloc` family of function) are all bits you can use (at least until the local variables go out of scope or you `free()` dynamic allocations). – dmckee --- ex-moderator kitten May 19 '13 at 22:24
1

"How can I make data point to an area I can use?"

I'm not sure if you mean what I'm going to explain below (and it won't be short :P), but if you are asking how you can distinguish the type of the data stored in Node->data, then with that implementation you cannot.

You leave it up to the end-programmer to remember what type of data he has stored into the list (which is not a bad thing btw.. on the contrary, it is the norm). In other words, you trust the end-programmer that he/she will apply the proper casts when say printing the Node->data.

If for some reason you wish to provide a more managed API for your list, you could add one more field in a List struct, for identifying the type of the data stored in your list.

For example...

enum DataType {
    DT_INVALID = 0,
    DT_PTR
    DT_CHAR,
    DT_INT,
    DT_FLOAT,
    DT_DOUBLE,
    ...
    /* not a data-type, just their total count */
    DT_MAX
};
#define DT_IS_VALID(dt)    ( (dt) > DT_INVALID && (dt) < DT_MAX )

typedef struct List List;
struct List {
    enum DataType dt;
    Node  *head;
};

Of course you are free to support less or more data-types than the ones I listed in the enum, above (even custom ones, say for strings, or whatever you see fit according to your project).

So first you'll need a constructor (or initializer) for a list, something like this...

List *new_list( enum DataType dt )
{
    List *ret = NULL;

    if ( !DT_IS_VALID(dt) )
        return NULL;

    ret = malloc( sizeof(List) );
    if ( NULL == ret )
        return NULL;

    ret->dt = dt; 
    ret->head = NULL;

    return ret;
}

and instantiate it, say like this...

int main( void )
{
    List *listInt = new_list( DT_INT );
    if ( NULL == list ) {
        /* handle failure here */
    }

Now that you have already stored the intended data-type into the list meta-data, you are free to choose how you will implement the Node constructors. For example, a generic one could look something like this...

int list_add_node( List *list, const void *data )
{
    Node *node = NULL;
    size_t datasz = 0;

    /* sanity checks */
    if ( !list || !data || !DT_IS_VALID(list->dt) )
        return 0;  /* false */

    node = malloc( sizeof(Node) );
    if ( NULL == node )
        return 0; /* false */

    /* when data points to mem already reserved say for an array (TRICKY) */
    if ( DT_PTR == list->dt ) {  
        node->data = data;
    }
    /* when data points to mem reserved for a primitive data-type */
    else {
        datasz = dt_size( list->dt );  /* implement dt_size() according to your supported data-types */
        node->data = malloc( datasz );
        if ( NULL == node->data ) {
            free( node );
            return 0; /* false */
        }
        memcpy(node->data, data, datasz );
    }

    /* add here the code dealing with adding node into list->head */
    ...
    return 1; /* true */
}

For DT_PTR (which I flagged as TRICKY in the example) it is more safe to implement a different Node constructor, perhaps accepting 2 extra arguments, lets say elemsz and nelems, so the function can allocate elemsz * nelems bytes for copying the data contents into them, in case data points to an array, a struct or any other non-primitive type. Or you can provide an extra DT_ARR enum value specifically for arrays. You are free to do whatever suits you best.

In any case, for DT_PTR the example above relies on the caller of list_add_node to have properly allocated the passed data, and in general context this is not a good thing at all.

The code is more complicated, but you know the data-type stored in your list. So for at least the primitive data-types you can add say a printing routine that automatically casts its output according to list->dt (for non-primitive data types you should provide support for custom printing routines, usually via callback functions).

You can even take it to the extreme, and move the dt field from List to Node. In that case you implement a list of heterogeneous data in the nodes, but it gets much more complicated and also its rarely useful (if ever).

All this ADT stuff via (void *) pointers have serious performance issues, that's why speed critical implementations utilize (or abuse if you prefer) the pre-processor instead, for this kind of stuff.

Harry K.
  • 560
  • 3
  • 7
  • Wow, thank you for the indepth explanation. I know that my way of doing it isn't optimal, and leaves room for error. Luckily it seems that Jonathan Leffler's explanation seems to work for my purposes. When I get a chance I'll sit down and really take a look at your post so I can really understand whats going on. Thanks again! – Ethan Lie May 20 '13 at 01:16
  • You are welcome (btw I just fixed some typos, there may be more though). Since Jonathan's answer was the most helpful, consider accepting it when your rep goes a bit higher ;) – Harry K. May 20 '13 at 01:26