
I have a function which reads a binary file into memory as type void *. Information in the file header indicates the amount of memory required and the actual data type (in bytes per number, e.g. 8 if it should be interpreted as "long").

My problem is, main has no knowledge of the data type or memory required. So I call the function like this:

long myfread(char *infile, void **tempdata, size_t *datasize);

char *infile="data.bin"; // name of the input file
void *tempdata=NULL; // where the data will be stored; initially NULL
long n; // total numbers read, returned by the function 
size_t datasize; // modified appropriately by the function 

n = myfread(infile,&tempdata,&datasize);

So far so good - main can read the bytes in "tempdata" - but not as (say) integers or floats. My question is, is there a simple way to recast tempdata to make this possible?

JWDN

6 Answers

1

I think you are talking about a block of memory rather than an array.

Whether a pointer is a void *, char * or int *, it holds an address in memory (possibly virtual, usually on the heap); the only difference is how the memory it points to is interpreted.

Say you have a 16-byte block of memory: viewed as a byte array it has 16 elements, viewed as an array of 32-bit ints it has 4, and so on. When you apply an index to the pointer, the byte offset advances by the size of the data type.

The most important thing is that the memory block is large enough for your data type. That is, you must not access a location beyond the end of the block. Say you have 10 bytes of memory and your pointer is int *a with 4-byte ints: a[0] and a[1] are fine, but a[2] covers bytes 8 to 11 and runs past the end of the block, which is undefined behavior.

Can I re-cast an entire array from *void to *int?

I believe there's no such thing as a void array. As for casting between pointer types, you are free to do so in C.
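A minimal sketch of the point above, assuming the block comes from malloc (whose result is suitably aligned for any object type). The helper names count_elems and sum_block_as_ints are hypothetical and only illustrate how the element count depends on the type used to view the block.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: how many whole elements of elem_size bytes fit
   in a block of block_bytes bytes. */
size_t count_elems(size_t block_bytes, size_t elem_size)
{
    assert(elem_size > 0);
    return block_bytes / elem_size;
}

/* Hypothetical helper: allocate a block, view it as an int array,
   fill it with 0, 1, 2, ... and return the sum (-1 if malloc fails). */
long sum_block_as_ints(size_t block_bytes)
{
    void *block = malloc(block_bytes);  /* aligned for any object type */
    if (block == NULL)
        return -1;

    int *as_int = block;                /* no cast needed in C */
    size_t n = count_elems(block_bytes, sizeof *as_int);
    long sum = 0;

    for (size_t i = 0; i < n; i++)
        as_int[i] = (int)i;             /* index stays inside the block */
    for (size_t i = 0; i < n; i++)
        sum += as_int[i];

    free(block);
    return sum;
}
```

The same 16-byte block gives count_elems(16, 1) == 16 when viewed as bytes and count_elems(16, sizeof(int)) elements when viewed as ints; only the interpretation changes.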

Ken Kin
1

Ok, so myfread looks something like this:

long myfread(char *infile, void **data, size_t *datasize)
{
   FILE *f = fopen(infile, "rb");   // Or some such.  
   ... 

   *datasize = ... // some calculation of some sort, e.g. seek to end of file?

   *data = malloc(*datasize ... );   // Maybe more calculation? 

   res = fread(*data, 1, *datasize, f);   // fread takes (ptr, size, count, stream)

   fclose(f);

   return res;
}

And then later, you want to treat the updated *data as an int *?

int *my_int_array; 

n = myfread(infile,&tempdata,&datasize);

my_int_array = tempdata;   // If a C++ compiler, you need a cast to (int *)

for(size_t i = 0; i < datasize / sizeof *my_int_array; i++)   // datasize is in bytes here
{
   printf("%d\n", my_int_array[i]); 
}

Of course, if myfread doesn't do what I think it does, all bets are off.

Mats Petersson
  • Thank you - I'm sensing a convergence of advice from all sides :) datasize is the number of bytes per number incidentally, not the number of numbers, but your main point still applies. – JWDN Jun 07 '13 at 23:36
  • Well, I was GUESSING what your function does... ;) – Mats Petersson Jun 07 '13 at 23:42
  • good guess! the function is a helluva lot more complex of course but I try not to provide more details than necessary. In any case your answer should share credit for being accepted. – JWDN Jun 07 '13 at 23:51
1

Based on your edited question, I can make a guess as to what myfread looks like. Simplified tremendously, it does something like this:

long myfread(const char *path, void **pmem, size_t *datasize) {
    long magically_found = 42;
    int *mem;
    int i;

    mem = malloc(magically_found * sizeof(int)); /* and we assume it works */
    *datasize = 12345;
    for (i = 0; i < magically_found; i++)
        mem[i] = i;
    *pmem = mem;
    return magically_found;
}

Now, in your main, you have to somehow know that if datasize == 12345 upon return, the allocated memory has been filled with ints. Knowing this, you then simply write:

    int *ip;
    ... /* your code from above, more or less */
    if (datasize != 12345) {
        panic("memory was not filled with ints");
        /* NOTREACHED */
    }
    ip = tempdata;

From here on you can access ip[i], for any valid i (at least 0 and less than n).

The tougher question is, how do you know that 12345 means int and what the heck do you do if it's not 12345? And, probably 12345 does not mean int anyway. Maybe 4 means int or float which both happen to have a sizeof of 4, in which case, having datasize == 4 does not tell you which one it is after all! So, then what?

All in all, it sounds like the question is underspecified, at least.
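One possible way around the ambiguity described above is to have the reader report a type tag rather than a bare size. The following is only a sketch under that assumption: the enum elem_type and the helpers elem_size and elem_as_double are hypothetical names, and the code assumes the block is correctly aligned for, and really contains, objects of the tagged type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tag that a reader such as myfread could report
   instead of (or alongside) a bare element size. */
enum elem_type { ELEM_INT, ELEM_FLOAT, ELEM_DOUBLE };

/* Size in bytes implied by the tag; 0 for an unknown tag. */
size_t elem_size(enum elem_type t)
{
    switch (t) {
    case ELEM_INT:    return sizeof(int);
    case ELEM_FLOAT:  return sizeof(float);
    case ELEM_DOUBLE: return sizeof(double);
    }
    return 0;
}

/* Read element i of the block as a double, whatever its stored type.
   The caller must keep i within the number of elements read. */
double elem_as_double(const void *data, enum elem_type t, size_t i)
{
    assert(data != NULL);
    switch (t) {
    case ELEM_INT:    return ((const int *)data)[i];
    case ELEM_FLOAT:  return ((const float *)data)[i];
    case ELEM_DOUBLE: return ((const double *)data)[i];
    }
    return 0.0;
}
```

With a tag, int and float no longer collide even though both are commonly 4 bytes, and a single accessor can serve every supported type.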

torek
  • The user is presumed to know whether a 4-byte number is meant to be interpreted as an int or a float - for the purposes of the question we can assume it's not a problem. Let me have a go at your suggestion and I'll report back.... – JWDN Jun 07 '13 at 23:33
  • As long as this is just an experiment, or the types are restricted, encoding types by their `sizeof` is ... not great, but workable. But in general you'd want something fancier, like an `enum` listing the possible types (with size implied by type), or a debugger-style type encoding ("stabs", dwarf2, etc). (This sort of thing is much easier in dynamically-typed languages; it would be trivial in Python for instance.) – torek Jun 07 '13 at 23:39
  • I now have a working function :) - or more importantly, a SINGLE working function, rather than having to maintain 12 separate functions for each possible data type I might encounter. Thanks! – JWDN Jun 08 '13 at 00:46
0

I'm having a hard time understanding what you want, and I think you might be too. It seems like you have a function similar to read or fread that takes an argument of type void * for where to store the data it reads. This does not mean you make a variable of type void * to pass to it. Instead, you pass the address of the object you want the data stored into.

In your case, simply make an array of int of the appropriate size and pass the address of that array (or the address of its first element) to the function that does the reading. For example (assuming fread):

int my_array[100];
fread(my_array, sizeof my_array, 1, f);

If you don't know the size in advance, or if it needs to live past the return of the calling function, you can allocate space for the array with malloc.
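A sketch of the malloc variant, assuming the number of ints is only known at run time (for example after parsing a header). The function name read_ints is hypothetical, and the stream is assumed to be already open in binary mode and positioned at the data.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read `count` ints from an open binary stream into
   a freshly allocated array. Returns NULL if allocation fails or fewer
   than `count` ints could be read. The caller must free() the result. */
int *read_ints(FILE *f, size_t count)
{
    assert(f != NULL);

    int *arr = malloc(count * sizeof *arr);
    if (arr == NULL)
        return NULL;

    if (fread(arr, sizeof *arr, count, f) != count) {
        free(arr);              /* short read: do not leak the buffer */
        return NULL;
    }
    return arr;
}
```

Because the array is allocated on the heap, it stays valid after the calling function returns, until it is passed to free().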

R.. GitHub STOP HELPING ICE
  • I think I need to clarify the problem, based on the questions posted :) - the function which reads the file determines (and allocates) the memory required and the data type by reading information in the file header. The calling function has no access to this information, so I am passing an address to a void array to it. – JWDN Jun 07 '13 at 22:34
  • There's no such thing as an array-of-void in C, only pointer-to-void, which is used for pointing to objects of unknown or arbitrary type. – R.. GitHub STOP HELPING ICE Jun 07 '13 at 23:08
  • That's still an array, to me :) – JWDN Jun 07 '13 at 23:48
  • A pointer is never an array. It can point to an array, or you can have an array of pointers, but a pointer is not an array. Until you stop insisting on being wrong and clarify what you're doing, it's really impossible to help you further... – R.. GitHub STOP HELPING ICE Jun 07 '13 at 23:51
0
for(i = 0; i < index_max; i++) {
    printf("%d\n", ((int*)tempdata)[i]);
}
koby m.
0

Yes, you can cast a pointer to another type, but it's hard to avoid undefined behavior if you do so. For example, you have to make sure the binary data you're casting is aligned correctly, and that the memory representation in the code that wrote the data is the same as the memory representation of the code that's reading it. This isn't just an academic problem, as you're likely to find endian differences across architectures, and that, for example, doubles have to be carefully aligned on ARM machines.

You can solve the alignment problems by writing functions that access the memory as if it was a typed array, using memcpy. For example,

int get_int(const char *array, int idx) {
    int result;
    memcpy(&result, array + idx * sizeof(int), sizeof(int));
    return result;
}

To avoid writing this out N times, you can macroize it.

#define MAKE_GET(T) T get_##T (const char *array, int idx) { \
    T result; \
    memcpy(&result, array + idx * sizeof(T), sizeof(T)); \
    return result; \
}

MAKE_GET(int)
MAKE_GET(float)
MAKE_GET(double)

To solve the endian problem, or more generally the problem that memory representations can differ across machines, you need to have a well-defined format for your binary file (for example, always writing ints little-endian). One good approach is to use text (compressed with zlib or similar if you need it small). Another is to use a serialisation library (for example, Google's protocol buffers). Or you can roll your own; it's not too hard.
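As an illustration of rolling your own, here is a minimal sketch that assumes the file format stores 32-bit unsigned integers in little-endian byte order. The function name get_u32_le is hypothetical. It assembles each value from individual bytes, so the result does not depend on the byte order or alignment rules of the machine reading the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the idx-th 32-bit little-endian unsigned
   integer from a raw byte buffer. The caller must make sure the buffer
   holds at least (idx + 1) * 4 bytes. */
uint32_t get_u32_le(const unsigned char *buf, size_t idx)
{
    assert(buf != NULL);
    const unsigned char *p = buf + idx * 4;

    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The writer would use the mirror-image routine, emitting the low byte first, so that files written on one architecture read back identically on another.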

Paul Hankin