
In C, I have a function declared like this:

void foo(A a);

Here, the type A is known. It can be any type: an int, a pointer, a struct, or any user-defined type.

Now I have some data behind a pointer, say uint8_t* data of size n. How can I convert data to a value a of type A?

I am working on this to test foo with random data of type uint8_t* and size n, supplied by a fuzzing backend.

zell
  • 1
    "How can I __convert__ data to a of type A?" is impossible to do generically. To "convert" data, you have to know the destination type, its storage format (the destination type's object representation) and the source format. You can "interpret" data as the other type, just by using a pointer or `memcpy`, but that is without any conversion and it may cause strange results, including undefined behavior. – KamilCuk Nov 22 '20 at 11:37
  • That makes sense. The type A is known actually. I updated the post. Thank you! By the way, "interpret" means cast? – zell Nov 22 '20 at 11:45
  • Then how is the representation of A encoded/serialized to `uint8_t*`? I.e. what is A and how is it stored in `uint8_t*`? Without any rules, it's impossible to "guess" what the rules are. Could you give an example, with real-life `uint8_t*` data and a sample `A` type? C does not have reflection. `"interpret" means cast?` it would mean cast+dereference, but casting may be wrong, because pointer alignment and size may differ. – KamilCuk Nov 22 '20 at 11:45
  • I do not understand your last question. Suppose A is a struct of type "double x; int y", it takes 4+2 bytes. I want to get 6 bytes from the uint8_t data and interpret them as the struct. – zell Nov 22 '20 at 11:48
  • `it takes 4+2 bytes.` no, it doesn't, there is padding. Then take 4 bytes and 2 bytes and reconstruct the values. `memcpy(&t->x, arr, 4); memcpy(&t->y, arr+4, 2);` but that still _depends on how the data were serialized to arr_ and may result in a trap representation. Why not like `t->x = arr[0]*256+arr[1]; t->y = (arr[4]<<8)+arr[5];`? – KamilCuk Nov 22 '20 at 11:49
  • Ah, no idea about "padding". Then I mean to reinterpret the `uint8_t*` as the struct, no matter how many bytes it actually takes. Does that make sense? OK, I just saw your additional comment. It seems more complex than I thought. – zell Nov 22 '20 at 11:51
  • Well, no it doesn't. If you want to construct a `double` from 4 random bytes, why not create a 32-bit integer and divide it `(double)((uint32_t)arr[0]<<24|(uint32_t)arr[1]<<16|(uint32_t)arr[2]<<8|arr[3])/UINT32_MAX`? You _may_ interpret the bytes _as double_, but that might result in an "invalid" value, break strict aliasing and result in an unaligned access, i.e. when doing `*(double*)arr`. – KamilCuk Nov 22 '20 at 11:54
  • "Double" was a simplified example. Since type A can be anything, such as robot_arm*, I guess your proposed memcpy is a generic solution. Like shown here https://www.tutorialspoint.com/c_standard_library/c_function_memcpy.htm for a 'str' case (conversion to C strings). No? – zell Nov 22 '20 at 12:06

2 Answers


Let's say you read bytes (uint8_t) from a stream and want to pass the data to your function foo.

Questions to answer first:

  • Are you sure the bytes you read are serialized data of your type A?
  • Are you sure you have read at least sizeof(A) bytes?
  • Are you sure your type A is (trivially) serializable? (e.g. what if A contains a pointer to another object)

then

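A a; memcpy(&a, data, sizeof a); // copy the bytes into a real A; a cast like (A)data would convert the pointer value itself
foo(a); // <- remember: A is just a placeholder, but data is a pointer to uint8_t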
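
A minimal sketch of those checks, for illustration only. A cast such as (A)data converts the pointer value rather than the bytes it points to, so the sketch copies the bytes into a real object instead. The struct A below is a hypothetical stand-in for the real type, and make_A_from_bytes is an invented helper name. The copy reproduces A's in-memory representation (padding included), so it is only meaningful if the bytes were written that way or if, as in fuzzing, any bit pattern is acceptable input.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real type A. */
typedef struct {
    int32_t  id;
    uint16_t flags;
} A;

/* Copies the first sizeof(A) bytes of data into *out.
 * Returns false, leaving *out untouched, when fewer than
 * sizeof(A) bytes are available or a pointer is NULL. */
static bool make_A_from_bytes(const uint8_t *data, size_t n, A *out)
{
    if (data == NULL || out == NULL || n < sizeof *out) {
        return false;
    }
    memcpy(out, data, sizeof *out);
    return true;
}
```

A caller would then write something like `A a; if (make_A_from_bytes(data, n, &a)) foo(a);`. If A contains pointers, the copied bytes will almost never be valid addresses, so such members need to be built separately.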
Erdal Küçük
  • Thank you! How would i know if "type A is serializable"? Serialization looks like a hard technical word for me. Could you elaborate on that by editing your answer? Again, thanks! – zell Nov 22 '20 at 12:11
  • 1
    Everything is serializable. Serialization means that you turn your object representation into a sequence of bytes, which can then be written to a file or sent over a network. What I meant was: if you have a sequence of bytes, make sure you know that the sequence contains serialized data of A. Serialization can be trivial (if a struct contains only primitive types) or more complex if, as mentioned, your struct contains a pointer to another object. You have to add that info to the sequence, so that you know what to do on the other side. – Erdal Küçük Nov 22 '20 at 12:24

Convert uint8_t* to any type in C?

This is not possible to do generically in C. The language doesn't have reflection, and without it nothing can be said about "any type". Without knowing the object representation of that type, and without knowing the serialization method used to encode the object in a pointer to (or array of) uint8_t objects, it's not possible to auto-guess a generic conversion function.

You may instead interpret the bytes that the uint8_t* points to as an A. Doing so by casting the pointer and dereferencing it (aliasing) violates the strict aliasing rule, the access may be misaligned, and the result is undefined behavior. You could alternatively use memcpy (and this is most probably what you actually want to do):

void foo(A a, size_t arrsize, uint8_t arr[arrsize]) {
    assert(arrsize >= sizeof(A)); // do some rudimentary safety checks
    memcpy(&a, arr, sizeof(A));
    // use a
    printf("%lf", a.some_member);
}

or use a union to do type punning. Either way, the resulting bytes may form a trap representation for A, and reading it may cause the program to trap, though in practice you may well be fine.
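
A small sketch of the union approach, using float as the destination only as an example. The names float_pun and float_from_bytes are made up for this illustration. Reading a union member other than the one last written reinterprets the stored bytes in C (this does not carry over to C++), and the value you get back may be a NaN or, for types that have them, a trap representation.

```c
#include <stdint.h>
#include <string.h>

/* Both members share the same storage. */
typedef union {
    uint8_t bytes[sizeof(float)];
    float   value;
} float_pun;

/* Fills the byte view from src (which must hold at least
 * sizeof(float) bytes), then reads the same storage as a float. */
static float float_from_bytes(const uint8_t *src)
{
    float_pun u;
    memcpy(u.bytes, src, sizeof u.bytes);
    return u.value;
}
```

The result depends on the host's byte order and floating-point format, exactly as with memcpy into a float.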

The only proper way to actually convert an array of values to the destination type is to write a deserialization/conversion function. The algorithm will depend on the object representation of the A type and on the format and encoding of the source data (JSON? YAML? "raw"(?) bytes in big endian? little endian? MSB first? LSB first? etc.).
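
To make that concrete, here is a sketch of such a function under an assumed wire format: a 4-byte unsigned id followed by a 2-byte two's-complement offset, both big endian. The record type, the format, and the name record_deserialize are all hypothetical; a real format would need its own decoder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical destination type. */
typedef struct {
    uint32_t id;
    int16_t  offset;
} record;

/* Assumed wire format: 4-byte id, then 2-byte offset, both big endian. */
enum { RECORD_WIRE_SIZE = 6 };

/* Decodes one record field by field, independent of host byte
 * order and struct padding. Returns false on short or NULL input. */
static bool record_deserialize(const uint8_t *src, size_t n, record *out)
{
    if (src == NULL || out == NULL || n < RECORD_WIRE_SIZE) {
        return false;
    }
    out->id = ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
              ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];

    uint16_t raw = (uint16_t)(((uint16_t)src[4] << 8) | src[5]);
    /* Map the 16-bit pattern to a signed value without relying on
     * implementation-defined narrowing conversions. */
    out->offset = raw < 0x8000u ? (int16_t)raw
                                : (int16_t)((int32_t)raw - 0x10000);
    return true;
}
```

Because each field is rebuilt from individual bytes, the result does not depend on how the compiler lays out the struct.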

Note that uint8_t represents a number that takes exactly 8 bits and has a range of 0 to 255. To represent a "byte" in C, use the unsigned char type. unsigned char is specified to have the weakest alignment requirement and sizeof equal to 1, and you can alias any object through a pointer to a character type.

KamilCuk
  • Thank you for your detailed answer. What do you mean by "interpret" exactly? I thought you meant "cast", but apparently that is not the case after I read through your answer. – zell Nov 22 '20 at 12:24
  • 1
    By "interpret" I believe I mean the normal English meaning. You have 4 bytes 0x01 0x02 0x03 0x04. They may be interpreted as a big-endian 32-bit number equal to 16909060, as a little-endian 32-bit number equal to 67305985, as two 16-bit big-endian numbers 258 and 772, or as an IEEE 754 single-precision number equal to 2.38793926059e-38 (reading the bytes in big-endian order), and so on. Cast+dereference is part of the C language, and it is one way the language lets you interpret the same bytes as different objects. – KamilCuk Nov 22 '20 at 12:28
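
For anyone who wants to check those numbers, a short sketch follows. The helper names are invented for this illustration, and the float case assumes that float is a 32-bit IEEE 754 type on the host.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (IEEE 754 binary32 on most hosts). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Most significant byte first. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Least significant byte first. */
static uint32_t load_le32(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Reinterprets a 32-bit pattern as a float via memcpy. */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With `const uint8_t b[4] = {0x01, 0x02, 0x03, 0x04};`, load_be32(b) gives 16909060, load_le32(b) gives 67305985, load_be16(b) and load_be16(b + 2) give 258 and 772, and float_from_bits(load_be32(b)) is roughly 2.3879e-38 under the stated assumption.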