I have data in form of hexadecimal string and I convert it to float as:

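import binascii
import struct

a = '0X437A1AF6'
# Drop the '0X' prefix, decode the hex digits into 4 raw bytes,
# then interpret those bytes as a big-endian 32-bit float.
x = struct.unpack('>f', binascii.unhexlify(a[2:]))
print(x[0])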

I get the right result, but how do I prove that big endian ('>f') is the right choice, or how do I determine which endianness to use in general? Trial and error is one option, but what are the others?

Jakub
  • There is no way to tell, other than is the value reasonable. Just like text encodings, if you don't know the encoding you have to make an educated guess. – Mark Tolonen May 06 '22 at 23:36

1 Answer

Endianness is how the bytes in the object are ordered. I know that you used floats in your code, but I'm using integers here for simplicity.

Big endian means that the most significant byte comes first: the bytes 43 7a 1a f6 in memory are read as 0x437a1af6, or 1132075766.

Little endian means that the least significant byte comes first: the same bytes 43 7a 1a f6 in memory are read as 0xf61a7a43, or -166036925 (when signed, or 4128930371 when unsigned).
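As a quick check of the two readings above, here is a minimal sketch using Python's built-in `int.from_bytes` (the variable names are only illustrative):

```python
# The four bytes 43 7a 1a f6, exactly as they appear in the hex string.
raw = bytes.fromhex("437a1af6")

# Same bytes, three interpretations.
big = int.from_bytes(raw, byteorder="big")
little_unsigned = int.from_bytes(raw, byteorder="little")
little_signed = int.from_bytes(raw, byteorder="little", signed=True)

print(big)              # 1132075766
print(little_unsigned)  # 4128930371
print(little_signed)    # -166036925
```

The bytes never change; only the order in which they are weighted does.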

Floating point has a specific byte ordering as well, see here. The endianness affects the byte order of the floating point representation, and it can drastically change the value that is returned.


Which endianness you use doesn't really matter as long as the writer and the reader agree on it. x86 CPUs are little endian, but that only describes how the machine stores values in memory, not how the data you were sent is laid out. There is no right or wrong choice.
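To illustrate the consistency point, a small sketch (the value `250.5` is just an example chosen because it is exactly representable as a 32-bit float): packing and unpacking with the same byte order round-trips, while mixing them does not.

```python
import struct

value = 250.5  # exactly representable as float32, so the round trip is exact

# Writer and reader agree on the byte order: the value survives.
round_trips = {}
for fmt in ("<f", ">f"):
    packed = struct.pack(fmt, value)
    round_trips[fmt] = struct.unpack(fmt, packed)[0]

# Writer uses little endian, reader assumes big endian: a different number.
mixed = struct.unpack(">f", struct.pack("<f", value))[0]

print(round_trips)
print(mixed)
```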

In your case, the bytes unpack to -7.832944125711889e+32 as little endian and to 250.10531616210938 as big endian.
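Those two values can be reproduced with a short sketch like the following, which uses `bytes.fromhex` in place of `binascii.unhexlify` (the two are interchangeable here):

```python
import struct

raw = bytes.fromhex("437A1AF6")

as_big = struct.unpack(">f", raw)[0]     # first byte is the most significant
as_little = struct.unpack("<f", raw)[0]  # first byte is the least significant

print(as_big)     # 250.10531616210938
print(as_little)  # -7.832944125711889e+32

# Reversing the bytes and switching the format gives the same float back.
reordered = struct.unpack("<f", bytes.fromhex("F61A7A43"))[0]
print(reordered == as_big)  # True
```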

Eric Jin
  • The correct output is 250.10531616210938, so big endian is the right choice, but why? For example, when I reorder the bytes to '0XF61A7A43' I have to use little endian. I would like to determine which endianness to use so that I can automate this in case I receive data like that. – Jakub May 06 '22 at 21:11
  • Automated? That won't work, what context do you need this in? You'll just have to agree on endianness to use beforehand, or include the endianness as extra data when you get sent the number. – Eric Jin May 06 '22 at 22:17
  • I receive data as a hex string and I must convert it to float to pass it into the quadprog optimizer. Using the wrong endianness will cause infeasibility. Right now everything is working as it should, but I wanted to make it foolproof in case I receive '0XF61A7A43' (needs little endian) instead of '0X437A1AF6' (needs big endian). I just want to know if there is a way to find out which one to use from the hex string, the byte order, or something else, instead of trial and error. Let's assume our only input is the hex string without additional information. – Jakub May 06 '22 at 22:37
  • It's impossible to determine the endianness if you only know the bytes in the float. You *might* be able to sanity check both decoded values to see if, e.g., one of them is hundreds of digits long (this is obviously really unreliable). You should check the documentation of whatever produces the data to find out the endianness of the bytes received. – Eric Jin May 06 '22 at 22:43
  • According to [gregstoll](https://gregstoll.com/~gregstoll/floattohex/), little endian is the right choice for '0X437A1AF6'. I think that since my system is little endian, I receive the bytes in reverse order from binascii.unhexlify(), which basically turns little endian into big, so I must use '>' in struct.unpack(), but I might be wrong. – Jakub May 06 '22 at 23:45
  • That website does not look legitimate, and it outputs the same thing for both endiannesses (big). The system endianness shouldn't really matter - that only affects the (implementation-defined) storage of the C values. You're going to convert them to python anyway. I would still advise you to just stick to one endianness and use it everywhere. – Eric Jin May 06 '22 at 23:50
  • @Jakub `binascii.unhexlify` should not depend on byte order. (Same with `bytes.fromhex`.) The only endianness you really have to worry about is the endianness when the data was sent to you. The raw hex dump of the data doesn't really have endianness, only after interpreting it as ints/floats it does. – Eric Jin May 06 '22 at 23:55
  • I get different results for the two endiannesses. You have to click "convert to float" after you check "Swap to use big-endian". – Jakub May 06 '22 at 23:58
  • Sorry, that was my mistake. Where are you getting this data from? Is there an endianness that it will have? – Eric Jin May 07 '22 at 00:01
  • Another calculator https://www.h-schmidt.net/FloatConverter/IEEE754.html says `250.1` is correct for big endian ordering (see the order of the bits). There shouldn't be any inconsistencies. – Eric Jin May 07 '22 at 00:09
  • I receive them from a user in an XML file, but I don't know how they create them. I guess I have to ask them, or assume they are little endian, check via `sys.byteorder` which endianness the system uses, and set the endianness for struct.unpack() accordingly. – Jakub May 07 '22 at 00:15
  • You get *raw hex data*? Checking `sys.byteorder` only checks the system's endianness. Not the data they are giving you. You can explicitly mention something like "all binary data in this file needs to be [big/little] endian format" – Eric Jin May 07 '22 at 00:27
  • Yes, I receive '0X437A1AF6' and by trial and error I determined that I should use '>f'. But I wanted some proof of why I must use '>f'. – Jakub May 07 '22 at 00:35
  • The proof is really just how you were given the input. If you were given it in big endian format, then you were given it in that format. Think of it like, maybe, receiving a letter where the sentences could be either forwards or backwards. Most people agree on one way, but the sentence isn't unreadable in either direction. – Eric Jin May 07 '22 at 01:20
  • Or maybe since they are in IEEE 754 format I should use big endian to read them left to right? – Jakub May 07 '22 at 01:25
  • Depends. Byteorder is system-specific I believe. Try using `ctypes` or just plain C. – Eric Jin May 07 '22 at 02:35
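The sanity check suggested in the comments could look something like the sketch below. Both `decode_float32` and `plausible_byteorders` are hypothetical helper names, and the accepted range of ±1e6 is an assumption that would have to come from knowledge of the data. It only narrows the options and cannot prove the byte order; an agreed or documented byte order is still the reliable fix.

```python
import math
import struct


def decode_float32(hex_string, byteorder):
    """Decode an 8-digit hex string (optional 0x/0X prefix) as a 32-bit float."""
    if byteorder not in ("big", "little"):
        raise ValueError("byteorder must be 'big' or 'little'")
    digits = hex_string[2:] if hex_string[:2].lower() == "0x" else hex_string
    fmt = ">f" if byteorder == "big" else "<f"
    return struct.unpack(fmt, bytes.fromhex(digits))[0]


def plausible_byteorders(hex_string, low, high):
    """Return the byte orders whose decoded value is finite and within [low, high]."""
    candidates = []
    for order in ("big", "little"):
        value = decode_float32(hex_string, order)
        if math.isfinite(value) and low <= value <= high:
            candidates.append(order)
    return candidates


# Assumed plausible range for the optimizer inputs: -1e6 .. 1e6.
print(plausible_byteorders("0X437A1AF6", -1e6, 1e6))  # ['big']
print(plausible_byteorders("0XF61A7A43", -1e6, 1e6))  # ['little']
print(plausible_byteorders("0X00000000", -1e6, 1e6))  # ['big', 'little']
```

The last call shows the limitation: some byte patterns are plausible either way, so an empty or two-element result should be treated as "cannot decide" rather than guessed.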