
I know the WAV file format uses signed integers for 16-bit samples. It also stores them in little-endian order, meaning the lowest 8 bits come first, then the next 8, and so on. Is the sign bit in the first byte, or is it always the most significant bit (the one with the highest place value)?

Meaning:
Which one is the sign bit in the WAV format?

++---+---+---+---+---+---+---+---++---+---+---+---+---+---+---+---++
|| a | b | c | d | e | f | g | h || i | j | k | l | m | n | o | p ||
++---+---+---+---+---+---+---+---++---+---+---+---+---+---+---+---++
--------------------------- here -> ^ ------------- or here? -> ^

i or p?

Leo Izen
  • As for your picture, it depends on how you draw the bits within the byte. :P If you draw them like a big-endian (high bit first), then it'll be "i". If you embrace the little-endianness of it all, then "p". Either way, it'll be the high bit of the last byte. – cHao Oct 09 '10 at 00:13

2 Answers


signed int, little endian:

byte 1(lsb)       byte 2(msb)
----------------------------------
7|6|5|4|3|2|1|0 | 7|6|5|4|3|2|1|0|
----------------------------------
                  ^
                  |
                 Sign bit

You only need to concern yourself with byte order when reading or writing a short int to external media, such as a file. Within your program, the sign bit is the most significant bit of the short, whether you're on a big-endian or a little-endian platform.
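
As a rough sketch of the read side, assuming Python and its standard `struct` module (neither is mentioned in the question), a byte pair taken from 16-bit PCM sample data could be decoded by hand or with the `<h` format code. The helper name `decode_sample` is made up for illustration.

```python
import struct

def decode_sample(byte1: int, byte2: int) -> int:
    """Combine a little-endian byte pair into a signed 16-bit value.

    byte1 is the first byte in the file (low 8 bits),
    byte2 is the second byte (high 8 bits, including the sign bit).
    """
    value = byte1 | (byte2 << 8)   # assemble the unsigned 16-bit pattern
    if byte2 & 0x80:               # top bit of the last byte set: negative
        value -= 0x10000           # subtract 65536 to get the signed value
    return value

raw = bytes([0xFE, 0xFF])          # example byte pair, low byte first
manual = decode_sample(raw[0], raw[1])
(unpacked,) = struct.unpack("<h", raw)  # "<h" = little-endian signed 16-bit
print(manual, unpacked)
```

Both paths should agree; once the value has been decoded, the rest of the program can treat it as an ordinary signed integer.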

nos

The sign bit is the most significant bit on any two's-complement machine (like the x86), and thus will be in the last byte in a little-endian format.

Just because I didn't want to be the one not including ASCII art... :)

+---------------------------------------+---------------------------------------+
|              first byte               |              second byte              |
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
|  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 |
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
   ^--- lsb                                               msb / sign bit -----^

Bits are basically represented "backwards" from how most people think about them, which is why the high byte is last. But it's all consistent; "bit 15" comes after "bit 0" just as addresses ought to work, and is still the most significant bit of the most significant byte of the word. You don't have to do any bit twiddling, because the hardware talks in terms of bytes at all but the lowest levels -- so when you read a byte, it looks exactly like you'd expect. Just look at the most significant bit of your word (or the last byte of it, if you're reading a byte at a time), and there's your sign bit.

Note, though, that two's complement doesn't exactly designate a particular bit as the "sign bit". That's just a very convenient side effect of how the numbers are represented. For 16-bit numbers, -x is stored as 65536-x rather than 32768+x (which would be the case if the upper bit were strictly a sign flag, as in a sign-magnitude representation).
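
To make that arithmetic concrete, here is a small sketch in Python (chosen only for illustration; `twos_complement_16` is a hypothetical helper name). It compares the 16-bit pattern that two's complement uses for -5 with the pattern a strict sign-flag scheme would use.

```python
def twos_complement_16(value: int) -> int:
    """Return the 16-bit pattern (0..65535) that stores a signed value."""
    return value & 0xFFFF

x = 5
pattern = twos_complement_16(-x)   # how -5 is stored in two's complement
sign_magnitude = 0x8000 + x        # how -5 would look with a pure sign flag

print(hex(pattern))                # 0xfffb, i.e. 65536 - 5
print(hex(sign_magnitude))         # 0x8005, i.e. 32768 + 5
print(pattern >> 15)               # 1: the top bit is still set for negatives
```

The top bit ends up set for every negative value, which is why it can be used as a sign test even though it is not a separate flag.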

cHao
  • If you're reading actual bytes rather than an ASCII representation, then yes -- the lower 8 bits (and thus, the LSB) will *always* be the first byte, and the upper 8 (MSB included) will *always* be the last -- that's what "little endian" means. But the bits within the byte are stored however the hardware wants to store them. You don't have to know or care how that works unless you're building a hard drive. – cHao Oct 08 '10 at 21:12
  • I'm talking about how it is stored in the WAV file format, not in hardware. I read that the samples are stored as little endian signed 16-bit integers. – Leo Izen Oct 08 '10 at 22:49
  • They are. And that just happens to be quite close to the way x86 processors handle 16-bit values, which is why they're stored that way. (Windows was made for x86-compatible CPUs.) Either way, the sign bit is the most significant bit of the last byte of the value, as mentioned above. If `byte2 & 0x80` is nonzero, then your number is negative. – cHao Oct 08 '10 at 23:59
  • Don't get too caught up in whether the bits go from "left to right" or "right to left" or whatever. Bytes are bytes, and they'll have the same value no matter how they're drawn. Only the hardware needs to know about the order of bits in a byte. The big thing you have to remember is that the least significant *byte* comes first, and the most significant *byte* is last. The most significant bit of the whole value will always be the most significant bit of the most significant byte. – cHao Oct 09 '10 at 00:11
  • Should I expect the sign bit to be on the most significant byte, and most significant bit, as well in big endian architectures? – ceztko Oct 12 '20 at 20:33
  • @ceztko: Depends on perspective. It'll be the most significant bit of the most significant byte *of the representation*, which is exactly where the difference between big- and little-endian creeps in and complicates things. If you don't want to decode the representation into a value useful for your system (ie: swap bytes around as needed), you need to know about the representation's endianness to know where the most significant byte is. – cHao Oct 19 '20 at 11:40
  • @cHao yes, I'm talking about the representation. In the end I just want to know if swapping bytes to read/write a signed integer in big-endian format will be the same as for unsigned integers, and the answer appears to be yes. – ceztko Oct 19 '20 at 22:22
  • @ceztko: Yes, signed integers will convert largely the same as unsigned ones. In fact, at the byte-sequence level, signed and unsigned values are basically indistinguishable; little-endian `0xfe 0xff` and big-endian `0xff 0xfe` represent both -2 _and_ 65534. The difference between them is entirely in semantics and handling, which you only have to care about once you've read in the value. – cHao Oct 22 '20 at 22:03
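
The point made in the comments above, that the same byte pair reads as -2 or 65534 depending only on interpretation and that byte swapping is the same for signed and unsigned values, can be checked with a short sketch. This assumes Python and its built-in `int.from_bytes`; the variable names are illustrative only.

```python
little = bytes([0xFE, 0xFF])   # little-endian: low byte first
big = bytes([0xFF, 0xFE])      # big-endian: high byte first

# Same bytes, two interpretations.
le_signed = int.from_bytes(little, "little", signed=True)
le_unsigned = int.from_bytes(little, "little", signed=False)
be_signed = int.from_bytes(big, "big", signed=True)
be_unsigned = int.from_bytes(big, "big", signed=False)

# Swapping the byte order does not depend on signedness.
swapped = little[::-1]

# Sign test on the raw bytes: top bit of the most significant byte.
is_negative = bool(little[1] & 0x80)

print(le_signed, le_unsigned, be_signed, be_unsigned, swapped == big, is_negative)
```

Signedness only matters at the moment the bytes are turned into a number; the swap itself is a plain reordering of bytes.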