
This is for a Huffman compression algorithm that assigns binary codes to the characters in a text file: fewer bits for more frequent characters and more bits for less frequent ones.

I'm trying to store each binary code big-endian within a byte. Say I'm using an unsigned char to hold it, starting as 00000000, and I want to store the binary code 1101. I apologize in advance if this seems trivial or is a dupe, but I've browsed dozens of other posts and can't find what I need. If anyone could link or quickly explain, it'd be greatly appreciated. Would this be the correct syntax? I'll have some external method like:

int length = 0;
unsigned char byte = 0; // some default value
void pushBit(unsigned int bit){
    if (bit == 1){
        byte |= 1;
    }
    byte <<= 1;
    length++;
    if (length == 8) { 
        //Output the byte
        length = 0;
    }
}

I've watched some videos explaining endianness, and my understanding is that the most significant bit (the first one) is placed at the lowest memory address.

Some videos showed the byte written from left to right, which makes me think I need to shift everything to the left. But whenever I set, toggle, or clear a bit, it's counted from the rightmost bit, isn't it? Sorry once again if this is trivial.

So after I've pushed 1101 through this method, byte would be something like 00001101. Is this big endian? My knowledge of address locations is very weak, and I'm not sure whether the left end (**-->00001101**) or the right end (**00001101<--**) is considered the most significant. Would I need to left shift by the remaining amount?

So since I used 4 bits, I would left shift 4 bits to make 11010000. Is this big endian?

Chen Li
Ji Kang
  • `byte <= 1;` doesn't do anything. And endianness operates at the byte level, not the bit level. – tkausl May 08 '18 at 04:44
  • Why do you want to save it big endian? In what way would endianness affect what you are trying to achieve? – Yunnosch May 08 '18 at 04:45
  • *So let's say I'm using an unsigned char to hold it. 00000000* Your `unsigned char byte` is uninitialized. – 273K May 08 '18 at 04:46
  • @S.M. The `unsigned char byte;` looks global to me. – Yunnosch May 08 '18 at 04:47
  • Doh. Sorry. I meant <<=1 for the left shifting. And please assume the unsigned char byte is set to a default value. – Ji Kang May 08 '18 at 04:47
  • Endianness is about the order of bytes, not bits – Killzone Kid May 08 '18 at 04:48
  • If you do not have very special reasons to worry about endianness, don't worry. The compiler knows the hardware, and together they will not make a mistake. Endianness only matters if you have unusual assumptions and expectations. – Yunnosch May 08 '18 at 04:49
  • It's currently for a class project. It states the "bits will be 'big endian' within the byte". – Ji Kang May 08 '18 at 04:49
  • @KillzoneKid Opinions diverge on that. Some compilers turn `{bitA:4; bitB:4;} var.bitA= 1;` into 16, others into 1. I have seen both and consider it bit endianness. – Yunnosch May 08 '18 at 04:52
  • I recommend asking the author of "bits will be 'big endian' within the byte" what they mean. I think they are referring to the "16" != "1" difference between compilers I mentioned above, but that has no influence on Huffman encoding. – Yunnosch May 08 '18 at 04:55
  • What you call 'bit endianness' has nothing to do with endianness at all. It's just how the compiler chooses to lay out the bit-fields. – tkausl May 08 '18 at 04:56

3 Answers


First off, as Killzone Kid noted, endianness and the bit ordering of a binary code are two entirely different things. Endianness refers to the order in which the bytes of a multi-byte integer are stored in memory. For little endian, the least significant byte is stored first. For big endian, the most significant byte is stored first. The bits within the bytes don't change order. Endianness has nothing to do with what you're asking.
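To make the distinction concrete, here is a minimal sketch (an illustration added here, not code from the answer; the helper names `hostIsLittleEndian` and `topBit` are hypothetical). Copying a `uint32_t` into raw bytes exposes the machine's byte order, whereas the bits inside a single byte, as seen through shifts and masks, read the same on every machine. It assumes 8-bit bytes.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true if this machine stores the least
// significant byte of a multi-byte integer at the lowest address.
bool hostIsLittleEndian() {
    const std::uint32_t value = 0x01020304;
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // view the object's bytes in memory order
    return bytes[0] == 0x04;                   // least significant byte first => little endian
}

// Hypothetical helper: the value of one byte does not depend on endianness,
// so bit 7 (the most significant bit) is the same on every machine.
int topBit(unsigned char byte) {
    return (byte >> 7) & 1;
}
```

On x86, for example, `hostIsLittleEndian()` returns true, yet `topBit(0xD0)` is 1 on any machine.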

As for accumulating bits until you have a byte's worth to write, you have the basic idea, but your code is incorrect. You need to shift first, and then OR in the bit. The way you're doing it, you lose the first bit you put in off the top, and the low bit of what you write is always zero. Just move the `byte <<= 1;` before the `if`.
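Here is the question's `pushBit` with that fix applied, as a minimal sketch. It keeps the question's global `byte` and `length`; because the question leaves "Output the byte" open, completed bytes are appended to a hypothetical global `std::vector<unsigned char> output`, and the `assert` on the argument is an added precondition.

```cpp
#include <cassert>
#include <vector>

int length = 0;                     // number of bits currently held in `byte`
unsigned char byte = 0;             // bit buffer, filled from the right
std::vector<unsigned char> output;  // hypothetical sink for completed bytes

void pushBit(unsigned int bit) {
    assert(bit == 0 || bit == 1);
    byte <<= 1;                     // shift first to make room at the low end...
    if (bit == 1) {
        byte |= 1;                  // ...then OR the new bit into bit 0
    }
    length++;
    if (length == 8) {              // full byte: the first bit pushed is now bit 7
        output.push_back(byte);
        byte = 0;
        length = 0;
    }
}
```

Pushing 1, 1, 0, 1 followed by four 0s leaves `output[0] == 0xD0`, i.e. 11010000: the first bit pushed ends up as the most significant bit of the byte.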

You also need to deal with ending the stream somehow, writing out the last bits if there are fewer than eight left. So you'll need a `flushBits()` to write out your bit buffer if it has any bits in it. Your bit stream would need to be self-terminating, or you need to send the number of bits first, so that you don't misinterpret the filler bits in the last byte as a code or codes.
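One way to handle the end of the stream, sketched under assumptions: the hypothetical `BitWriter` class below counts every bit it writes, and its `flushBits()` left-justifies a partial final byte by shifting it left by the number of unused positions, padding with zeros (so a lone 1101 becomes 11010000). The caller would then store `bitCount()` ahead of the data so a decoder knows to ignore the filler; how that count is stored is left open.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bit writer: packs bits most significant first into bytes.
class BitWriter {
public:
    void pushBit(unsigned int bit) {
        buffer_ = static_cast<unsigned char>((buffer_ << 1) | (bit & 1u));  // shift, then OR
        ++pending_;
        ++total_;
        if (pending_ == 8) {
            bytes_.push_back(buffer_);
            buffer_ = 0;
            pending_ = 0;
        }
    }

    // Writes out any leftover bits (1 to 7), left-justified and zero-padded.
    void flushBits() {
        if (pending_ > 0) {
            bytes_.push_back(static_cast<unsigned char>(buffer_ << (8 - pending_)));
            buffer_ = 0;
            pending_ = 0;
        }
    }

    std::size_t bitCount() const { return total_; }  // real bits, excluding padding
    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
    unsigned char buffer_ = 0;
    int pending_ = 0;         // bits waiting in buffer_
    std::size_t total_ = 0;   // bits pushed so far
};
```

Pushing just 1101 and calling `flushBits()` yields the single byte 0xD0 (11010000) with `bitCount() == 4`; a decoder given that count knows the trailing four zeros are filler.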

Mark Adler
  • I just tried it out in Visual Studio, and it seemed to work doing the or-equals first and then shifting, but shifting and then or-equaling also seemed to work. Thanks for the suggestion though! – Ji Kang May 08 '18 at 06:38
  • Actually, second correction: it did make a tremendous difference! I was confused earlier when you were explaining it, but as I printed it out bit by bit (using int 0 and 1 representations), I noticed it. I know I'm not supposed to use the comments just to say thanks, but kudos to you, sir. Can I ask why the "low bit" is always zero? – Ji Kang May 08 '18 at 07:09
  • Hopefully the low bit is no longer zero because you fixed it. The low bit _was_ zero because the last thing you did was `byte <<= 1`. A shift left always leaves the low bit(s) zero. That's how the operation is defined. – Mark Adler May 08 '18 at 16:10

There are two types of endianness, big-endian and little-endian (technically there are more, like middle-endian, but big and little are the most common). With big-endian format (as it seems you want), the most significant byte comes first; with little-endian, the least significant byte comes first.

Wikipedia has some good examples.

It looks like what you are trying to do is store the bits within the byte in reverse order, which is not what you want. A single byte is endian-agnostic and does not need to be flipped. Multi-byte types such as `uint32_t` may need their byte order changed, depending on which endianness you want to achieve.
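For multi-byte values, one common portable approach (a suggestion, not something the answer shows) is to serialize with shifts instead of copying memory, so the output order is fixed no matter what the host's endianness is. A minimal sketch with hypothetical function names, assuming 8-bit bytes:

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: writes `value` most significant byte first (big endian),
// independent of how the host stores it in memory.
std::array<unsigned char, 4> toBigEndian(std::uint32_t value) {
    return {{
        static_cast<unsigned char>(value >> 24),
        static_cast<unsigned char>(value >> 16),
        static_cast<unsigned char>(value >> 8),
        static_cast<unsigned char>(value),
    }};
}

// Hypothetical helper: reassembles the value from big-endian bytes.
std::uint32_t fromBigEndian(const std::array<unsigned char, 4>& b) {
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
}
```

`toBigEndian(0x01020304)` gives the bytes 01 02 03 04 on any machine, and `fromBigEndian` reverses it.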

Maybe what you are referring to is bit numbering, in which case the code you have should largely work (although the shift needs to come before the OR rather than after it; otherwise the eighth shift pushes the first bit out of the byte). With that order, the first bit you pass to `pushBit` ends up as the most significant bit.

Chen Li
dempzorz
  • So with 00001101, the leftmost 1 would be in the most significant position? Do I need to shift it to the left, like I mentioned, to make it 11010000? Thanks for responding, by the way. – Ji Kang May 08 '18 at 06:39
  • A byte as such isn't endian-agnostic, but C++'s `char` and its `<<` and `>>` operators are. – Swift - Friday Pie May 08 '18 at 07:06

Bits aren't addressable by definition (if we're talking about C++, not C51 or its C++ successor), so from the point of view of a high-level language, and even of assembler pseudo-code, no matter which direction LSB -> MSB runs in hardware, bitwise `<<` shifts from the LSB toward the MSB. Bit order is referred to as bit numbering and is a separate property from endianness, related to the hardware implementation.

Bit-field order in C++ can appear reversed because in many common use cases, e.g. network communication, bits do have the opposite order. In fact, how bit-fields are packed into a byte is implementation-defined: there is no guarantee that there are no gaps or that declaration order is preserved.
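Since bit-field layout is implementation-defined, a portable alternative (a suggestion, not from the answer) is to place fields with explicit shifts and masks. A minimal sketch with two hypothetical 4-bit fields, loosely mirroring the `bitA:4; bitB:4;` example from the comments:

```cpp
// Hypothetical helpers: pack two 4-bit fields into one byte, `high` in bits 7..4
// and `low` in bits 3..0. Unlike a bit-field struct, this layout is the same
// with every compiler.
unsigned char packNibbles(unsigned int high, unsigned int low) {
    return static_cast<unsigned char>(((high & 0xFu) << 4) | (low & 0xFu));
}

unsigned int highNibble(unsigned char byte) { return (byte >> 4) & 0xFu; }
unsigned int lowNibble(unsigned char byte) { return byte & 0xFu; }
```

Here `packNibbles(1, 0)` is always 16 and `packNibbles(0, 1)` is always 1, whichever compiler is used; with a bit-field struct, which one you get depends on the compiler.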

The smallest addressable unit of memory in C++ is a `char`, and that's where your concern with endianness ends. In the rare case that you actually need to change bit order (when? working with some incompatible hardware?), you have to do so explicitly.
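If you do need to reverse the bit order of a byte explicitly, a straightforward loop works. A minimal sketch, assuming 8-bit bytes; `reverseBits` is a hypothetical name, and lookup tables or bit-twiddling tricks are common faster alternatives.

```cpp
// Hypothetical helper: returns `byte` with its bit order reversed
// (bit 0 <-> bit 7, bit 1 <-> bit 6, and so on). Assumes 8-bit bytes.
unsigned char reverseBits(unsigned char byte) {
    unsigned char result = 0;
    for (int i = 0; i < 8; ++i) {
        // Append the current low bit of `byte` to the low end of `result`.
        result = static_cast<unsigned char>((result << 1) | (byte & 1u));
        byte >>= 1;  // move on to the next bit
    }
    return result;
}
```

For example, `reverseBits(0x0D)` (00001101) returns 0xB0 (10110000).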

Note that when working with Ethernet or another network protocol you should not do so: the order change is done by hardware (the first bit sent over the wire is the least significant one on the platform).

Swift - Friday Pie