Storing a string as a binary string of 'unsigned char's for compression

Question

I need to store a string of 8 chars (they're all digits) in a compressed form.

As I understand it, each char uses 8 bits (1 byte), and since I only use digits I can use 4 bits per digit (2^4 = 16 combinations), so each unsigned char can store two digits instead of one. Thus I need 4 bytes to store 8 digits instead of 8 bytes.

So far, am I right or wrong?

Now, how do I store this data in a string of 4 unsigned chars? I'm not looking for an explicit answer, just a kick start to understand the idea.

Yes, you can save your digits like this. A `char` would contain `c = digit1 + 16*digit2` (or similar). To extract the digits, you'd use `digit1 = c % 16; digit2 = c / 16`. — Carsten, Aug 27 '13 at 09:51
That is called [packed BCD](http://en.wikipedia.org/wiki/Binary-coded_decimal#Basics), it used to be rather common but is less so now. You could also consider converting it to a proper integer. — harold, Aug 27 '13 at 09:52
@Carsten Aren't I supposed to convert the number to its binary form using bit shifting from left to right? — Quaker, Aug 27 '13 at 09:52
Adding to @Carsten: also `digit1 = c & 0xf; digit2 = c >> 4`. — 0xF1, Aug 27 '13 at 09:53
@Quaker You could, but the math stays the same. Shifting 4 bits to the left is the same as multiplying by 16 (also see nishant's comment). — Carsten, Aug 27 '13 at 09:55

score 2 · Accepted Answer · answered Aug 27 '13 at 11:48

There are three obvious ways to store eight decimal digits in four eight-bit values.

One is to reduce each decimal digit to four bits and to store two four-bit values in eight bits.

Another is to combine each pair of decimal digits to make a number from 0 to 99 and store that number in eight bits.

Another is to combine all eight decimal digits to make a number from 0 to 99999999 and store that in 32 bits, treating the four eight-bit values as one 32-bit integer.

To decide between these, consider what operations you need to perform to encode the value (what arithmetic or bit operations are needed to combine two digits to make the encoded value) and what operations you need to perform to decode the value (given eight bits, how do you get the digits out of them?).

To evaluate this problem, you should know about the basic arithmetic operations and the bit operations such as bit-wise AND and OR, shifting bits, using “masks” with AND operations, and so on. It may also help to know that division and remainder are usually more time-consuming operations than other arithmetic and bit operations on modern computers.

0xF1 · Answer 2 · score 1 · 2013-08-27T10:07:25.053

I'd prefer that you use an `unsigned int`, as harold suggested in the comments. With `unsigned char[4]` you may need one additional char for a terminating `'\0'` character.

Use shifting, as you suggested yourself, to convert properly from `uchar` to `uint`.

answered Aug 27 '13 at 10:02 · edited Aug 27 '13 at 10:07


Assuming I had to convert the number 9, I should convert it digit by digit from right to left, padding with leading 0's on the left, right? – Quaker Aug 27 '13 at 10:09
@Quaker: Yeah, you will convert `"9"` (a string) to `0009` in int. – 0xF1 Aug 27 '13 at 10:11
I actually have to do it using `uchar`s but I think I got the idea – Quaker Aug 27 '13 at 10:12
@Quaker: **uchar** will make one thing easy: you will not need to go "right to left"; you can use the '\0' terminator of the uncompressed number string as the end point of the conversion. – 0xF1 Aug 27 '13 at 10:15
I'm not sure I have to use the `'\0'` terminator, since it's only saved as `uchar` for storage purposes. Whenever someone wants to use the data, it will be converted back to a string of chars. – Quaker Aug 27 '13 at 10:16
@Quaker: How are you keeping track of number of digits in string? – 0xF1 Aug 27 '13 at 10:17
There will always be exactly 8 digits in the string – Quaker Aug 27 '13 at 10:46
C has no requirement that an array of characters be terminated with a zero. This is only required for certain library routines. It is not relevant to this question. – Eric Postpischil Aug 27 '13 at 11:14
@EricPostpischil: I know that, but I don't know how Quaker is implementing his code; that is why I wrote _you may require..._ – 0xF1 Aug 27 '13 at 11:16
@nishant: And how would `unsigned int` fix that? If you do not know how many items are in an `unsigned char [4]`, you would not know how many are in an `unsigned int`. – Eric Postpischil Aug 27 '13 at 11:31
@EricPostpischil: Sorry, in my answer I meant one additional char on top of `unsigned char[4]`, i.e. `unsigned char[5]` may be required. Since Quaker has already said he needs 4 bytes of storage, `unsigned int` will be OK. – 0xF1 Aug 27 '13 at 11:34
@nishant: How would `unsigned int` replace `unsigned char [5]`? If you need the additional byte to indicate the number of items, how is the number of items indicated in an `unsigned int`? – Eric Postpischil Aug 27 '13 at 11:38
@EricPostpischil: Now I see what you mean, thanks. Sorry, I forgot the counter that would be required if the number were represented as an `unsigned int`. – 0xF1 Aug 27 '13 at 11:44
