
I am trying to convert a binary array to decimal in the following way:

uint8_t array[8] = {1,1,1,1,0,1,1,1} ;
int decimal = 0 ;    

for(int i = 0 ; i < 8 ; i++)
    decimal = (decimal << 1) + array[i] ;

Actually I have to convert a 64-bit binary array to decimal, and I have to do it a million times.

Can anybody help me: is there any faster way to do the above, or is the above approach fine as it is?

user1838343
  • It doesn't matter if the bits are in an array or in a string, you have to loop over the bits anyway. There might be more C++-specific ways to do it, but in the end there will always be a loop over the bits. – Some programmer dude Feb 06 '13 at 08:36
  • [this question](http://stackoverflow.com/questions/1686004/fastest-way-to-convert-binary-to-decimal) may help you – Oren Feb 06 '13 at 08:41
  • Usually big optimization benefits can be gained not from small functions but from looking at a higher level. Why are the digits stored in this way? – maxim1000 Feb 06 '13 at 08:42
  • You could employ loop unrolling and/or pointer arithmetic instead of array access. That would speed things up but it would remain at the same algorithmic complexity. – Jack Aidley Feb 06 '13 at 08:51

5 Answers


Your method is adequate. To call it nice, I would just not mix bitwise operations with the "mathematical" way of converting to decimal, i.e. use either

    decimal = decimal << 1 | array[i];

or

    decimal = decimal * 2 + array[i];
unkulunkulu
  • Note however that any decent compiler will treat the shift and the multiply identically, whereas using an 'or' instead of an add will restrict its ability to optimise. Indeed, MSVC generates poorer code for the first example here, but the same code for the second case as in the original. – JasonD Feb 06 '13 at 09:13

It is important, before attempting any optimisation, to profile the code. Time it, look at the code being generated, and optimise only when you understand what is going on.

And as already pointed out, the best optimisation is to not do something, but to make a higher level change that removes the need.

However...

Most changes you might want to trivially make here are likely to be things the compiler has already done (a shift is the same as a multiply to the compiler). Some may actually prevent the compiler from making an optimisation (changing an add to an or will restrict the compiler - there are more ways to add numbers, and only you know that in this case the result will be the same).

Pointer arithmetic may be better, but the compiler is not stupid - it ought to already be producing decent code for dereferencing the array, so you need to check that you have not in fact made matters worse by introducing an additional variable.

In this case the loop count is well defined and limited, so unrolling probably makes sense.
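
For illustration, the loop written out in full with plain additions might look like this (a sketch only; whether it actually beats what the compiler already does should be confirmed by profiling):

int decimal = array[0]*128 + array[1]*64 + array[2]*32 + array[3]*16
            + array[4]*8  + array[5]*4  + array[6]*2  + array[7];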

Furthermore, it depends on how dependent you want the solution to be on your target architecture. If you want portability, it is hard(er) to optimise.

For example, the following produces better code here:

// Reinterpret the two halves of the array as 32-bit words (non-portable).
unsigned int x0 = *(unsigned int *)array;
unsigned int x1 = *(unsigned int *)(array+4);

// Each multiply gathers the four 0/1 bytes of a word into four adjacent bits.
int decimal = ((x0 * 0x8040201) >> 20) + ((x1 * 0x8040201) >> 24);

I could probably also roll a 64-bit version that did 8 bits at a time instead of 4.

But it is very definitely not portable code. I might use that locally if I knew what I was running on and I just wanted to crunch numbers quickly. But I probably wouldn't put it in production code. Certainly not without documenting what it did, and without the accompanying unit test that checks that it actually works.
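
For illustration only, a 64-bit variant of the same multiply-and-shift idea might look like the sketch below. It assumes a little-endian host, and the helper name and the memcpy load are illustrative choices rather than part of the answer above:

#include <stdint.h>
#include <string.h>

/* Sketch: gather the eight 0/1 bytes with a single multiply.
   On a little-endian host array[0] lands in the low byte of x, and the
   multiplier places it at bit 7 of the top byte, matching the OP's loop.
   No carries from the lower bytes reach the top byte, because every
   intermediate byte sum stays below 255. */
int binary_to_decimal64(const uint8_t array[8])
{
    uint64_t x;
    memcpy(&x, array, sizeof x);
    return (int)((x * 0x8040201008040201ULL) >> 56);
}

As with the 32-bit snippet, this is endian-specific and should be checked against the plain loop before being trusted.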

JasonD

You could use std::accumulate with a doubling-and-adding binary operation:

#include <numeric>

int doubleSumAndAdd(const int& sum, const int& next) {
    return (sum * 2) + next;
}

int decimal = std::accumulate(array, array + ARRAY_SIZE,
                              0, doubleSumAndAdd);

With an initial value of 0 this folds the bits most-significant-first, so array[0] ends up as the most significant bit, just like in the OP's loop.
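
A self-contained example of the accumulate approach (the lambda and the printed check are illustrative additions, not from the answer):

#include <cstdint>
#include <iostream>
#include <numeric>

int main()
{
    uint8_t array[8] = {1, 1, 1, 1, 0, 1, 1, 1};

    // Fold the bits most-significant-first, starting from an initial value of 0.
    int decimal = std::accumulate(array, array + 8, 0,
                                  [](int sum, int next) { return sum * 2 + next; });

    std::cout << decimal << '\n';   // 11110111 in binary, so this prints 247
}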

Peter Wood

The binary 'compression' can be generalized as a problem of weighted sum -- and for that there are some interesting techniques.

  • X mod 255 essentially means summing all the independent 8-bit digits, since 256 mod 255 = 1 and so every byte carries weight 1.
  • X mod 254 means summing each digit with a doubling weight, since 1 mod 254 = 1, 256 mod 254 = 2, 256*256 mod 254 = 2*2 = 4, etc.

    If the encoding were big endian, then *(unsigned long long *)array % 254 would produce the weighted sum (with a truncated range of 0..253). Removing the value with weight 2 and adding it back manually then produces the correct result:

    uint64_t a = *(uint64_t *)array; return (a & ~256) % 254 + ((a>>7) & 2); /* re-add the weight-2 digit */

Another mechanism to get the weights is to premultiply each binary digit by 255 and mask the correct bit:

uint64_t a = (*(uint64_t *)array * 255) & 0x0102040810204080ULL; // little endian
uint64_t a = (*(uint64_t *)array * 255) & 0x8040201008040201ULL; // big endian  

In both cases one can then take the remainder modulo 255 (and this time correct with the weight-1 digit):

return (a & 0x00ffffffffffffff) % 255 + (a>>56); // little endian, or  
return (a & ~1) % 255 + (a&1);  

For the sceptical mind: I actually did profile the modulus version to be (slightly) faster than iteration on x64.
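
Putting the little-endian fragments together, the whole modulus approach might be packaged like the sketch below (assuming a little-endian host; the helper name and the memcpy load are illustrative, not from the answer itself):

#include <stdint.h>
#include <string.h>

/* Sketch: multiplying by 255 turns every set 0/1 byte into 0xFF, the mask
   keeps weight 2^(7-i) in byte i, and % 255 sums the bytes (256 == 1 mod 255).
   The weight-1 byte is left out of the remainder and added back afterwards,
   so the all-ones input cannot wrap around to zero. */
unsigned bits_to_value(const uint8_t array[8])
{
    uint64_t x;
    memcpy(&x, array, sizeof x);                      /* array[0] in the low byte */
    uint64_t a = (x * 255) & 0x0102040810204080ULL;
    return (unsigned)((a & 0x00ffffffffffffffULL) % 255 + (a >> 56));
}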

To continue from the answer of JasonD, parallel bit selection can be utilized iteratively. But first, expressing the equation in full form helps the compiler remove the artificial dependency chain created by the iterative, accumulating approach:

ret =  ((a[0]<<7) | (a[1]<<6) | (a[2]<<5) | (a[3]<<4) |
        (a[4]<<3) | (a[5]<<2) | (a[6]<<1) | (a[7]<<0));

vs.

uint32_t HI = *(uint32_t *)array, LO = *(uint32_t *)&array[4];
LO |= (HI<<4);    // The HI dword has a weight 16 relative to Lo bytes
LO |= (LO>>14);   // High word has 4x weight compared to low word
LO |= (LO>>9);    // high byte has 2x weight compared to lower byte
return LO & 255;

One more interesting technique would be to utilize crc32 as a compression function; then the result would simply be LookUpTable[crc32(array) & 255], as there happens to be no collision within this small subset of 256 distinct arrays. However, to apply that one has already chosen the road of even less portability, and could just as well end up using SSE intrinsics.

Aki Suihkonen

Try this; I have used it to convert binary numbers of up to 1020 digits:

#include <sstream>
#include <string>
#include <math.h>
#include <iostream>
using namespace std;

long binary_decimal(string num) /* Function to convert binary to dec */
{
    long dec = 0, n = 1, exp = 0;
    string bin = num;
    if (bin.length() > 1020) {
        cout << "Binary Digit too large" << endl;
    }
    else {
        for (int i = bin.length() - 1; i > -1; i--)
        {
            n = pow(2, exp++);
            if (bin.at(i) == '1')
                dec += n;
        }
    }
    return dec;
}

Theoretically this method will work for a binary number of unlimited length.

  • You do realize that `long` is 32 (or 64) bits and you can't actually fit 1020 bits into that? – Blastfurnace Mar 15 '15 at 16:42
  • Yes, you must have misread it. It works like so: bin: 101, dec: 5. I read a string of up to 1020 bits and every time I reach a 1, I add whatever the current power of 2 is. – Braxton Nunnally Mar 21 '15 at 15:55
  • Try and test your code with a string of more than 31 or 63 bits. You cannot possibly represent value of 1020 bits in a 32 or 64 bit signed integer. The result overflows and wraps around. [Look at your code running on ideone.com](http://ideone.com/was22Y). When I give it 32 or more bits the result is -1. It should be obvious that `1020` bits can't be represented with a `32` bit variable. – Blastfurnace Mar 21 '15 at 16:19
  • I'm designing a VM and it works perfectly fine; I decoded a 1020-bit binary number and stored the result in a double. – Braxton Nunnally Mar 21 '15 at 18:40
  • This code doesn't use or return `double`. Even if you change it to use `double` you should realize that IEEE 754 double-precision floating point has a significand of 53 bits (about 16 decimal digits). You can't actually store 1020 significant bits in a `double`, it will be an approximation. The only way this works is with some Big Integer library. – Blastfurnace Mar 21 '15 at 18:48