
I have a number of coefficients of type double that should be stored on an FPGA as signed 16-bit numbers. For that I need to configure how many of the 16 bits are used for the fraction and how many for the integer part. My idea is to go through my array of coefficients, take the integer part of each one, and calculate how many bits are needed to represent it, keeping the result for the largest coefficient.

This is my code so far:

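// Needs <cmath> (std::fabs, floor, ceil, log) and <iostream>.
double value;
int value_intpart;
int bits = 0;            // integer bits (including sign) for the largest coefficient so far
double max_coeff = 0.0;  // largest magnitude seen so far
double value_array[] = {0.33333333333333333, 0.67676767676767676, -0.67676767676767777, 1.1111111111111111, 2.3654375346357653, -2.3658375346397653, 10.365437534635765};
for(int counter = 0; counter < 7; counter++)
{
  value = value_array[counter];
  // std::fabs avoids accidentally picking the integer overload of abs
  value_intpart = floor(std::fabs(value) + 1);

  std::cout << "Value: " << value << std::endl;
  if(std::fabs(value) > max_coeff)
  {
    if(value >= -0.5 && value < 0.5)
      bits = 0;
    else
    {
      bits = ceil( log(value_intpart)/log(2) + 1 );
      std::cout << "Value_intpart: " << value_intpart << " Log2: " << log(value_intpart)/log(2) << std::endl;
    }
    std::cout << "Bits: " << bits << std::endl;
    // store the magnitude, otherwise a negative value resets the running maximum
    max_coeff = std::fabs(value);
  }
}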

It gives me the following output:

Value: 0.333333
Bits: 0
Value: 0.676768
Value_intpart: 1 Log2: 0
Bits: 1
Value: -0.676768
Value_intpart: 1 Log2: 0
Bits: 1
Value: 1.11111
Value_intpart: 2 Log2: 1
Bits: 2
Value: 2.36544
Value_intpart: 3 Log2: 1.58496
Bits: 3
Value: -2.36584
Value_intpart: 3 Log2: 1.58496
Bits: 3
Value: 10.3654
Value_intpart: 11 Log2: 3.45943
Bits: 5

But I am not really sure about the efficiency of this code or whether it is entirely correct. For example, I found no other way than to check manually for the range from -0.5 up to (but not including) 0.5. Could you give me some feedback on this code?

JimmyB
Daiz
  • What's that `if(value >= -0.5 && value < 0.5)` for? How is 0.75 represented in binary for example? – JimmyB Feb 18 '15 at 11:10

2 Answers

#include <algorithm>        // std::max
#include <iostream>
#include <iomanip>
#include <math.h>           // frexp
#include <utility>          // std::begin, std::end
using namespace std;

auto exponent_of( const double x )
    -> int
{
    int result;
    frexp( x, &result );
    return result;
}

auto main() -> int
{
    const double values[] =
    {
        +0.33333333333333333,
        +0.67676767676767676,
        -0.67676767676767777,
        +1.1111111111111111,
        +2.3654375346357653,
        -2.3658375346397653,
        +10.365437534635765
    };

    int max_exponent = 0;
    cout << fixed << setprecision( 8 );
    for( const double x : values )
    {
        const int exponent = exponent_of( x );
        cout << setw( 12 ) << x << setw( 4 ) << exponent << endl;
        max_exponent = max( max_exponent, exponent );
    }
    cout << "Max binary exponent = " << max_exponent << endl;
}

Output:

  0.33333333  -1
  0.67676768   0
 -0.67676768   0
  1.11111111   1
  2.36543753   2
 -2.36583753   2
 10.36543753   4
Max binary exponent = 4

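This means you need 4 binary digits for the integer part of the coefficient with the largest absolute value. A signed format needs one more bit for the sign, which leaves 16 - 1 - 4 = 11 bits for the fraction.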
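
Building on the exponent idea above, here is a minimal sketch of how the fraction width and the 16-bit words could be derived. The helper names `fraction_bits_for` and `to_fixed` are made up for illustration, and the sketch assumes a two's-complement layout with one sign bit, round-to-nearest, and saturation on overflow; the FPGA side may expect something different.

```cpp
#include <algorithm>  // std::max, std::min
#include <cassert>    // assert
#include <cmath>      // std::frexp, std::ldexp, std::lround
#include <cstddef>    // std::size_t
#include <cstdint>    // std::int16_t

// Hypothetical helper: fractional bits left in a signed 16-bit word after
// reserving the sign bit and the integer bits of the largest coefficient.
int fraction_bits_for(const double* values, std::size_t count)
{
    int max_exponent = 0;  // coefficients below 0.5 need no integer bits
    for (std::size_t i = 0; i < count; ++i)
    {
        int exponent = 0;
        std::frexp(values[i], &exponent);  // |x| < 2^exponent for nonzero x
        max_exponent = std::max(max_exponent, exponent);
    }
    return 15 - max_exponent;  // 16 bits - 1 sign bit - integer bits
}

// Hypothetical helper: round x to the nearest multiple of 2^-frac_bits and
// saturate to the int16_t range (assumed policy, not taken from the question).
std::int16_t to_fixed(double x, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits <= 15);
    const long scaled = std::lround(std::ldexp(x, frac_bits));  // x * 2^frac_bits
    return static_cast<std::int16_t>(std::max(-32768L, std::min(32767L, scaled)));
}
```

For the coefficients in the question, `fraction_bits_for` should yield 11, and each value can then be converted with `to_fixed(x, 11)`. Saturation matters because a value just below a power of two can round up to one step past the largest representable word.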

Cheers and hth. - Alf

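// Needs <cmath>; array and array_length are the coefficients and their count.
int int_bits = 1;  // a signed format always needs the sign bit
for (int i = 0; i < array_length; ++i) {
  const double mag = std::fabs(array[i]);
  if (mag < 1.0) continue;  // no integer magnitude bits; also avoids log2(0)
  // floor(log2(mag)) + 1 magnitude bits (ceil undercounts exact powers of two),
  // then + 1 for the sign bit
  const int val = (static_cast<int>(std::floor(std::log2(mag))) + 1) + 1;
  if (val > int_bits) {
    int_bits = val;
  }
}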

The +1 accounts for the sign bit, which is lost when taking the absolute value.
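
As a hedged variation on the loop above, the logarithm can be replaced with `std::ilogb`, which returns the binary exponent as an integer and so sidesteps rounding in `log(x)/log(2)`. The function names below are hypothetical, and the sketch assumes that coefficients with magnitude below 1 still occupy the sign bit.

```cpp
#include <algorithm>  // std::max
#include <cassert>    // assert
#include <cmath>      // std::fabs, std::ilogb, std::isfinite
#include <cstddef>    // std::size_t

// Hypothetical helper: bits needed for sign + integer part of one coefficient.
int signed_int_bits(double x)
{
    assert(std::isfinite(x));
    const double mag = std::fabs(x);
    if (mag < 1.0)
        return 1;                // sign bit only; also avoids ilogb(0)
    return std::ilogb(mag) + 2;  // floor(log2(mag)) + 1 magnitude bits + 1 sign bit
}

// Hypothetical helper: the widest requirement over all coefficients.
int max_signed_int_bits(const double* values, std::size_t count)
{
    int bits = 1;
    for (std::size_t i = 0; i < count; ++i)
        bits = std::max(bits, signed_int_bits(values[i]));
    return bits;
}
```

With this counting, exact powers of two such as 1.0 or 2.0 get one more bit than a `ceil(log2(x))` formula would give them, which is what a positive value of that size needs in two's complement.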

Mitan