
Here is the problem statement:

a: 5-bit representation; the 2 MSBs are the integer part and the 3 LSBs are the fractional part.
b: 5-bit representation; the 2 MSBs are the integer part and the 3 LSBs are the fractional part.
c: 11-bit representation; the MSB is the integer part and the 10 LSBs are the fractional part.
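To make the formats concrete: in the common Qm.n notation, a and b are Q2.3 values and c is a Q1.10 value, and a right-justified raw integer with F fraction bits stands for raw / 2^F. A minimal sketch, assuming the values sit in the low bits of their containers; the helper names are hypothetical and only meant for inspecting values, not for the arithmetic itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a right-justified Q2.3 value (a or b) as a real number. */
static double q2_3_to_double(uint8_t raw)
{
    assert(raw <= 0x1F);   /* a and b use only the low 5 bits */
    return raw / 8.0;      /* 3 fraction bits: scale 1/2^3 */
}

/* Hypothetical helper: read a right-justified Q1.10 value (c) as a real number. */
static double q1_10_to_double(uint16_t raw)
{
    assert(raw <= 0x7FF);  /* c uses only the low 11 bits */
    return raw / 1024.0;   /* 10 fraction bits: scale 1/2^10 */
}
```

For example, q2_3_to_double(0x1F) returns 3.875, the largest value a or b can hold, and q1_10_to_double(0x600) returns 1.5.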

I am trying to write C code to perform:

d = a * b + c

How can I do this optimally, and what data structures should I use?

Thanks. Adding some more details: a and b are uint8_t (unsigned char), and c is uint16_t (unsigned short int).

I use the least significant 5 bits of each uint8_t to represent a and b, and the least significant 11 bits of the uint16_t to represent c.

I use appropriate bit masks to extract the integer and fractional parts, such as:

a_int  = (a >> 3) & 0x3;   // upper 2 of the 5 bits
a_frac = a & 0x7;          // lower 3 bits
b_int  = (b >> 3) & 0x3;
b_frac = b & 0x7;
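If you do want to keep the separated fields, a small struct can hold them; note that int is a C keyword, so a member cannot literally be named int. A minimal sketch with hypothetical names (q2_3_parts, q2_3_split, q2_3_join). The join function shows that the two fields recombine into exactly the original raw value, which is why the split adds nothing for the arithmetic itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct holding the split parts of a 5-bit Q2.3 value. */
struct q2_3_parts {
    uint8_t int_part;   /* upper 2 bits */
    uint8_t frac_part;  /* lower 3 bits, in units of 1/8 */
};

static struct q2_3_parts q2_3_split(uint8_t raw)
{
    assert(raw <= 0x1F);  /* only the low 5 bits may be used */
    struct q2_3_parts p = { (uint8_t)((raw >> 3) & 0x3), (uint8_t)(raw & 0x7) };
    return p;
}

/* Putting the fields back together reproduces the raw value unchanged. */
static uint8_t q2_3_join(struct q2_3_parts p)
{
    return (uint8_t)((p.int_part << 3) | p.frac_part);
}
```

For example, 0x16 (10.110b, i.e. 2.75) splits into int_part 2 and frac_part 6, and joins back to 0x16.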

Now I think I am overcomplicating the solution by separating the integer and fractional parts.

Suppose I want to multiply 2.31 by 1.05.

We can multiply 231 with 105 and divide later by 10000.
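The same idea carries over to binary, where the scale factors are powers of two instead of powers of ten. A worked example in the Q2.3 format of a and b, with values chosen only for illustration: 2.25 is stored as 18 (10.010b) and 1.5 as 12 (01.100b), so the integer product only has to be scaled by 2^6, which is a shift rather than a division.

```latex
2.25 \times 1.5 = \frac{18}{2^{3}} \times \frac{12}{2^{3}} = \frac{18 \times 12}{2^{6}} = \frac{216}{64} = 3.375
```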

So you don't need to separate the integer and fractional parts of the original real number.

Along these lines, what is a good solution? Here is my attempt:

// a - 5 bits, least 3 bits are fractional part, upper 2 bits are integer part

// b - 5 bits, least 3 bits are fractional part, upper 2 bits are integer part

// c - 11 bits, least 10 bits are fractional part, MSB is integer part

#include <stdint.h>  // provides uint8_t and uint16_t (a parenthesized #define would break declarations)

uint16_t compute(uint8_t a, uint8_t b, uint16_t c)
{
     uint16_t multval = a * b;  // the least 6 bits represent the fractional part, the upper 4 bits represent integer part
     uint8_t ab_int = multval >> 6; // integer part of a*b
     uint8_t ab_frac = multval & 0x3F; // fractional part of a*b, in units of 1/64
     // c has 10 fraction bits, so the 6-bit fraction must be shifted left by 4 to line up
     uint16_t ab_adjusted = (uint16_t)((ab_int << 10) | (ab_frac << 4));
     uint16_t sum = c + ab_adjusted; // 10 fraction bits, up to 5 integer bits
     return sum;
}
Ukkadam
  • If you show us your attempt/s, details might be clearer, eg. where the representations are aligned within the 8/16/32/64 type used as input to your multiply function. As it is, your question will probably be closed as an unclear homework dump:( – Martin James Mar 14 '20 at 23:49
  • Thanks, adding some more details - a & b are uint8_t (unsigned char), and c is uint16_t (unsigned short int). Taking the least 5 bits of the uint8_t to represent a & b Taking the least 11 bits of the unit_16_t to represent c Using appropriate bit masks to extract the integer and fractional parts such as a.int = (a >> 3) & 0x3 a.frac = a & 0x7 b.int = (b >> 3) & 0x3 b.frac = b & 0x7 – Ukkadam Mar 15 '20 at 00:04
  • @Ukkadam: Put information in the question, not in comments. – Eric Postpischil Mar 15 '20 at 00:10

1 Answer


Multiplying fixed-point values is exactly the same as multiplying integers. The only complication is that you must keep track of the number of integer and fraction bits in the result.

If you have two unsigned 8-bit values where the bottom F_1 and F_2 bits of the two values are fraction bits, then the 16-bit product will have F_1+F_2 fraction bits. Likewise, if the values have I_1 and I_2 integer bits, the product will have I_1+I_2 integer bits. If you have right-justified the multiplicands in their 8-bit containers, then the product will also be right-justified in its 16-bit container.
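For the formats in the question (F_1 = F_2 = 3 and I_1 = I_2 = 2), the product therefore has 6 fraction bits and 4 integer bits, 10 bits in total. A minimal sketch, assuming a and b are right-justified in the low 5 bits; q2_3_mul is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two right-justified Q2.3 values as plain integers.
 * The raw product has 3 + 3 = 6 fraction bits and 2 + 2 = 4 integer bits (Q4.6). */
static uint16_t q2_3_mul(uint8_t a, uint8_t b)
{
    assert(a <= 0x1F && b <= 0x1F);
    return (uint16_t)(a * b);  /* at most 31 * 31 = 961, which fits in 10 bits */
}
```

For example, q2_3_mul(18, 12) returns 216, which read with 6 fraction bits is 216 / 64 = 3.375 = 2.25 × 1.5.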

Addition is trickier. You must align the radix points of the addends before doing the integer addition, which means they need to have the same number of fraction bits (assuming again that they are right-justified). You can accomplish this either by shifting the operand with more fraction bits to the right (which sacrifices accuracy and resolution) or by shifting the operand with fewer fraction bits to the left (which means you need more total bits for the sum). The choice is yours.
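Applied to d = a * b + c from the question: a * b has 6 fraction bits and c has 10, so the two choices look roughly like this. A sketch with hypothetical function names; the first option is exact, the second throws away the 4 lowest fraction bits of c.

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: shift the Q4.6 product left by 4 so it has 10 fraction bits (Q4.10).
 * Exact, but the value grows from 10 to 14 bits. */
static uint16_t align_product_to_c(uint16_t ab_q4_6)
{
    assert(ab_q4_6 <= 0x3FF);  /* product of two 5-bit values fits in 10 bits */
    return (uint16_t)(ab_q4_6 << 4);
}

/* Option 2: shift the Q1.10 value c right by 4 so it has 6 fraction bits (Q1.6).
 * Keeps the sum narrow, but truncates c's 4 lowest fraction bits. */
static uint16_t align_c_to_product(uint16_t c_q1_10)
{
    assert(c_q1_10 <= 0x7FF);  /* c uses only the low 11 bits */
    return (uint16_t)(c_q1_10 >> 4);
}
```

For example, c = 0x7FF (2047/1024 ≈ 1.999) becomes 0x7F under option 2, which is 127/64 ≈ 1.984, so some accuracy is lost.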

The sum of two fixed-point values has the same number of fraction bits as the two addends. If you want to make sure that overflow can't corrupt the sum, give the sum one more integer bit than the larger integer-bit count of the two addends. If the sum won't fit in the container you have available for it, then you need to shift the addends right, discarding fraction bits and moving the radix point, until the sum will fit.
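Putting the rules together for the question's formats: a * b is Q4.6, shifting it left by 4 makes it Q4.10 to match c (Q1.10), and the sum needs max(4, 1) + 1 = 5 integer bits, so d is Q5.10 (15 bits) and fits in a uint16_t without overflow. A minimal sketch under those assumptions; fx_mul_add is a hypothetical name, and the exact left-shift option is chosen over truncating c because 16 bits are enough.

```c
#include <assert.h>
#include <stdint.h>

/* d = a * b + c, with a, b in Q2.3 (5 bits) and c in Q1.10 (11 bits).
 * Returns d in Q5.10, right-justified in a uint16_t. */
static uint16_t fx_mul_add(uint8_t a, uint8_t b, uint16_t c)
{
    assert(a <= 0x1F && b <= 0x1F && c <= 0x7FF);
    uint16_t ab = (uint16_t)(a * b);            /* Q4.6, at most 961 */
    uint16_t ab_aligned = (uint16_t)(ab << 4);  /* Q4.10, at most 15376 */
    return (uint16_t)(ab_aligned + c);          /* Q5.10, at most 17423 */
}
```

For example, fx_mul_add(18, 12, 0x600) computes 2.25 × 1.5 + 1.5 and returns 4992, which is 4992 / 1024 = 4.875.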

Elliot Alderson