10

I'd like to approximate the e^x function.

Is it possible to do so using a piecewise-linear (spline-type) approach? i.e. between x1 and x2,

y1 = a1*x + b1

then between x2 and x3,

y2 = a2*x + b2

etc.

This is for dedicated FPGA hardware, not a general-purpose CPU, so I need to create the function myself. Accuracy is much less of a concern. Furthermore, I can't really afford more than one multiplication circuit and/or multiple shifts/adders. I also want something much smaller than a CORDIC implementation; in fact, size is critical.

trican
  • 2
    What range of x values are you planning to approximate this over? – Cassidy Laidlaw Aug 08 '11 at 15:24
  • 6
    Default answer: [power series](http://en.wikipedia.org/wiki/Exponential_function#Formal_definition) – user786653 Aug 08 '11 at 15:25
  • 2
    You have the `exp()` function in the C++ standard library. Why do you avoid using it? Usually it has good speed. – George Gaál Aug 08 '11 at 15:33
  • Recursive approximations are not suitable for my application. Potential maximum range is 0-4095, but it can be scaled to a smaller value. My hunch is that I need about 4 to 6 bits of precision – trican Aug 08 '11 at 15:35
  • 2
    My application isn't actually C or C++, its dedicated hardware, so I'm rolling the function myself. Power function is nice, but I'd prefer something with fewer operations. – trican Aug 08 '11 at 15:37
  • @user786653: Definitely not a power series. That's a theoretical math definition, not a numerical math definition. The same page has more practical formulas, e.g. [Continued fractions](http://en.wikipedia.org/wiki/Exponential_function#Continued_fractions_for_ex) – MSalters Aug 08 '11 at 15:41
  • It's more or less one. In some cases, a lot more or less :) Sorry, an old math joke. – Edwin Buck Aug 08 '11 at 15:41
  • Just to clarify based on the `0-4095` statement: that's integer? Because the algorithm for integer x is trivial; just store e^1..e^2048 and multiply according to the bits in x. 11 multiplications worst case. – MSalters Aug 08 '11 at 15:43
  • Thanks MSalters - yes the range is integer, but the solution contains about 10 too many multiplications – trican Aug 08 '11 at 15:52
  • See http://math.stackexchange.com/questions/55830/how-to-calculate-ex-with-a-standard-calculator – lhf Aug 08 '11 at 16:08
  • @trican: re "but the solution contains about 10 too many multiplications": First off, that sounds very much like premature optimization. Secondly, your proposed use of splines will be even more expensive. Thirdly, 0 to 4095? `exp(4095)` is a very, very big number. Finally, see http://www.netlib.org/fdlibm/e_exp.c . – David Hammen Aug 08 '11 at 20:04
  • thanks for the response David, I wish this was premature optimization - but there is NO implementation of exponential functions in hardware description languages such as Verilog or VHDL for FPGAs/ASICs. Furthermore small size & low power are absolutely critical in my case and I'm willing to trade accuracy for that. – trican Aug 08 '11 at 20:37
  • We really need the range and precision of the input **and** the precision of the output. Q12.0 for the input gives Q400+ for the output. These are extremely wide signals to deal with on an FPGA. –  Aug 10 '11 at 22:52
  • @Adam12, in my scenarios - x will be negative, meaning the output is bounded between 0 and 1 - so I can comfortably deal with this. – trican Aug 11 '11 at 14:47

10 Answers

27

How about a strategy like this that uses the identity

e^x = 2^(x/ln 2)

  1. Precalculate 1/ln(2)
  2. Multiply this constant by your argument (1 multiplication)
  3. Use binary shifts to raise 2 to the integer portion of the power (assumes exp+mantissa format)
  4. Adjust based on the fractional power-of-2 remainder (likely a second multiplication)

I realize this is not a complete solution, but it does only require a single multiplication and reduces the remaining problem to approximating a fractional power of 2, which should be easier to implement in hardware.
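For illustration, here is a minimal C sketch of the idea in Q16.16 fixed point. The format, the names, and the crude 2^f ≈ 1 + f correction (exact at f = 0 and f = 1, roughly 6% worst case) are my assumptions, not part of the answer; a second multiply against a small correction term would tighten step 4.

#include <math.h>    /* ldexpf, used here only to stand in for a hardware shift */
#include <stdint.h>

/* Sketch of e^x = 2^(x/ln 2) in Q16.16 fixed point (assumed format). */
static float exp_via_pow2(float x)
{
    const int32_t INV_LN2_Q16 = 94548;               /* round(2^16 / ln 2) */
    int32_t xq = (int32_t)(x * 65536.0f);            /* x as Q16.16 */
    int64_t p  = ((int64_t)xq * INV_LN2_Q16) >> 16;  /* x/ln 2: the one multiply */
    int32_t k  = (int32_t)(p >> 16);                 /* integer part: a pure shift */
    int32_t f  = (int32_t)(p & 0xFFFF);              /* fractional part in [0,1) */
    float frac = 1.0f + (float)f / 65536.0f;         /* 2^f ~= 1 + f (assumed) */
    return ldexpf(frac, k);                          /* frac * 2^k */
}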

Also, if your application is specialized enough, you could try to re-derive all of the numerical code that will run on your hardware to be in a base-e number system and implement your floating point hardware to work in base e as well. Then no conversion is needed at all.

Lucas
  • 1
    thanks Lucas - this is perfect for my needs, even better than I could have hoped for. Many thanks! – trican Aug 08 '11 at 20:38
  • Glad to hear. Sounds like you've got some interesting design tradeoffs. – Lucas Aug 08 '11 at 21:17
  • 2
    @trican There's a good paper on implementing this identity and range reduction to achieve reasonable accuracy for single precision floating point using lookup tables and fixed-point arithmetic: http://www.loria.fr/~detreyje/publications/DetDin_fpt_2005.pdf – Chiggs Feb 02 '14 at 22:33
  • Alternate link to the PDF: http://perso.citi-lab.fr/fdedinec/recherche/publis/2005-FPT.pdf – Lucas Sep 28 '17 at 14:11
  • exp() with just two multiplications and a bitshift, holy shift! – 16807 Sep 26 '22 at 05:54
14

If x is an integer, you can just multiply e by itself over and over again.

If x is not an integer, you can calculate e^floor(x) using the above method and then multiply by a small correction term. This correction term can be easily calculated using a number of approximation methods. One such way is this:

e^f ≈ 1 + f(1 + f/2(1 + f/3(1 + f/4))), where f is the fractional part of x

This comes from the (optimized) power series expansion of e^x, which is very accurate for small values of x. If you need more accuracy, just tack on more terms to the series.

This math.stackexchange question contains some additional clever answers.

EDIT: Note that there is a faster way of calculating e^n called exponentiation by squaring.
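As a hedged sketch of both halves (exponentiation by squaring for the integer part, the Horner-form series above for the fraction) in C; the function name and the split are illustrative, not from the answer:

#include <math.h>   /* floor; M_E may need _USE_MATH_DEFINES on some compilers */

static double exp_split(double x)
{
    int n = (int)floor(x);
    double f = x - n;                       /* fractional part, in [0,1) */

    /* e^n by exponentiation by squaring */
    double base = (n < 0) ? 1.0 / M_E : M_E;
    unsigned k = (unsigned)((n < 0) ? -n : n);
    double ipow = 1.0;
    while (k) {
        if (k & 1) ipow *= base;
        base *= base;
        k >>= 1;
    }

    /* e^f ~= 1 + f(1 + f/2(1 + f/3(1 + f/4))) */
    double frac = 1.0 + f * (1.0 + f / 2.0 * (1.0 + f / 3.0 * (1.0 + f / 4.0)));
    return ipow * frac;
}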

tskuzzy
  • 2
    The best solution for the integer case isn't this O(n) approach. A divide-and-conquer algorithm (pre)calculates e^1, e^2, e^4, e^8, etc. You then take the factors which correspond to the bits in `x`. This is O(log N). I.e. for x=255, this takes only 8 multiplications instead of 254. – MSalters Aug 08 '11 at 15:36
  • Thanks - but i'm looking to minimise multiplication operations, I only want one multiplication operation – trican Aug 08 '11 at 15:51
  • But *why*? Are you *actually* seeing performance problems, or is this premature optimization? – Jonathan Grynspan Aug 08 '11 at 15:57
  • @Jonathan - its not for a cpu, its for dedicated hardware. I've updated my question above to clarify this. Sorry for the confusion – trican Aug 08 '11 at 16:05
  • @Jonathan Because having an O(n) exponential function will obviously lead to bad performance. Premature optimization is not bad on a systems level. – alternative Jun 03 '14 at 15:47
  • This was just what I needed to do an integer math version of e^x – zawy Nov 09 '18 at 11:33
4

First off, what is motivating this approximation? In other words, what exactly is wrong with the straightforward exp(x)?

That said, a typical implementation of exp(x) is to

  • Find an integer k and floating point number r such that x=k*log(2) + r and r is between -0.5*log(2) and 0.5*log(2).
  • With this reduction, exp(x) is 2^k * exp(r).
  • Calculating 2^k is a snap.
  • The standard implementations of exp(x) use a Remez-type algorithm to come up with a minimax polynomial that approximates exp(r).
  • You could do the same, but use a reduced-order polynomial, as in the sketch below.
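A hedged C sketch of those bullets. The degree-4 coefficients below are plain Taylor terms rather than a true Remez/minimax fit; that substitution, and the function name, are my assumptions for illustration:

#include <math.h>   /* lround, ldexp */

static double exp_reduced(double x)
{
    const double LN2 = 0.6931471805599453;
    int k = (int)lround(x / LN2);          /* x = k*ln(2) + r */
    double r = x - k * LN2;                /* r in [-0.5*ln(2), 0.5*ln(2)] */
    /* reduced-order polynomial for exp(r); Remez coefficients would equalize the error */
    double p = 1.0 + r * (1.0 + r * (0.5 + r * (1.0/6.0 + r * (1.0/24.0))));
    return ldexp(p, k);                    /* 2^k * exp(r) */
}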

Here's the kicker: No matter what you do the odds are very high that your function will be much, much slower than just calling exp(). Most of the functionality of exp() is implemented in your computer's math coprocessor. Re-implementing that functionality in software, even with reduced precision, is going to be an order of magnitude slower than just using exp().

David Hammen
  • Remez*, and most actually use a Padé approximation centered on the bound so that the error over this range is as small as possible. The error for a given input `x` is equal to the bounded error multiplied by `2^k`, which usually destroys most of these approximations when the input is large... I 'believe' the actual implementation employs both the Padé approximation and an iterative root-finding refinement of the inverse function subtracted from the input. – nimig18 Apr 05 '17 at 18:51
  • why should `r` reside between `-0.5log(2)` and `0.5log(2)` and not `(0, 1)`? – Elinx Feb 23 '19 at 01:25
3

For hardware, I have an awesome solution for you IF you need it to be bit-level accurate. (Otherwise just do an approximation like above.) The identity is exp(x) = cosh(x) + sinh(x), the hyperbolic sine and cosine. The catch is that the hyperbolic sine and cosine can be computed using the CORDIC technique, and best of all, they are among the FAST CORDIC functions, meaning they look almost like multiply instead of almost like divide!

Which means that for about the area of an array multiplier, you can compute the exponential to arbitrary precision in just 2 cycles!

Look up the CORDIC method - it's AMAZING for hardware implementation.

One other hardware approach is using a small table in conjunction with a formula others have mentioned: exp(x + y) = exp(x) * exp(y). You can break the number up into small bit fields - say 4 or 8 bits at a time - and just look up the exponent for that bitfield. Probably only effective for narrow computations, but it's another approach.
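A small C sketch of the table idea, assuming a Q4.4 input split into two 4-bit fields; the format and the table construction are illustrative assumptions, and in hardware the tables would simply be ROMs:

#include <math.h>
#include <stdint.h>

static float exp_lut(uint8_t xq)            /* x in Q4.4, i.e. x = xq / 16.0 */
{
    static float hi[16], lo[16];
    static int init = 0;
    if (!init) {                            /* fill the lookup tables once */
        for (int i = 0; i < 16; i++) {
            hi[i] = expf((float)i);         /* exp of the upper 4 bits */
            lo[i] = expf((float)i / 16.0f); /* exp of the lower 4 bits */
        }
        init = 1;
    }
    return hi[xq >> 4] * lo[xq & 0xF];      /* exp(a + b) = exp(a) * exp(b) */
}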

user2465201
2

http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/ implements Schraudolph's method (http://nic.schraudolph.org/pubs/Schraudolph99.pdf) in Java:

public static double exp(double val) {
    // 1512775 = 2^20/ln(2); 1072693248 = 1023*2^20 (the exponent bias); 60801 tunes the error
    final long tmp = (long) (1512775 * val) + (1072693248 - 60801);
    // shift the result into the upper 32 bits, i.e. the double's sign/exponent/high mantissa
    return Double.longBitsToDouble(tmp << 32);
}

See also https://math.stackexchange.com/a/56064 (look for the Padé approximant).

jdbertron
  • Thanks @jdbertron for adding this and the links. The approach seems quite interesting, however are you sure the code snippet above is correct? I tried it for some values and the result doesn't seem to be even close? – trican Dec 12 '12 at 09:54
  • I think it would be inaccurate for large values. You can probably find a better Pade approximant with some work to get a better range. It works for me because I don't need anything exact. – jdbertron Mar 20 '13 at 03:25
  • Schraudolph's method is perfect. I don't think it can get any faster if the accuracy is acceptable. In his paper he determines the mean relative error to be about 4%. Source: http://nic.schraudolph.org/pubs/Schraudolph99.pdf – Gigo May 04 '16 at 23:20
  • Here is a more modern implementation of Schraudolph's method, using a single-precision float instead of a double (which is a waste, because only the upper 32 bits of the double are being written). http://www.machinedlearnings.com/2011/06/fast-approximate-logarithm-exponential.html – Mark Lakata Jul 27 '16 at 23:45
2

This is not the smooth spline interpolation you requested, but it's computationally efficient:

float expf_fast(float x) {
    /* Schraudolph-style bit manipulation for single precision: the scaled
       input is written straight into the float's exponent/mantissa bits */
    union { float f; int i; } y;
    y.i = (int)(x * 0xB5645F + 0x3F7893F5);
    return y.f;
}

[Plot of the approximation's output]

nimig18
1

This is not appropriate for a custom FPGA, but it is worth mentioning.

http://www.machinedlearnings.com/2011/06/fast-approximate-logarithm-exponential.html

And the source code:

https://code.google.com/archive/p/fastapprox/downloads

The "faster" implementation only involves 3 steps (multiply, add, convert float to int) and a final cast back to float. In my experience, it is 2% accurate, which may be enough if you don't care about the actual value but are using the value in a log-likelihood maximization iteration.

Mark Lakata
1

Of course it is "possible". There are several issues.

  1. What is your requirement for the accuracy?

  2. Are you willing to use higher order splines?

  3. How much memory are you willing to spend on this? A linear function over small enough intervals will approximate the exponential function to any degree of accuracy needed, but it may require VERY small intervals.

Edit:

Given the additional information provided, I ran a quick test. Range reduction can always be used on the exponential function. Thus, if I wish to compute exp(x) for ANY x, then I can rewrite the problem in the form...

y = exp(xi + xf) = exp(xi)*exp(xf)

where xi is the integer part of x, and xf is the fractional part. The integer part is simple. Compute xi in binary form, then repeated squarings and multiplications allow you to compute exp(xi) in relatively few operations. (Other tricks, using powers of 2 and other intervals can give you yet more speed for the speed hungry.)

All that remains is now to compute exp(xf). Can we use a spline with linear segments to compute exp(xf), over the interval [0,1] with only 4 linear segments, to an accuracy of 0.005?

This last question is resolved by a function that I wrote a few years ago, which will approximate a function with a spline of a given order, to within a fixed tolerance on the maximum error. This code required 8 segments over the interval [0,1] to achieve the required tolerance with a piecewise linear spline function. If I chose to reduce the interval further to [0, 0.5], I could achieve the prescribed tolerance with only 4 segments.

So the answer is simple. If you are willing to do the range reduction to bring x into the interval [0, 0.5], then do the appropriate computations, then yes, you can achieve the requested accuracy with a linear spline in 4 segments.
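A hedged C sketch of that final recipe: a 4-segment linear interpolation of exp on [0, 0.5]. The uniform knot placement is my assumption for illustration; the answer's own fitting code would place segments by minimax error:

#include <math.h>

static double exp_frac_spline(double xf)    /* assumes xf in [0, 0.5] */
{
    static double knot[5];
    static int init = 0;
    if (!init) {
        for (int i = 0; i <= 4; i++)
            knot[i] = exp(0.125 * i);       /* knots at 0, 0.125, ..., 0.5 */
        init = 1;
    }
    int seg = (int)(xf / 0.125);
    if (seg > 3) seg = 3;                   /* clamp xf == 0.5 into the last segment */
    double t = (xf - 0.125 * seg) / 0.125;  /* position within the segment */
    return knot[seg] + t * (knot[seg + 1] - knot[seg]);
}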

In the end, you will always be better off using a hard coded exponential function though. All of the operations mentioned above will surely be slower than what your compiler will provide, IF exp(x) is available.

  • many thanks for the detailed response. On further reflection I can tolerate much higher margins of error, probably as much as 0.05, and maybe even 0.1. I've used splines with range reduction before for other functions, but in this case I think Lucas' answer above is even more suitable for the lower accuracy requirement. Also the key point is there is NO direct implementation in the hardware "compiler" for an exponential function. i.e. I'm not working on a CPU – trican Aug 08 '11 at 20:48
0

Wolfram presents a few good ways of approximating it in terms of series etc.

Wikipedia's page on Taylor series also shows an example of an expansion of e^x around 0.
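For reference, the expansion around 0 is the standard series

e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ...

and truncating it after a few terms gives a usable approximation for small x.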

aioobe
0

Or you could just do pow(M_E, x) in C. (Some platforms don't have M_E defined; on those, you may have to manually specify the value of e, which is approximately 2.71828182845904523536028747135266249775724709369995.)

(As David points out in the comments, exp(x) would be more efficient than pow(M_E, x). Again, brain not turned on yet.)

Do you have a use case where the calculation of ex is a proven bottleneck? If not, you should be coding for readability first; only try these sorts of optimizations if the obvious approach is too slow.

Jonathan Grynspan
  • 6
    `pow(M_E, x)`? Seriously? `pow(a,b)` is typically implemented as `exp(b*log(a))`. Using `pow` is a speed bump, not a speedup. – David Hammen Aug 08 '11 at 15:31
  • That was kind of my point--write the code properly first, *then* take a look at the performance of it. Nowhere in the original question is it stated that this is called a million times a second or anything like that, so it's not immediately obvious that performance will be an issue. – Jonathan Grynspan Aug 08 '11 at 15:34
  • Regardless of performance, `exp(x)` is a simpler (and more portable!) solution than `pow(M_E, x)`. Even if `pow()` were faster, resorting to it rather than `exp()` would be premature optimization. – Keith Thompson Aug 08 '11 at 15:45
  • Very true, and I have updated my answer to reflect David's correction. Can you tell I haven't had enough coffee yet? :) – Jonathan Grynspan Aug 08 '11 at 15:47