
I have a neural network written in Java which uses a sigmoid transfer function defined as follows:

private static double sigmoid(double x)
{
    return 1 / (1 + Math.exp(-x));
}

and this is called many times during training and computation using the network. Is there any way of speeding this up? It's not that it's slow, it's just that it is used a lot, so a small optimisation here would be a big overall gain.

Zaid
Simon
    Are the values of x ever repeated or is it more likely that they will always be different each time the method is called? – DaveJohnston May 22 '10 at 11:14
  • Also, how accurate does the result need to be? – Matthew Flaschen May 22 '10 at 11:18
  • @Dave - depends on the desired accuracy, but they are all floating point numbers, so pretty much unique – Simon May 22 '10 at 11:19
  • @Matthew - good question, but I suspect I need at least 4 and probably 6dps. The trouble is that it is hard to establish necessary accuracy in real world problems where there is no "right" answer. – Simon May 22 '10 at 11:21
  • So that rules out the possibility of caching results, so unfortunately I can't see another way of improving this. – DaveJohnston May 22 '10 at 11:21
  • Is the range of the double values restricted, or are they all over? – Michael Borgwardt May 22 '10 at 11:22
  • Are you looking for this? http://sharpneat.sourceforge.net/integer_network.html – S.Lott May 22 '10 at 11:40
  • I once tried `memoize` -ing the sigmoid function in Perl, which basically does the same thing as generate a lookup table. The training time did not improve. This observation seconds DaveJohnston's suggestion. – Zaid May 22 '10 at 11:52
  • @Simon: A neural network! I always want to know how to write it. But documentation on internet is very long and I don't want to read it. Is it possible you gave me (and other interesting people) the code in an archive? That would be awesome! I would be very thankful. – Martijn Courteaux May 22 '10 at 12:05
  • Have you profiled the program to know that improving this will significantly improve the overall performance? – Refactor May 22 '10 at 17:09
  • @Refactor Speeding this calculation up is actually a common topic amongst NN eggheads, so I find it extremely humorous how stereotypically StackOverflowish that response is :) – Philip Guin Aug 14 '12 at 07:22

4 Answers


For neural networks, you don't need the exact value of the sigmoid function. So you can precalculate, say, 100 values and reuse the one closest to your input, or better still (as a comment suggests) interpolate between the two neighbouring values.

How you can do this is described in this article (link stolen from S.Lott's comment).

This is the sigmoid function:

[graph of the sigmoid function]

As you can see, only values in the range -10 < x < 10 are interesting at all. And, as another comment pointed out, the function is symmetric (sigmoid(-x) = 1 - sigmoid(x)), so you only have to store half of the values.


Edit: I'm sorry that I showed the wrong graph here. I've corrected it.
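A lookup table with linear interpolation and the symmetry trick described above might look like the following sketch (the class name, table size, and cut-off are my own choices, not from the original answer):

```java
public final class SigmoidTable {
    private static final int SIZE = 1000;
    private static final double MAX_X = 10.0;
    private static final double STEP = MAX_X / (SIZE - 1);
    private static final double[] TABLE = new double[SIZE];

    static {
        // Precompute sigmoid on [0, MAX_X]; negative inputs use symmetry.
        for (int i = 0; i < SIZE; i++) {
            TABLE[i] = 1.0 / (1.0 + Math.exp(-(i * STEP)));
        }
    }

    public static double sigmoid(double x) {
        if (x < 0) {
            return 1.0 - sigmoid(-x);   // symmetry: sigmoid(-x) = 1 - sigmoid(x)
        }
        if (x >= MAX_X) {
            return 1.0;                 // saturated region
        }
        double pos = x / STEP;
        int i = (int) pos;
        double frac = pos - i;
        // Linear interpolation between the two neighbouring table entries.
        return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i]);
    }
}
```

Whether this actually beats `Math.exp` depends on the CPU and cache behaviour, as a later comment notes, so it is worth benchmarking before committing to it.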

mpowell48
tangens
  • Maybe something more than 100 if you want a little bit more precision. A lookup table of 5000 (but probably even 1000) values will be absolutely sufficient IMHO. – nico May 22 '10 at 11:26
  • For more precision, it is probably better to do linear interpolation between the nearest two values. – Jouni K. Seppänen May 22 '10 at 11:34
  • The problem is symmetrical, so you only need half the values. Calculating the other side is trivial. – Peter Lawrey May 22 '10 at 11:58
  • This is a plot of a completely different function. erf(x) is hard to calculate, exp(x) is not. – Ha. May 22 '10 at 13:23
  • @Ha : Nice catch. This looks like the bipolar sigmoid function. The sigmoid function in the OP has horizontal asymptotes 0 and 1. – Zaid May 22 '10 at 13:36
  • Is it really worth replacing the existing function with an interpolation scheme? I would imagine it's slower. – Zaid May 22 '10 at 13:38
  • @Zaid: You will just have a lookup table, take the values corresponding to the first x that is greater than yours and the first that is lower, then do the mean. So it's just a sum and a division by 2, definitely faster. – nico May 22 '10 at 18:16
  • Take care to measure lookup vs calculation. The CPU executing one FP instruction is probably faster than interpolating two values (depending on CPU arch of course) – Merijn Vogel Nov 19 '21 at 14:26

If you have a lot of nodes where the value of x is outside the -10..+10 box, you can simply skip calculating those values altogether, e.g. like so:

double y;
if( x < -10 )          // saturated low: sigmoid is effectively 0
    y = 0;
else if( x > 10 )      // saturated high: sigmoid is effectively 1
    y = 1;
else
    y = 1 / (1 + Math.exp(-x));
return y;

Of course, this incurs the overhead of the conditional checks for EVERY calculation, so it's only worthwhile if you have lots of saturated nodes.

Another thing worth mentioning: if you are using backpropagation and you have to deal with the slope of the function, it's better to compute it in pieces rather than 'as written'.

I can't recall the slope at the moment, but here's what I'm talking about, using a bipolar sigmoid as an example. Rather than computing it this way

double y = (1 - Math.exp(-x)) / (1 + Math.exp(-x));

which hits exp() twice, you can cache the costly calculation in a temporary variable, like so

double temp = Math.exp(-x);
double y = (1 - temp) / (1 + temp);

There are lots of places to put this sort of thing to use in BP nets.
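For the standard logistic sigmoid in the question (rather than the bipolar one above), the same idea goes further: the slope can be written entirely in terms of the already-computed output, y' = y(1 - y), so backpropagation needs no extra exp() call at all. A sketch, with class and method names of my own:

```java
public final class Logistic {
    /** Standard logistic sigmoid: one call to exp(). */
    public static double value(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Slope expressed in terms of the cached output: y' = y * (1 - y). */
    public static double slopeFromOutput(double y) {
        return y * (1.0 - y);
    }
}
```

If the forward pass stores each node's activation anyway, the backward pass gets the derivative for the cost of one multiplication.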

JustJeff

From a mathematical point of view, I don't see any way to optimize it.

Femaref

It's a pretty smooth function, so a lookup and interpolation scheme is likely to be more than sufficient.

When I plot the function over a range of -10 <= x <= 10, I get five place accuracy at the extremes. Is that good enough for your application?
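One way to sanity-check that figure (a sketch of my own, not from the answer): the worst case for a table limited to |x| <= 10 is the clamping error at the cut-off, 1 - sigmoid(10) = e^-10 / (1 + e^-10), which comes out around 4.5e-5, i.e. agreement to roughly the fifth decimal place.

```java
public final class TailError {
    /** Error introduced by clamping sigmoid(x) to 1.0 for large positive x. */
    public static double clampErrorAt(double x) {
        return 1.0 - 1.0 / (1.0 + Math.exp(-x));
    }
}
```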

duffymo