
Is log(a*b) always faster in MATLAB than log(a) + log(b)?

I tested for several inputs and it seems log(a*b) is faster. Can you more experienced guys give me some opinions on this? Maybe warnings that this might not always be the case, or anything else I should be careful with? In the first case we have one log operation and one multiplication; in the second case we have two log operations and one addition.

Edit:

To add to my original post, the more general question is:

Is log(a*b*...*z) always faster than log(a) + log(b) + ... + log(z)?

Thanks

  • I would think `log` time >> `multiply` time > `add` time. So this observation makes sense. – lurker Aug 22 '13 at 16:51

3 Answers


log(a*b) should always be faster, because computing the logarithm is expensive. In log(a*b) you only do it once; in log(a)+log(b) you do it twice.

Computing products and sums is trivial compared to logarithms, exponentials, etc. In terms of processor cycles, sums and products generally take fewer than 5, whereas exponentials and logarithms can take anywhere from 50 up to 200 cycles on some architectures.

Is log(a*b*...*z) always faster than log(a) + log(b) + ... + log(z)?

Yes. Definitely. Avoid computing logarithms whenever possible.

Here's a small experiment:

a=rand(5000000,1);

% log(a(1)*a(2)...)
tic
for ii=1:100
    res=log(prod(a));
end
toc
% Elapsed time is 0.649393 seconds.  

% log(a(1))+log(a(2))+...
tic
for ii=1:100
    res=sum(log(a));
end
toc
% Elapsed time is 6.894769 seconds.

At some point the ratio between the two timings will saturate. Where it saturates depends on your processor architecture, but the difference will be at least an order of magnitude.
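If you want to see where the ratio levels off on your own machine, here is a rough sketch along the same lines (the vector sizes are arbitrary choices for illustration):

% Measure the timing ratio of sum(log(a)) to log(prod(a))
% for a few vector sizes
for n = [1e2 1e4 1e6]
    a = rand(n,1);
    tic; for ii = 1:100, res1 = log(prod(a)); end; t1 = toc;
    tic; for ii = 1:100, res2 = sum(log(a)); end; t2 = toc;
    fprintf('n = %g: time ratio = %.1f\n', n, t2/t1);
end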

Marc Claesen
  • The difference is even more drastic in the case of the complex log, e.g., `a` is a vector of complex random values. – horchler Aug 22 '13 at 18:42
  • Careful here: the product of a lot of small values in the range [0,1] will quickly become zero. This type of problem often occurs in real life when working with [probabilities](http://en.wikipedia.org/wiki/Log_probability)... So speed is not everything; you should also be concerned about numerical stability when working with floating-point numbers. – Amro Aug 24 '13 at 06:36
  • To illustrate how bad it is: in the example above, the product became zero after only about 700 of the 5 million elements; `sum(cumprod(a) > 0)` gave me 729. – Amro Aug 24 '13 at 06:49

Beware: while calculating the log of a product is faster, it can sometimes be incorrect due to machine precision.

One problematic case is having a lot of operands, or large numbers as operands. In this case the product a_1 * a_2 * ... * a_n can overflow, while computing the sum of logarithms will not.

Another problematic case is using small numbers whose product underflows to zero due to machine precision (as Amro mentioned).
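Both failure modes are easy to reproduce; a minimal sketch, with values chosen arbitrarily to force overflow and underflow in double precision:

% Overflow: the running product exceeds realmax and becomes Inf,
% so the log of the product is Inf while the sum of logs stays finite
a = 1e300 * ones(10,1);
log(prod(a))   % Inf
sum(log(a))    % about 6907.8, the correct value

% Underflow: a long product of values in (0,1) collapses to zero,
% so the log of the product is -Inf while the sum of logs stays finite
b = rand(5000,1);
log(prod(b))   % -Inf, because the product underflows to 0
sum(log(b))    % finite, around -5000 for this b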

Andrey Rubshtein
  • OK, in that case we can create partial products: multiply until just before overflow, take the log of that, then keep doing the same thing, collecting terms into products and taking the log before each overflow. Agree? (see the sketch after these comments) – user2381422 Aug 22 '13 at 17:09
  • Yes, it sounds feasible. – Andrey Rubshtein Aug 22 '13 at 17:11
  • +1 Good observation on a question where the answer was already stated. – Werner Aug 22 '13 at 17:13
  • @user2381422 No, from Brazil. Should I speak in my bad German here? I'm from Brazil (I prefer it written as Brasil, in Portuguese; it just seems more beautiful) x) – Werner Aug 22 '13 at 17:36
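A minimal sketch of the chunked-product idea from the comments above (the chunk size of 100 is an arbitrary choice; it just has to be small enough that no partial product can overflow or underflow for your data):

% Compute sum(log(a)) with far fewer log calls: multiply the
% elements in chunks and take one log per partial product
a = rand(5000000,1);
chunk = 100;                      % arbitrary chunk size
res = 0;
for k = 1:chunk:numel(a)
    idx = k:min(k+chunk-1, numel(a));
    res = res + log(prod(a(idx)));
end
% res now matches sum(log(a)) up to rounding error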

Though it will usually be faster to compute log(a*b) instead of log(a) + log(b), this does not hold if a*b is hard to evaluate. In that case it can actually be faster to use the second method.

Example:

a = 0;
b = Inf;
tic, for t = 1:1e6, log(a*b); end, toc
tic, for t = 1:1e6, log(a)+log(b); end, toc

Of course it will evaluate to NaN in both cases, but the second one is considerably faster than the first one.

Dennis Jaheruddin
  • 0*Inf is handled at speed on modern processors; there’s nothing “hard to evaluate” about it. (In the past, some processors would encounter stalls doing arithmetic with Infinities and NaNs, but that’s no longer a real concern) – Stephen Canon Aug 23 '13 at 13:03
  • @StephenCanon Well, the timings show that somehow it is still slow. Perhaps the slow part is the evaluation of `log(NaN)` rather than the creation of the `NaN` by multiplying zero with infinity. Of course, slow is relative; the 'normal' case is only about 2 times faster. – Dennis Jaheruddin Aug 23 '13 at 13:14
  • It is certainly possible that on your system log(NaN) is slower than log(0) and log(Inf), but that would be a quirk of one math library implementation, not a generally applicable finding. (see the sketch below) – Stephen Canon Aug 23 '13 at 13:19
  • @StephenCanon True, but the question does have a matlab tag. – Dennis Jaheruddin Aug 23 '13 at 13:21
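Following up on the comments above: if you want to check whether log(NaN) takes a slow path on your particular system, here is a quick, unscientific sketch (vectorized calls; results will vary with the MATLAB version and underlying math library):

% Compare log on NaN inputs vs ordinary inputs
x = nan(1e6,1);
y = rand(1e6,1);
tic, log(x); toc   % log of NaN values
tic, log(y); toc   % log of ordinary values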