
I have an m×n matrix A, with very small or very large elements. For example:

A = -1e4*randn(10,20);

I would like to create a new matrix C of the same size, as follows:

First, define a matrix B whose elements are the exponential of the elements of A:

B = exp(A)

Then, the matrix C is defined such that each column of C is proportional to the corresponding column of B and each column of C sums to 1. In other words, each element of C is the corresponding element of B divided by the sum of all the elements in the same column:

C = bsxfun(@rdivide,B,sum(B));

Mathematically:

$$C_{ij} = \frac{e^{A_{ij}}}{\sum_{k=1}^{m} e^{A_{kj}}}$$

Clearly, all the elements of C are between 0 and 1. However, the following code produces a matrix containing many NaNs:

A = -1e4*randn(10,20);

B = exp(A);

C = bsxfun(@rdivide,B,sum(B));

sum(sum(isnan(C)))
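For readers following along without MATLAB, a NumPy sketch of the same computation (my translation; the seed and variable names are my own choices) reproduces the failure:

```python
import numpy as np

# NumPy version of the failing MATLAB code above.
rng = np.random.default_rng(0)
A = -1e4 * rng.standard_normal((10, 20))   # entries on the order of +/-1e4

with np.errstate(over='ignore', invalid='ignore', divide='ignore'):
    B = np.exp(A)            # overflows to inf / underflows to 0
    C = B / B.sum(axis=0)    # inf/inf and 0/0 produce NaN

print(np.isnan(C).sum())     # a large count, matching the MATLAB symptom
```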

I hope somebody can suggest a way to overcome this issue. Thank you very much in advance.

Update: Please note that the goal is the matrix C. I defined the matrix B just for explanation purposes, and we may not have to compute it. (We shouldn't, actually; as @EJG89 pointed out, B contains only Inf and 0.)

Update 2: Thanks to @EJG89 for the link to the Log Sum of Exponentials technique, which might be useful. I'm working on finding similar analytical tricks for my problem.

Update 3: As suggested by @Dan and @EJG89, we can subtract a constant from each column to obtain a new matrix within a reasonable range. Clearly, we have

$$C_{ij} = \frac{e^{A_{ij}-c}}{\sum_{k=1}^{m} e^{A_{kj}-c}}$$

for any constant c (the factor e^{-c} cancels between numerator and denominator). We can choose c as the maximum value of each column:

$$c_j = a_{\max,j} = \max_k A_{kj}$$

(a_{\max,j} is the maximum of the j-th column), then

$$C_{ij} = \frac{e^{A_{ij}-a_{\max,j}}}{\sum_{k=1}^{m} e^{A_{kj}-a_{\max,j}}}$$

I feel that this choice might give a very good approximation, but I don't know exactly how good it is.

The new code is:

A = bsxfun(@minus,A,max(A));
B = exp(A);
C = bsxfun(@rdivide,B,sum(B));
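A NumPy sketch of this max-subtraction fix (my translation, not part of the original post) confirms that the NaNs disappear and each column sums to 1:

```python
import numpy as np

def softmax_columns(A):
    """Column-wise softmax with the max-subtraction trick.

    Subtracting the column maximum multiplies numerator and denominator
    by the same factor e^{-max}, so C is mathematically unchanged, but
    every exponent is <= 0 and exp() can no longer overflow.
    """
    A = A - A.max(axis=0, keepdims=True)     # largest exponent is now 0
    B = np.exp(A)                            # values in (0, 1]
    return B / B.sum(axis=0, keepdims=True)  # denominator >= 1, never 0

rng = np.random.default_rng(0)
A = -1e4 * rng.standard_normal((10, 20))
C = softmax_columns(A)
print(np.isnan(C).sum())                       # 0
print(bool(np.allclose(C.sum(axis=0), 1.0)))   # True
```

On the accuracy question: after the shift, any term that underflows to 0 was smaller than realmin relative to the column maximum of 1, far below eps, so in double precision this is effectively exact rather than a rough approximation.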
f10w

4 Answers


You want to adjust A to some new A' such that e^A = C·e^(A'), where C is a constant (much less than 1). In other words, you're looking for some k such that k·e^A is small enough not to breach realmax. But we want this k applied to A itself, so we need to get it into the exponent: k = e^(ln(k)), and e^(ln(k))·e^A = e^(A + ln(k)), so adding or subtracting a number from A scales e^A proportionally. That is, A' = A + ln(k).

Unfortunately, I think your range is simply too large for any single ln(k) to stop you from breaching the limits of MATLAB doubles. If you add a big number you'll get all of B equal to Inf, and if you subtract a big number you'll get all of B equal to zero.

So you need to find some way to work with both massive and minute numbers simultaneously.
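A quick NumPy check (mine, not part of this answer) makes the point concrete: with entries spanning roughly ±1e4, no single shift ln(k) keeps both ends inside double range:

```python
import numpy as np

# Two entries with the spread the question's data produces.
a = np.array([1e4, -1e4])

with np.errstate(over='ignore'):
    print(np.exp(a - 0.0))   # [inf, 0.]: overflow and underflow at once
    print(np.exp(a - 1e4))   # [1., 0.]: shift by the max; small term is exactly 0
    print(np.exp(a + 1e4))   # [inf, 1.]: shift the other way; big term overflows
```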

Dan
  • Hi Dan. Thanks for the answer. I don't think that we have exp(A) = C*exp(A'). Instead, we have exp(a_i) = C*exp(a_i'), where a_i and a_i' are respectively the i-th columns of A and A', and of course the constant C differs from column to column (what I would like is for the sum of the elements of each column of the result to equal 1). – f10w Jul 17 '14 at 20:06
  • Dan, do you think that subtracting its maximum element from each column will give a good enough approximation? – f10w Jul 17 '14 at 21:37
  • @Nesbit If you want to normalize each column independently, but your data is still `-1e4*randn(10,20)`, then I'm pretty sure you're still going to have the same problem. You want the exponential of numbers on the order of 10000 and -10000, which is going to give you some numbers too big for Matlab to handle as doubles and others too small for Matlab to distinguish from 0; that problem remains whether you're normalizing globally for the matrix or in a column-wise fashion. – Dan Jul 18 '14 at 06:27

The direct method is impossible:

A = -1e4*randn(10,20);
B = exp(A);

The problem is that B already contains only Inf and 0, which makes the division return NaNs (Inf/Inf and 0/0).

ndoublemax = realmax
nsinglemax = realmax('single')

ndoublemin = realmin
nsinglemin = realmin('single')

yielding:

ndoublemax =

1.7977e+308

nsinglemax =

3.4028e+038

ndoublemin =

2.2251e-308

nsinglemin =

1.1755e-038

So your values for B clearly fall outside those ranges.

An approximation can be made by rewriting the exponentials using the log-sum-exp trick from this blog post.

So first rewrite:

log(C_ij) = log( exp(A_ij) / sum_k(exp(A_kj)) )

log(C_ij) = A_ij - log( sum_k(exp(A_kj)) )

in which the last term can be computed safely by setting m = max_k(A_kj):

log( sum_k(exp(A_kj)) ) = m + log( sum_k(exp(A_kj - m)) )

This results in smaller values that MATLAB can handle, i.e. a normalization so to speak. Still not an exact answer, though, because some terms underflow to zero and their contribution is lost.
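The derivation above can be sketched in NumPy (my illustration; the function name is my own) by computing log C first and only exponentiating at the end:

```python
import numpy as np

def log_softmax_columns(A):
    """log C_ij = A_ij - (m_j + log sum_k exp(A_kj - m_j)), m_j = max_k A_kj."""
    m = A.max(axis=0, keepdims=True)
    # After subtracting m, the largest argument to exp() is 0: no overflow.
    lse = m + np.log(np.exp(A - m).sum(axis=0, keepdims=True))
    return A - lse

rng = np.random.default_rng(0)
A = -1e4 * rng.standard_normal((10, 20))
logC = log_softmax_columns(A)    # all finite
C = np.exp(logC)                 # tiny entries underflow to 0, as noted above
print(np.isnan(logC).sum())      # 0
```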

EJG89
  • Thanks. I can see that, but I think there should be some way to compute C whose elements are between 0 and 1. – f10w Jul 17 '14 at 12:53
  • Look for an analytical solution instead of trying to compute the in-between products; MatLab is (and computers are) limited. – EJG89 Jul 17 '14 at 12:54
  • Added an analytical way to increase stability at the loss of the smaller numbers. – EJG89 Jul 17 '14 at 13:57
  • Thanks. I had been working on similar analytical tricks for my problem, too. – f10w Jul 17 '14 at 20:39

Following the discussion with @EJG89, you can replace the infinite values with the largest representable number and add eps to the denominator. In this way, you avoid the Inf/Inf divisions:

A = -1e4*randn(10,20);

B = exp(A);
B(isinf(B)) = realmax;

C = bsxfun(@rdivide,B,sum(B)+eps);

sum(isnan(C(:)))
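A NumPy rendering of this clamping workaround (my translation, not from the answer) shows the trade-off Dan's comment raises: the NaNs vanish, but the surviving columns collapse to 0/1 patterns:

```python
import numpy as np

rng = np.random.default_rng(0)
A = -1e4 * rng.standard_normal((10, 20))

with np.errstate(over='ignore'):
    B = np.exp(A)
    B[np.isinf(B)] = np.finfo(float).max   # replace Inf with realmax
    C = B / (B.sum(axis=0) + np.finfo(float).eps)

print(np.isnan(C).sum())   # 0: no NaNs left
```

After clamping, every entry of B is either 0 or realmax, so each column of C ends up as a single 1 with the rest 0 (or all 0 when several clamped entries make the column sum overflow): the range is truncated rather than recovered.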
tashuhka
  • This does not answer the question – it will just replace the NaNs with 0s or 1s, which isn't right. The answer must be some adjustment of `A` such that the resultant `B` is still proportional to `exp(A)`. – Dan Jul 17 '14 at 12:58
  • Well, it depends on the approach. In this case, we either truncate the range of values or we lose resolution. Without further information, I chose the easiest solution XD – tashuhka Jul 17 '14 at 13:07
  • I guess you might as well replace the `inf`s in `B` with `1` and everything else with `0` to get a rough approximation and just skip the `bsxfun` step. – Dan Jul 17 '14 at 13:27
  • It happens you can approximate the log of a sum by normalizing. – EJG89 Jul 17 '14 at 13:57

One approach could be to resort to symbolic computation:

A = -1e4*randn(10,20);
S = sym(A,'d'); %// convert to symbolic
C = exp(S)./repmat(sum(exp(S)),size(S,1),1);

The result C is given in symbolic form. For example, the first column will be something like

>> C(:,1)
ans =
   2.9347694526167482187885232843876*10^(-790)
  1.548257079197982942994953632259*10^(-10497)
  8.4257350773369612716806306890481*10^(-6390)
  3.1937456016712092927990884814437*10^(-7937)
  4.0377045095817442881784227420794*10^(-4076)
                                           1.0
  3.3704875581881026436375125643672*10^(-6482)
 8.9221015470879963245101895439354*10^(-12246)
 9.4968643813486260650483647051531*10^(-11252)
 1.8777491443104052522832960625121*10^(-11084)

But note that even with symbolic computation the largest term is given as 1, when actually it is (slightly) smaller. Of course, if you then convert to double you only get 0 or 1 values:

>> Cd = eval(C);
>> Cd(:,1)
ans =
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
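The same arbitrary-precision idea can be sketched with Python's standard `decimal` module instead of the Symbolic Toolbox (my illustration; the column values are made up for the example):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50            # 50 significant digits, similar in spirit to vpa

# One example column whose entries are far outside double exp() range.
col = [Decimal(10000), Decimal(9990), Decimal(-5000)]

exps = [x.exp() for x in col]     # Decimal happily represents ~1e4343
total = sum(exps)
C = [e / total for e in exps]     # normalized column, kept in high precision

print(float(C[1]))                # ~4.54e-05: an entry that double-precision
                                  # softmax would have rounded away to exactly 0
```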
Luis Mendo
  • you can tell MATLAB to give you as much precision as you want from a symbolic expression, so that `1.0` will actually show up as `0.999...999999993577744642`: `vpa(C(6,1),1000)` – Amro Jul 17 '14 at 18:18
  • @Amro That's what I thought... but that still gives `1`, even if I ask for 1e6 digits (Matlab R2010b) – Luis Mendo Jul 17 '14 at 18:21
  • it has to do with how you told `sym` to approximate the numeric values from the beginning. Try `sym(randn(..))*1e4` instead. PS: try 5000 precision first before 1 million, that would take forever to finish :) – Amro Jul 17 '14 at 18:29
  • @Amro Still no luck. Must be something with my Matlab version. Yes, 1 million digits took a long time, but since I kept getting 1, I had to gradually increase just to make sure :-) – Luis Mendo Jul 17 '14 at 18:56
  • Hi @Luis and @Amro. This approach seems to be "promising" :D (at least it's better than directly computing the exponential). The rounding (to 1 and 0) is not good though. I tried for `A = 1e4*[1 -0.5; 1 0.5; -0.5 -0.5]` and obtained `C=[0.5 0; 0.5 1; 0 0]`, which is somehow a good approximation. However, for `A = 1e4*[1 -0.5; 0.999 0.5; -0.5 -0.5]` or `A = 1e4*[1 -0.5; 0.9999 0.5; -0.5 -0.5]` the results are not quite acceptable :( – f10w Jul 17 '14 at 20:26
  • @Amro and Luis: My bad. The method gives very good results for the above three examples. – f10w Jul 17 '14 at 21:27
  • @Nesbit: just keep in mind that this is doing the computation in [variable-precision arithmetic](https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic), and it should remain stored as such. The final results are beyond the range/precision of double floating-point numbers, converting them will just underflow/overflow. – Amro Jul 17 '14 at 21:32