
I would like to perform matrix balancing in Python. I thought I could use `linalg.matrix_balance` to make the input matrix doubly stochastic:

from scipy import linalg
import numpy as np
x = np.array([[1,2,7], [9,1,1], [1,2,10*np.pi]])

y, permscale = linalg.matrix_balance(x)    
np.abs(x).sum(axis=0) / np.abs(x).sum(axis=1)        
np.abs(y).sum(axis=0) / np.abs(y).sum(axis=1)


permscale

array([[ 1.,  0.,  0.],
       [ 0.,  2.,  0.],
       [ 0.,  0.,  1.]])

It doesn't look like it's actually balancing the rows and columns of the matrix. Any idea what I should do?

If I do y, permscale = linalg.matrix_balance(x, scale=False) instead,

the result isn't normalized either:

array([[ 1.        ,  2.        ,  7.        ],
       [ 9.        ,  1.        ,  1.        ],
       [ 1.        ,  2.        , 31.41592654]])
0x90

1 Answer


You are correct in your analysis of the output, but mistaken about what matrix_balance actually does. From the docs:

The balanced matrix satisfies the following equality,

B = inv(T) * A * T (matrix products)

We can easily verify this with your matrices:

>>> print(np.dot(np.dot(np.linalg.inv(permscale), x), permscale))

[[  1.           4.           7.        ]
 [  4.5          1.           0.5       ]
 [  1.           4.          31.41592654]]

This is indeed y, so matrix_balance does exactly what it claims. You write,

It doesn't look like it's actually balancing the rows and columns of the matrix. Any idea what I should do?

But this is not what balancing means here: matrix_balance equalizes the row and column 1-norms of the matrix to within a factor of two, using scaling factors that are exact powers of two (so no floating-point rounding is introduced); those factors are recorded on the diagonal of permscale.
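To make that claim concrete, here is a small check using the matrices from your question: the returned scaling satisfies B = inv(T) A T exactly, and the column-to-row 1-norm ratios of the balanced matrix stay within a small factor of one,

```python
import numpy as np
from scipy import linalg

x = np.array([[1, 2, 7], [9, 1, 1], [1, 2, 10 * np.pi]])
y, permscale = linalg.matrix_balance(x)

# the documented identity B = inv(T) @ A @ T holds exactly
print(np.allclose(np.linalg.inv(permscale) @ x @ permscale, y))

# column / row 1-norm ratios of the balanced matrix are close to 1
ratios = np.abs(y).sum(axis=0) / np.abs(y).sum(axis=1)
print(ratios)
```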

Is your aim instead to make your matrix doubly stochastic, rather than (just) balanced? If so, you could look at e.g. this project, which implements the Sinkhorn-Knopp algorithm. Following the guide presented there, I found the following doubly stochastic matrix for your data:

>>> from sinkhorn_knopp import sinkhorn_knopp as skp  # the package from the linked project
>>> sk = skp.SinkhornKnopp()
>>> print(sk.fit(x))

[[ 0.1385636   0.55482644  0.30660996]
 [ 0.79518122  0.17688932  0.02792947]
 [ 0.06695665  0.26810303  0.66494032]]

>>> print(sk.fit(x).sum(axis=0), sk.fit(x).sum(axis=1))

[ 1.00070147  0.99981878  0.99947975] [ 1.  1.  1.]

Which ought to be 'close enough' for most use-cases.
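If you'd rather avoid the extra dependency, the core of Sinkhorn-Knopp is only a few lines of NumPy. A minimal sketch (`sinkhorn_knopp` below is my own helper, not the package's API) that alternately normalizes rows and columns until both sums are close to one:

```python
import numpy as np

def sinkhorn_knopp(A, n_iter=1000, tol=1e-9):
    """Alternately normalize rows and columns of a matrix with
    positive entries until it is approximately doubly stochastic."""
    P = np.asarray(A, dtype=float)
    for _ in range(n_iter):
        P = P / P.sum(axis=1, keepdims=True)   # force row sums to 1
        P = P / P.sum(axis=0, keepdims=True)   # force column sums to 1
        if np.allclose(P.sum(axis=1), 1.0, atol=tol):
            break                              # row sums converged too
    return P

x = np.array([[1, 2, 7], [9, 1, 1], [1, 2, 10 * np.pi]])
P = sinkhorn_knopp(x)
print(P.sum(axis=0), P.sum(axis=1))  # both close to [1, 1, 1]
```

Convergence is guaranteed for matrices with strictly positive entries; for matrices with zeros the iteration can stall, which is where the more careful implementations earn their keep.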

Nelewout
  • What about the other doubly stochastic balancing algorithms? Preferably something I can use `conda` to install. – 0x90 Jul 12 '18 at 19:35
  • @0x90 what about them? Balancing implies row and column sums must equal unity, which is an optimisation problem (quasi-)Newton methods can be used for, as the paper you cite also shows (nice read btw). I only aimed to show you how you could use Sinkhorn-Knopp in Python, which I believe is the 'regular' approach to this problem. – Nelewout Jul 12 '18 at 19:41
  • Is there a Matlab library you recommend? – 0x90 Jul 13 '18 at 04:31
  • @0x90 I am afraid not - I do not use Matlab all that often. An idea is to rewrite the Matlab code provided by the paper in Python, and use [matrix_balance](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.matrix_balance.html) for the initial guess? I'm just thinking out loud here, but would that help you along? – Nelewout Jul 13 '18 at 06:55