I would like to obtain a matrix from some vector x=(x_1,x_2, ..., x_I) where each row i in this matrix corresponds to x(i) := (x_1,...,x_{i-1},x_{i+1},...,x_I).

I know that

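# Note: sklearn.cross_validation was removed in scikit-learn 0.20; newer versions
# use sklearn.model_selection and iterate over LeaveOneOut().split(X) instead.
from sklearn.cross_validation import LeaveOneOut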
I = 30
myrowiterator = LeaveOneOut(I)
for eachrow, _ in myrowiterator:
    print(eachrow)    # prints [1,2,...,29]
                      #        [0,2,...,29] and so on ...

provides a routine to obtain each row of the above matrix. But I would rather obtain the whole matrix in one step, so that I can operate on it directly instead of looping through its rows. That would save me some computation time.

user3820991

2 Answers

Since you have the numpy tag, the following works:

>>> import numpy as np
>>> N = 5
>>> idx = np.arange(N)
>>> idx = idx[1:] - (idx[:, None] >= idx[1:])
>>> idx
array([[1, 2, 3, 4],
       [0, 2, 3, 4],
       [0, 1, 3, 4],
       [0, 1, 2, 4],
       [0, 1, 2, 3]])

And you can now use this to index any other array:

>>> a = np.array(['a', 'b', 'c', 'd', 'e'])
>>> a[idx]
array([['b', 'c', 'd', 'e'],
       ['a', 'c', 'd', 'e'],
       ['a', 'b', 'd', 'e'],
       ['a', 'b', 'c', 'e'],
       ['a', 'b', 'c', 'd']],
      dtype='|S1')
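
For anyone wondering why the one-liner works, here is a sketch that splits it into steps. The names `base`, `mask` and `loo_idx` are only chosen for illustration and do not appear in the answer above.

```python
import numpy as np

N = 5
idx = np.arange(N)            # [0, 1, 2, 3, 4]
base = idx[1:]                # [1, 2, 3, 4], which is already the row that leaves out 0

# A column of shape (N, 1) compared with a row of shape (N-1,) broadcasts to (N, N-1).
# mask[i, j] is True exactly when j < i, i.e. for the entries left of the gap in row i.
mask = idx[:, None] >= base

# Subtracting the mask (True counts as 1) shifts those entries down by one,
# so row i becomes [0, ..., i-1, i+1, ..., N-1].
loo_idx = base - mask

# Check against the definition: row i holds every index except i.
expected = np.array([[j for j in range(N) if j != i] for i in range(N)])
assert np.array_equal(loo_idx, expected)
```

In other words, every row starts out as `[1, 2, ..., N-1]`, and row `i` gets its first `i` entries decremented.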

EDIT As @user3820991 suggests, this can be made a little less cryptic by writing it as:

>>> N = 5
>>> idx = np.arange(1, N) - np.tri(N, N-1, k=-1, dtype=bool)
>>> idx
array([[1, 2, 3, 4],
       [0, 2, 3, 4],
       [0, 1, 3, 4],
       [0, 1, 2, 4],
       [0, 1, 2, 3]])

The function np.tri is in effect an optimized version of the magical comparison in the first version of this answer: it builds its index ranges with the smallest integer type that can hold the size of the array. Comparisons in numpy are vectorized using SIMD, so the smaller the type, the faster the operation.
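
If you want to convince yourself that both versions agree, and see how they compare on your own machine, something along these lines should do. The helper names `loo_idx_compare` and `loo_idx_tri` are made up for this sketch, and the timings will depend on your NumPy version and hardware, so none are quoted here.

```python
import timeit
import numpy as np

def loo_idx_compare(n):
    # First version: explicit broadcast comparison.
    idx = np.arange(n)
    return idx[1:] - (idx[:, None] >= idx[1:])

def loo_idx_tri(n):
    # Second version: boolean lower-triangular mask from np.tri.
    return np.arange(1, n) - np.tri(n, n - 1, k=-1, dtype=bool)

# Both constructions should give identical index matrices.
for n in (2, 5, 100):
    assert np.array_equal(loo_idx_compare(n), loo_idx_tri(n))

# Rough timing; treat the numbers as indicative only.
n = 1000
for f in (loo_idx_compare, loo_idx_tri):
    t = timeit.timeit(lambda: f(n), number=20)
    print(f"{f.__name__}: {t / 20 * 1e3:.2f} ms per call")
```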

Jaime
  • Wow (+1). I think it'd be awesome if the answer included an intuitive explanation of the `idx[1:] - (idx[:, None] >= idx[1:])` magic. :-) – NPE Jan 21 '15 at 09:28
  • Ahh, I see. Very nice. `(idx[:, None] >= idx[1:])` is basically the same as `np.tril(np.ones((N,N-1)), k=-1)`. Thanks. – user3820991 Jan 21 '15 at 10:15
  • But a magician never reveals its secrets, @NPE! ;-) My last edit does provide an infinitely more readable version of the same thing. – Jaime Jan 21 '15 at 15:05
  • @Jaime Is there a special reason why you specify the type of the triangular matrix as boolean? why not simply `np.tri(N, N-1, k=-1)`? – user3820991 Jan 21 '15 at 15:24
  • Implementing the method I just found out why you need to specify `dtype` either as `bool` or `int`. Otherwise the idx will contain floats which cannot be used for indexing. – user3820991 Jan 21 '15 at 16:21
  • Yes, the float default of `tri` is kind of unfortunate, but it is there for historical reasons: `triu` and `tril` used to work by multiplying by the matrix returned by `tri`, and most matrix operations happen with floating point types. They now use `where` based on a boolean mask from `tri`, but you have to honor APIs for backwards compatibility, even when they don't make much sense anymore. With the current implementation, using `bool` is the fastest, although any integer type should work fine. – Jaime Jan 21 '15 at 19:52
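
To illustrate the `dtype` point from the comments above, here is a small sketch of what happens when it is left out. The variable names are only for illustration, and the `IndexError` is what recent NumPy versions raise for float index arrays (very old versions may behave differently).

```python
import numpy as np

N = 5
a = np.array(['a', 'b', 'c', 'd', 'e'])

# np.tri defaults to float, so the subtraction yields a float64 array.
float_idx = np.arange(1, N) - np.tri(N, N - 1, k=-1)
# With dtype=bool (or any integer type) the result stays an integer array.
int_idx = np.arange(1, N) - np.tri(N, N - 1, k=-1, dtype=bool)

print(float_idx.dtype)   # float64
print(int_idx.dtype)     # an integer dtype

# Float arrays are rejected as indices.
try:
    a[float_idx]
    float_ok = True
except IndexError:
    float_ok = False

rows = a[int_idx]

# If you already ended up with a float index array, casting it recovers a usable one.
recovered = a[float_idx.astype(np.intp)]
```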

The following will do it:

In [31]: np.array([row for row, _ in LeaveOneOut(I)])
Out[31]: 
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [ 0,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [ 0,  1,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [ 0,  1,  2,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       ...
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]])
NPE
  • still kind of relying on a loop. I just thought there has to be some function that provides this directly. But I take that. Many thanks! – user3820991 Jan 20 '15 at 22:31
  • @user3820991: You can probably do something along these lines: http://stackoverflow.com/questions/17527693/transform-the-upper-lower-triangular-part-of-a-symmetric-matrix-2d-array-into?lq=1 but I don't have the energy to work it out right now. :-/ – NPE Jan 20 '15 at 22:38
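
Regarding the loop: one way to drop it entirely, without scikit-learn, is to mask out the diagonal of a tiled copy of the vector. This is a different technique from either answer above, and the function name `leave_one_out_rows` is invented for this sketch. It assumes `x` is one-dimensional.

```python
import numpy as np

def leave_one_out_rows(x):
    """Return an (n, n-1) array whose row i is x with element i removed."""
    x = np.asarray(x)
    n = x.shape[0]
    # True everywhere except on the diagonal, where row i would contain x[i].
    keep = ~np.eye(n, dtype=bool)
    # Read-only (n, n) view in which every row is x; no copy is made here.
    tiled = np.broadcast_to(x, (n, n))
    # Boolean indexing flattens in row-major order, so each row keeps n-1 entries.
    return tiled[keep].reshape(n, n - 1)

x = np.arange(30)
m = leave_one_out_rows(x)
print(m.shape)   # (30, 29)
print(m[0])      # x without its first element
```

Unlike the index-matrix approach, this returns the values of `x` directly, which seems to be what the question is ultimately after.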