
I found that scipy.optimize.minimize works when I use .item() to retrieve a value from a NumPy array in the objective function, but it fails when I retrieve the value by indexing with [0,0]:

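import numpy as np
from scipy.optimize import minimize

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def hyp_log(X, theta):
    return sigmoid(X @ theta)

def cost_log(theta, X, Y, reg_const=0):
    hyp = hyp_log(X, theta)
    # cross-entropy term, averaged over the len(X) samples
    data_term = (Y.T @ -np.log(hyp) + (1-Y).T @ -np.log(1-hyp)).item() / len(X)
    # L2 penalty on every parameter except the intercept theta[0]
    reg_term = reg_const * (theta[1:].T @ theta[1:]).item() / (2 * len(X))
    return data_term + reg_term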

result = minimize(cost_log, theta, args=(X,Y,reg_const), method='TNC')

If I use [0,0] indexing instead of .item() in cost_log, the function itself still works exactly as before when I call it directly, but minimize raises IndexError: too many indices for array. I want to understand why this happens and what I should be careful of in the objective function when using minimize.

Taxxi
  • `arr[0,0]` requires that `arr` be 2d. `item()` works if it has 1 element. It's especially useful if the array is 0d, with shape `()`. You should know the shape of all arrays at all times, and test that whenever there's any question (or error). – hpaulj Jul 07 '20 at 07:36
  • @hpaulj All the corresponding arrays have shape (1,1), so both [0,0] and item() work in the objective function. [0,0] just does not work in the minimize process. – Taxxi Jul 07 '20 at 08:37
  • What's `theta` ? initial value and as passed by `minimize`? – hpaulj Jul 07 '20 at 12:52
  • The `minimize` docs clearly states that the `x/theta` variable passed to the function is shape (n,). That's true even if the initial `theta` is (1,1). You would have seen this if you'd added a `print(theta.shape)` to `cost_log` as part of the debugging. – hpaulj Jul 07 '20 at 16:38
  • @hpaulj theta is (n,1). I indeed knew that theta should be (n, ) but I wanted to be consistent in the shapes and became curious why it was working correctly with .item(). I will just stick to (n, ) – Taxxi Jul 08 '20 at 00:17
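hpaulj's debugging suggestion is easy to try. The following is a minimal, hypothetical probe (the toy quadratic objective and the list seen_shapes are made up for illustration) that records the shape of the parameter vector every time minimize calls the objective. It assumes a recent SciPy: if I recall correctly, SciPy 1.11 and later raise a ValueError when x0 has more than one dimension, whereas older releases flattened it silently, so the (n,1) start value is raveled explicitly here.

```python
import numpy as np
from scipy.optimize import minimize

seen_shapes = []  # shape of theta on every call minimize makes

def objective(theta):
    # Hypothetical toy objective: record what minimize actually passes in
    seen_shapes.append(theta.shape)
    return float(np.sum((theta - 1.0) ** 2))

theta0 = np.zeros((3, 1))  # a column vector, like the (n,1) theta in the question

# Ravel up front: newer SciPy rejects a 2d x0, older SciPy did this itself
result = minimize(objective, theta0.ravel(), method='TNC')

print(set(seen_shapes))  # {(3,)}
print(result.x.shape)    # (3,)
```

Every call sees a 1d vector, never the (3,1) column, so any [0,0] that relies on theta being 2d will break inside minimize.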

1 Answer


Since you have not provided X or Y, I won't look at:

(Y.T @ -np.log(hyp) + (1-Y).T @ -np.log(1-hyp))

but with:

(theta[1:].T @ theta[1:]).item()

if theta is (n,1):

In [15]: theta = np.arange(5)[:,None]
In [16]: theta.shape
Out[16]: (5, 1)
In [17]: (theta[1:].T @ theta[1:])
Out[17]: array([[30]])
In [18]: (theta[1:].T @ theta[1:])[0,0]
Out[18]: 30
In [19]: (theta[1:].T @ theta[1:]).item()
Out[19]: 30

But if you give that theta to minimize, it ravels it to shape (n,), and that 1d array is what the objective function receives:

In [20]: theta=theta.ravel()
In [21]: (theta[1:].T @ theta[1:])
Out[21]: 30
In [22]: (theta[1:].T @ theta[1:]).shape
Out[22]: ()
In [23]: (theta[1:].T @ theta[1:]).item()
Out[23]: 30
In [24]: (theta[1:].T @ theta[1:])[0,0]
...
IndexError: invalid index to scalar variable.
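The data term set aside above behaves the same way, and it plausibly accounts for the exact message in the question. A sketch under stated assumptions (a hypothetical X with m=4 samples and n=3 features, and Y stored as an (m,1) column of 0/1 labels, which is what Y.T @ ... suggests): once theta is (n,), hyp becomes (m,), so Y.T @ -np.log(hyp) is (1,m) @ (m,) with shape (1,). Indexing a 1d array with [0,0] raises "too many indices for array", and since that term is evaluated first, it is presumably the error the question reports.

```python
import numpy as np

# Hypothetical data: m=4 samples, n=3 features, Y as an (m,1) column of labels
X = np.arange(12.0).reshape(4, 3)
Y = np.array([[1.0], [0.0], [1.0], [0.0]])

theta = np.zeros(3)                      # what minimize passes in: shape (n,)
hyp = 1 / (1 + np.exp(-(X @ theta)))     # shape (m,), not (m,1)

data_term = Y.T @ -np.log(hyp) + (1 - Y).T @ -np.log(1 - hyp)  # shape (1,)
reg_term = theta[1:].T @ theta[1:]                              # 0d scalar

print(hyp.shape, data_term.shape, np.shape(reg_term))  # (4,) (1,) ()

for name, value in [("data_term", data_term), ("reg_term", reg_term)]:
    try:
        value[0, 0]                      # the question's indexing
    except IndexError as exc:
        print(f"{name}[0, 0] -> IndexError: {exc}")
    print(f"{name}.item() -> {value.item()}")
```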

As I wrote initially, item() can be used with any single-element array, regardless of its dimensions. [0,0] only works with a 2d (or higher-dimensional) array; on a 3d array it returns a 1d slice rather than a scalar.
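One practical way to be careful in the objective function is to write it for the 1d vector that minimize actually passes and to return a plain scalar. The following is one possible restructuring of the question's cost_log, not the asker's code, assuming Y holds 0/1 labels as either an (m,) or (m,1) array and that X and Y below are hypothetical sample data. Raveling both theta and Y makes every product 1d @ 1d, which NumPy reduces to a scalar, so neither [0,0] nor item() is needed.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_log(theta, X, y, reg_const=0.0):
    """Regularized logistic cost written for a 1d theta.

    theta : (n,) parameters; an (n,1) column is raveled too
    X     : (m, n) design matrix
    y     : 0/1 labels; raveled so (m,) and (m,1) both work
    """
    theta = np.ravel(theta)
    y = np.ravel(y)
    m = len(X)
    hyp = sigmoid(X @ theta)                                # (m,)
    data = y @ -np.log(hyp) + (1 - y) @ -np.log(1 - hyp)    # 1d @ 1d -> scalar
    reg = reg_const * (theta[1:] @ theta[1:]) / (2 * m)     # skip intercept
    return float(data / m + reg)                            # plain Python float

# Hypothetical sample data
X = np.arange(12.0).reshape(4, 3)
Y = np.array([[1.0], [0.0], [1.0], [0.0]])

print(cost_log(np.zeros((3, 1)), X, Y))  # about 0.693, i.e. log(2)
```

With theta at zero every prediction is 0.5, so the average cross-entropy is log(2) whether theta arrives as (3,) or (3,1).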

hpaulj