Multiply vectors with shape (2,) and (3, 1)

Question

I have this code:

import numpy as np


def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))


x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

## Backwards pass
## TODO: Calculate error
error = target - output

# TODO: Calculate error gradient for output layer
del_err_output = error * output * (1 - output)
print("del_err_output", del_err_output)

# TODO: Calculate error gradient for hidden layer
del_err_hidden = np.dot(del_err_output, weights_hidden_output) * hidden_layer_output * (1 - hidden_layer_output)
print("del_err_hidden", del_err_hidden)
print("del_err_hidden.shape", del_err_hidden.shape)
print("x", x)
print("x.shape", x.shape)
print("x[:,None]")
print(x[:,None])
print("x[:,None].shape", x[:,None].shape)
print("del_err_hidden * x[:, None]")
print(del_err_hidden * x[:, None])

that generates this output:

del_err_output 0.0287306695435
del_err_hidden [ 0.00070802 -0.00204471]
del_err_hidden.shape (2,)
x [ 0.5  0.1 -0.2]
x.shape (3,)
x[:,None]
[[ 0.5]
 [ 0.1]
 [-0.2]]
x[:,None].shape (3, 1)
del_err_hidden * x[:, None]
[[  3.54011093e-04  -1.02235701e-03]
 [  7.08022187e-05  -2.04471402e-04]
 [ -1.41604437e-04   4.08942805e-04]]

My problem is with this operation: del_err_hidden * x[:, None]

Which kind of operation is *?

And second, if del_err_hidden.shape is (2,) and x[:,None].shape is (3, 1), why I can multiply them?

Someone has told me that it is related to elementwise and broadcasting, but I don't understand those terms. Because to do a elementwise multiplication both matrices have to have the same size, and here they don't have it.

Numpy automatically does `del_err_hidden[None,:]`, so the multiplication is a `(1,2) * (3,1) => (3,2)` — hpaulj, Mar 28 '17 at 16:25
@hpaulj Isn't it the other way around? `(3, 1) * (1, 2) => (3, 2)` to be mathematically correct — kmario23, Mar 28 '17 at 16:27

score 2 · Answer 1 · answered Mar 28 '17 at 15:31

The * is just element-wise multiplication. broadcasting is the reason it works. In short, when you multiple your column of size (3, 1) (let's call it x) with a row of size (2, ) (let's call it y), numpy creates a new 3X2 array, where the first column is y[0]*x and the second column is y[1]*x.

The exact rules of when and how that happens are somewhat complicated. See the documentation for details

kmario23 · Accepted Answer · 2017-03-29T07:19:49.647

Okay, I'm quoting the broadcasting rules from the documentation:

Two dimensions are compatible when
1) they are equal, or
2) one of them is 1

You have two arrays of shape (2, ) and (3, 1).

arr1 (1D) shape :      2
arr2 (2D) shape :  3 x 1

#                      ^
#                      |    (c.f. rule-2)

In [24]: err              # shape (2,)
Out[24]: array([2, 4])

In [26]: x                # shape (3, 1)
Out[26]: 
array([[3],
       [4],
       [5]])

Since, one of the array dimensions is 1, rules are passed. These arrays are broadcastable and can be multiplied. Next part is stretching out the array where err becomes, (well only conceptually).

In [27]: err          # shape (3, 2)
Out[27]: 
array([[2, 4],
       [2, 4],
       [2, 4]])

Multiply vectors with shape (2,) and (3, 1)

2 Answers2