
I think I'm following the same online tutorial as the one mentioned in this post: How to convert deep learning gradient descent equation into python

I understand we have to calculate the cost and db, but my question is: why do they put axis=0 in both equations? In other words, I don't understand what axis=0 is used for in this calculation. What would the result be if you did the calculation without axis=0?

import numpy as np

# Y: labels, A: predictions/activations, m: number of examples (from the tutorial)
cost = -1*((np.sum(np.dot(Y,np.log(A))+np.dot((1-Y),(np.log(1-A))),axis=0))/m)

db = np.sum((A-Y),axis=0)/m
Gerrit
  • In one of the answers to the linked question, I see that `axis=1` is used instead. `Y` has the shape `(1, N)`, so the summation should be performed over the 2nd axis. – MPA Apr 25 '18 at 07:08
  • Hello @MPA, you are right, it has to be axis=1, but I still don't understand why we add axis=1 rather than just: cost = -1*((np.sum(np.dot(Y,np.log(A))+np.dot((1-Y),(np.log(1-A)))))/m). Can you explain the difference in calculation with an example? – Gerrit Apr 25 '18 at 10:36

2 Answers


This is the kind of question you could have answered yourself by trying it out in the interpreter, probably in the same or less time than it took to compose the question.

Another way is to look at the documentation; it is always a good habit to consult it. The documentation for np.sum() can be found here: https://numpy.org/doc/stable/reference/generated/numpy.sum.html

Some excerpts from the documentation, if you still feel lazy:

...
axis : None or int or tuple of ints, optional

Axis or axes along which a sum is performed. The default, axis=None, 
will sum all of the elements of the input array. If axis is negative it 
counts from the last to the first axis.
...

Some examples from the documentation:

>>> np.sum([0.5, 1.5])
2.0
>>> np.sum([[0, 1], [0, 5]])
6
>>> np.sum([[0, 1], [0, 5]], axis=0)
array([0, 6])
>>> np.sum([[0, 1], [0, 5]], axis=1)
array([1, 5])

Visualization

  -----> axis = 1
| [[0, 1],
|  [0, 5]]
v
axis = 0
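
To connect this back to the cost in your question, here is a quick sketch. The values for A, Y, and m below are made up for illustration (they are not from the tutorial), and the per-example losses are combined element-wise rather than with np.dot:

import numpy as np

# Hypothetical stand-ins for the tutorial's variables
A = np.array([[0.8, 0.4, 0.6]])  # predictions, shape (1, 3)
Y = np.array([[1, 0, 1]])        # labels, shape (1, 3)
m = Y.shape[1]                   # number of examples

# Per-example log losses, shape (1, 3)
losses = Y*np.log(A) + (1 - Y)*np.log(1 - A)

print(np.sum(losses))          # no axis: one scalar over all elements
print(np.sum(losses, axis=0))  # down the single row -> shape (3,), values unchanged
print(np.sum(losses, axis=1))  # across the columns -> shape (1,)

cost = -np.sum(losses, axis=1)/m
print(cost)                    # the question's cost, computed with axis=1

Without an axis you get a bare scalar; with axis=1 you get a length-1 array holding the same sum, which is what the comments above are discussing.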
Vj-
  • Hello @Vj, my question is more about why we have to use axis=1 rather than just calculating the cost without it – Gerrit Apr 25 '18 at 11:28
  • Sorry, I think I've got it, thanks all for the explanation! – Gerrit Apr 25 '18 at 13:00
  • Ah! maybe I misunderstood your question. Anyways, it would be nice if you post an answer explaining it for someone in the community who might have the same question! – Vj- Apr 25 '18 at 13:12

Just for clarity: in many deep learning frameworks, all parameters are treated as tensors, and scalars are simply 0-th order tensors. If you do a plain np.sum(), you flatten the array and sum up all components, producing a bare scalar rather than an array. By explicitly using axis=1, you instead keep the result as an array, in your case of shape (1,), holding the sum over the columns. I don't know if this is required by the code you linked in your question, but I can imagine that keeping an array (rather than a bare scalar) plays a role in some deep learning frameworks.

Here is a quick example that illustrates my point:

import numpy as np

x = np.ones((1, 10))     # a 1 x 10 row vector of ones
no_ax = np.sum(x)        # sum of everything -> scalar
ax0 = np.sum(x, axis=0)  # sum down the rows -> shape (10,)
ax1 = np.sum(x, axis=1)  # sum across the columns -> shape (1,)

print(no_ax, ax0, ax1)

Result:

10.0 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [10.]
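
If you want the summed result to stay 2-D as well, np.sum also accepts a keepdims argument. A quick sketch of the resulting shapes, reusing the same x as above:

import numpy as np

x = np.ones((1, 10))

print(np.sum(x).shape)                         # ()      -> 0-d scalar
print(np.sum(x, axis=0).shape)                 # (10,)
print(np.sum(x, axis=1).shape)                 # (1,)
print(np.sum(x, axis=1, keepdims=True).shape)  # (1, 1)  -> stays 2-D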
MPA