
I've been following an online tutorial on deep learning. It has a practical question on gradient descent and cost calculations, and I've been struggling to reproduce the given answers once the equations were converted to Python code. I hope you can kindly help me get the correct answer.

Please see the following link for the equations used: Click here to see the equations used for the calculations

Following is the function given to calculate the gradient descent, cost, etc. The values need to be computed without for loops, using only matrix manipulation operations.

import numpy as np

def propagate(w, b, X, Y):
    """
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size
         (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """

    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A =                                      # compute activation
    cost =                                   # compute cost
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw =
    db =
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost

Following is the data given to test the above function:

w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

Following is the expected output of the above

Expected Output:
dw  [[ 0.99993216] [ 1.99980262]]
db  0.499935230625
cost    6.000064773192205

For the above propagate function I have used the replacements below, but the output is not what is expected. Please help me understand how to get the expected output.

A = sigmoid(X)
cost = -1*((np.sum(np.dot(Y,np.log(A))+np.dot((1-Y),(np.log(1-A))),axis=0))/m)
dw = (np.dot(X,((A-Y).T)))/m
db = np.sum((A-Y),axis=0)/m

Following is the sigmoid function used to calculate the Activation:

def sigmoid(z):
  """
  Compute the sigmoid of z

  Arguments:
  z -- A scalar or numpy array of any size.

  Return:
  s -- sigmoid(z)
  """

  ### START CODE HERE ### (≈ 1 line of code)
  s = 1 / (1+np.exp(-z))
  ### END CODE HERE ###

return s

I hope someone can help me understand how to solve this, as I can't continue with the rest of the tutorial without understanding it. Many thanks.

Eranga Atugoda
• sigmoid: 1/(1 + np.exp(-x)). Note: you have "return s" outside of your sigmoid function (Python uses indented lines under the function def to indicate they belong to the function). Derivative of sigmoid: sigmoid(x) * (1 - sigmoid(x)). You can speed up computing the derivative by noting that the output has already been sigmoided: dSigmoid = output * (1 - output). In any case, that's one of the activation functions you can use. Looks like you're on the right track with the rest of it. For cost (do you mean error?) you can just subtract the output from the target sample. – JakeJ Dec 11 '17 at 02:37
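
To make the shortcut from the comment concrete, here is a minimal sketch (the array x and the cached output are illustrative names, not part of the tutorial):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
output = sigmoid(x)                          # forward-pass result, kept around

# Two equivalent ways to get the sigmoid derivative; the second reuses the
# cached output and avoids recomputing np.exp.
d_from_x = sigmoid(x) * (1 - sigmoid(x))
d_from_output = output * (1 - output)

print(np.allclose(d_from_x, d_from_output))  # True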

3 Answers


You can calculate A, cost, dw, and db as follows:

A = sigmoid(np.dot(w.T, X) + b)
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

dw = 1 / m * np.dot(X, (A - Y).T)
db = 1 / m * np.sum(A - Y)

where sigmoid is:

def sigmoid(z):
    s = 1 / (1 + np.exp(-z))    
    return s
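
For verification, these four lines can be dropped into the propagate skeleton from the question; running the given test data then reproduces the expected output. A minimal end-to-end sketch (assuming Python 3, where / is true division):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                                  # forward: activations, shape (1, m)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # scalar cost
    dw = 1 / m * np.dot(X, (A - Y).T)                                # gradient w.r.t. w, same shape as w
    db = 1 / m * np.sum(A - Y)                                       # gradient w.r.t. b, a scalar
    return {"dw": dw, "db": db}, np.squeeze(cost)

w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print("dw = " + str(grads["dw"]))   # expected: [[ 0.99993216] [ 1.99980262]]
print("db = " + str(grads["db"]))   # expected: 0.499935230625
print("cost = " + str(cost))        # expected: 6.000064773192205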
Besher

After going through the code and notes a few times, I was finally able to figure out the error.

First, Z needs to be calculated and then passed to the sigmoid function, instead of passing X directly.

The formula is Z = wᵀX + b, so in Python this is calculated as below:

Z = np.dot(w.T, X) + b

Then calculate A by passing Z to the sigmoid function:

A = sigmoid(Z)

Then dw can be calculated as below:

dw = np.dot(X, (A - Y).T) / m

The other values, cost and the derivative of b, are calculated as follows:

cost = -1 * (np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A), axis=1) / m)
db = np.sum(A - Y, axis=1) / m
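
As a quick sanity check on the shapes involved (a small sketch using the test values from the question), computing Z from w.T yields one value per training example, which is what makes the elementwise products and the axis=1 sums line up:

import numpy as np

w, b = np.array([[1],[2]]), 2
X, Y = np.array([[1,2],[3,4]]), np.array([[1,0]])

Z = np.dot(w.T, X) + b     # (1,2) @ (2,2) -> shape (1, m)
print(Z)                   # [[ 9 12]]
print(Z.shape == Y.shape)  # True, so Y * np.log(sigmoid(Z)) is elementwise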
Eranga Atugoda
def sigmoid(x):
    # You have it right
    return 1 / (1 + np.exp(-x))

def derivSigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

# targetSample and output here are placeholders for your own variables.
error = targetSample - output

# Make sure to keep the sigmoided value around. For instance, an output that
# has already been sigmoided can be used to get the sigmoid derivative faster
# (output = sigmoid(x)):
dOutput = output * (1 - output)

Looks like you're already working on the backprop. Just thought I'd help simplify some of the forward prop for you.

JakeJ