
It seems that the following code performs gradient descent correctly:

import numpy as np

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        # predictions for all m examples at once
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # squared-error cost, averaged over the m examples
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta
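
For context, here is a minimal sketch of how this function might be called; the data, learning rate, and iteration count below are illustrative assumptions, not part of the original code:

# Hypothetical training set: m = 3 examples, n = 2 features
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 3.0]])
y = np.array([5.0, 4.0, 9.0])  # happens to satisfy y = 1*x1 + 2*x2

m = x.shape[0]
theta = np.ones(x.shape[1])    # start with every parameter equal to 1
theta = gradientDescent(x, y, theta, alpha=0.01, m=m, numIterations=100)
print(theta)                   # should drift toward [1, 2]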

Now suppose we have the following sample data:

[Image: table of sample training data; each row lists four house features (e.g., 2104, 5, 1, 45) and the corresponding price (e.g., 460)]

For the first row of the sample data we would have x = [2104, 5, 1, 45], theta = [1, 1, 1, 1], and y = 460. However, nowhere in the lines

hypothesis = np.dot(x, theta)
loss = hypothesis - y

do we specify which row of the sample data to consider. So how come this code works fine?

Saurabh Verma

3 Answers


First: Congrats on taking the course on Machine Learning on Coursera! :)

hypothesis = np.dot(x, theta) computes the hypothesis for all examples x^(i) at the same time, saving each h_theta(x^(i)) as one entry of hypothesis, so there is no need to reference a single row.

The same is true for loss = hypothesis - y.
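
A quick way to see this (the first row and its target come from the question; the second example is made up for illustration):

import numpy as np

x = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40]], dtype=float)  # two examples, four features
theta = np.ones(4)

hypothesis = np.dot(x, theta)  # one prediction per row, computed in one call
print(hypothesis)              # [2155. 1461.]

y = np.array([460.0, 232.0])   # one target per example
loss = hypothesis - y          # again, all rows handled at once
print(loss)                    # [1695. 1229.]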

Lorenz Merdian
  • Does this mean that x is an m×n matrix (m = number of samples, n = number of features) and y an m×1 vector? – Saurabh Verma Nov 10 '15 at 12:59
  • With some caution: yes! Is it possible to debug your code and take a closer look at `x` and `y`? If so, try and see for yourself. I presume they are, because if `x` and `y` were not m×n and m×1, then gradient descent, as defined in this function, would not make any sense. – Lorenz Merdian Nov 10 '15 at 13:56

This looks like a slide from Andrew Ng's excellent Machine Learning course!

The code works because you're using numpy arrays, and the basic operators (+, -, *, /) are overloaded to perform element-wise arithmetic on whole arrays, while np.dot computes a full matrix product - therefore you don't need to iterate over each row.
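
A small illustration of that overloading, with made-up values:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

print(a - b)         # [-9. -18. -27.]  element-wise, no explicit loop
print(a * 0.5)       # [0.5 1.  1.5]    scalar broadcast to every element
print(np.dot(a, b))  # 140.0            dot product of the two vectors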

Scottie

A hypothesis y is represented by y = w0 + w1*x1 + w2*x2 + w3*x3 + ... + wn*xn, where w0 is the intercept. How is the intercept figured out in the hypothesis formula above, i.e. in np.dot(x, theta)?

I am assuming x is the data representing the features, and theta can be an array like [1, 1, 1, ...] whose length equals the size of a row of the data (the number of features).
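
For what it's worth, the usual convention (used in Andrew Ng's course, though not shown in the code above) is to prepend a column of ones to the feature matrix, so that theta[0] multiplies a constant 1 and plays the role of the intercept w0. A minimal sketch, with a made-up feature matrix:

import numpy as np

x = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40]], dtype=float)

# Prepend a bias column of ones: theta[0] now acts as the intercept w0
x_with_bias = np.hstack([np.ones((x.shape[0], 1)), x])

theta = np.ones(x_with_bias.shape[1])    # n + 1 parameters: w0 .. wn
hypothesis = np.dot(x_with_bias, theta)  # w0*1 + w1*x1 + ... + wn*xn per row
print(hypothesis)                        # [2156. 1462.]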