I've taught myself machine learning from some online resources, but I have a question about gradient descent that I couldn't figure out.
The gradient descent update rule for logistic regression is:
Repeat {
θj := θj − (α/m)∑(hθ(x(i))−y(i))xj(i)    (sum over i = 1 to m)
}
Where θj is the coefficient on variable j; α is the learning rate; hθ(x(i)) is the hypothesis for the i-th training example; y(i) is the actual value for the i-th example; and xj(i) is the value of variable j for the i-th example. m is the number of training examples, and the summation runs over the examples (i.e. that's what the summation sign is for).
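To make the notation concrete, here is how I would write one update step in numpy (the toy data, dimensions, and function name are my own, just for illustration):

```python
import numpy as np

def sigmoid(z):
    # hθ(x) = 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha):
    # One update: θj := θj - (α/m) * Σ (hθ(x(i)) - y(i)) * xj(i)
    m = X.shape[0]                 # number of training examples
    h = sigmoid(X @ theta)         # hypothesis for every example at once
    grad = X.T @ (h - y) / m       # one partial derivative per θj
    return theta - alpha * grad

# toy data (made up): 4 examples, intercept column of 1s plus 2 features
X = np.array([[1.0, 0.5, 1.2],
              [1.0, 1.5, 0.3],
              [1.0, 2.0, 2.2],
              [1.0, 0.1, 0.4]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.zeros(3)                # coefficients initialized to 0
theta = gradient_step(theta, X, y, alpha=0.1)
```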
This is where I get confused.
It's not clear to me whether the summation runs over my entire training set or only over the examples I've processed up to that point.
For example, imagine I have 10 training examples. If I perform gradient descent after each training example, my coefficients will be very different from what they would be if I performed gradient descent only after all 10 training examples.
See below how the First Way differs from the Second Way:
First way
- Step 1: Since the coefficients are initialized to 0, hθ(x) = 1/(1+e^0) = 0.5
- Step 2: Perform gradient descent on the first training example; the summation term includes only that 1 example
- Step 3: Use the new coefficients to compute the hypothesis for training examples 1 & 2; the summation term includes the first 2 training examples
- Step 4: Perform gradient descent again
- Step 5: Use the new coefficients to compute the hypothesis for training examples 1, 2 & 3; the summation term includes the first 3 training examples
- Continue until convergence or until all training examples have been used
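In code, the First Way as I've described it would look something like this sketch (the function name and toy data are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def first_way(X, y, alpha):
    # Step k performs one gradient descent update whose summation term
    # includes only the first k training examples, using the coefficients
    # produced by the previous step.
    theta = np.zeros(X.shape[1])
    for k in range(1, X.shape[0] + 1):
        Xk, yk = X[:k], y[:k]              # growing prefix of the training set
        h = sigmoid(Xk @ theta)
        theta = theta - alpha * (Xk.T @ (h - yk)) / k
    return theta

# toy data (made up): 3 examples, intercept column plus 1 feature
X = np.array([[1.0, 2.0],
              [1.0, -1.0],
              [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
theta = first_way(X, y, alpha=0.1)
```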
Second way
- Step 1: Since the coefficients are initialized to 0, hθ(x) = 1/(1+e^0) = 0.5 for all 10 training examples
- Step 2: Perform 1 step of gradient descent using all 10 training examples. The coefficients will differ from the First Way's because the summation term includes all 10 training examples
- Step 3: Use the new coefficients to recompute the hypothesis for all 10 training examples; the summation term again includes all 10 training examples
- Step 4: Perform gradient descent again, and keep using all 10 examples on each step until convergence
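And the Second Way, as I understand it, would be plain batch gradient descent, something like this sketch (again, names and toy data are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def second_way(X, y, alpha, n_iters):
    # Every iteration is one batch update: the summation term always
    # includes ALL m training examples.
    theta = np.zeros(X.shape[1])
    m = X.shape[0]
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        theta = theta - alpha * (X.T @ (h - y)) / m
    return theta

# toy data (made up): 3 examples, intercept column plus 1 feature
X = np.array([[1.0, 2.0],
              [1.0, -1.0],
              [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
theta = second_way(X, y, alpha=0.1, n_iters=100)
```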
I hope that explains my confusion. Does anyone know which way is correct?
Edit: Adding cost function and hypothesis function
cost function: J(θ) = −(1/m)∑[y(i) log(hθ(x(i))) + (1−y(i)) log(1−hθ(x(i)))]
hθ(x) = 1/(1 + e^(−z))
and z = θ0 + θ1X1 + θ2X2 + θ3X3 + ... + θnXn
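In code, the cost and hypothesis would be something like this (the toy data is made up; note that with θ = 0 the hypothesis is 0.5 everywhere, so the cost is log(2) ≈ 0.693 regardless of the labels):

```python
import numpy as np

def sigmoid(z):
    # hθ(x) = 1 / (1 + e^-z), with z = θ0 + θ1x1 + ... + θnxn = θ·x
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(θ) = -(1/m) Σ [ y log(hθ(x)) + (1-y) log(1-hθ(x)) ]
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# toy data (made up): 2 examples, intercept column plus 1 feature
X = np.array([[1.0, 0.5],
              [1.0, 1.5]])
y = np.array([0.0, 1.0])
c0 = cost(np.zeros(2), X, y)   # cost at the all-zeros starting point
```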