
In the example below, gradient descent finds the correct slope (m) but whiffs completely on the intercept (b), which always comes out near zero, unless I give b a 1000x learning rate.

Why does this happen? Do different types of parameters need different learning rates?

Example result without 1000x learning rate for b:

m=3.1509653303 b=0.0360896063255

Example result with 1000x learning rate for b:

m=3.14160584013 b=6.27263311371

What's going on?

# Synthetic data: y = 3.14159 * x + 3.14159 * 2, so the true slope is
# pi and the true intercept is 2 * pi (~6.28318).
N = 1000
data = [x * 3.14159 + 3.14159 * 2 for x in xrange(N)]

# Both parameters start at zero; b gets a 1000x larger learning rate.
m_param = b_param = 0

learning_rate = .000001
b_learning_rate = learning_rate * 1000

last_total_error = float('inf')

for i in xrange(10000):
  m_grad = 0
  b_grad = 0

  total_error = 0
  for x, y in enumerate(data):
    guess = m_param * x + b_param
    err = y - guess
    total_error += err ** 2

    # Gradients of the mean squared error with respect to m and b.
    m_grad += -(2./N) * x * err
    b_grad += -(2./N) * err

  # Stop early once the total error stops changing between iterations.
  if last_total_error == total_error and i > 20:
    break
  last_total_error = total_error

  m_param -= m_grad * learning_rate
  b_param -= b_grad * b_learning_rate

print 'params', m_param, b_param
ʞɔıu

1 Answer


You are probably dealing with a local minimum. That's why algorithms based on gradient descent need a few runs from randomly generated starting values for the parameters.
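
As a minimal sketch of that idea applied to the question's setup: run the same gradient descent several times from random starting points and keep the run with the lowest error. The helper run_gradient_descent and the restart ranges below are made up for illustration, not part of the original code.

import random

# Same synthetic data as in the question: y = pi * x + 2 * pi.
N = 1000
data = [x * 3.14159 + 3.14159 * 2 for x in range(N)]

def run_gradient_descent(m0, b0, lr=1e-6, iters=10000):
    # Plain gradient descent on mean squared error, starting from (m0, b0).
    m, b = m0, b0
    for _ in range(iters):
        m_grad = b_grad = 0.0
        for x, y in enumerate(data):
            err = y - (m * x + b)
            m_grad += -(2.0 / N) * x * err
            b_grad += -(2.0 / N) * err
        m -= lr * m_grad
        b -= lr * b_grad
    # Final mean squared error, used to compare runs.
    mse = sum((y - (m * x + b)) ** 2 for x, y in enumerate(data)) / N
    return m, b, mse

# Several runs from randomly generated starting values; keep the best.
best = None
for run in range(3):
    m0 = random.uniform(-10, 10)
    b0 = random.uniform(-10, 10)
    result = run_gradient_descent(m0, b0)
    print("run %d: m=%.5f b=%.5f mse=%.5f" % (run, result[0], result[1], result[2]))
    if best is None or result[2] < best[2]:
        best = result

print("best: m=%.5f b=%.5f" % (best[0], best[1]))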

exh3