
I was learning Machine Learning from this course on Coursera taught by Andrew Ng. The instructor defines the hypothesis as a linear function of the "input" (x, in my case) like the following:

hθ(x) = θ0 + θ1·x

In supervised learning, we have some training data and, based on it, we try to "deduce" a function that closely maps the inputs to the corresponding outputs. To deduce that function, we introduce the hypothesis as a linear function of the input x. My question is: why is a function involving two θs chosen? Why can't it be as simple as y(i) = a * x(i), where a is a coefficient? We could then go about finding a "good" value of a for a given example (i) using an algorithm. This question might look very stupid, and I apologize, but I'm not very good at machine learning; I'm just a beginner. Please help me understand this.

Thanks!

desertnaut

1 Answer


The a corresponds to θ1. Your proposed linear model is leaving out the intercept, which is θ0.

Consider an output function y equal to the constant 5, or perhaps to a constant plus some tiny multiple of x that never exceeds 0.01. Driving the error function toward zero is going to be difficult if your model doesn't have a θ0 that can soak up that constant (D.C.) component.
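To see why the intercept matters, here is a minimal NumPy sketch (not from the thread; the data and variable names are made up for illustration). It fits both hypothesis families by ordinary least squares to data that hovers around y = 5 and compares the mean squared errors.

```python
import numpy as np

# Toy data: essentially the constant 5 plus a very small slope in x (the answer's example).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 5.0 + 0.01 * x + rng.normal(0.0, 0.05, size=x.shape)

# Hypothesis without an intercept: h(x) = a * x.
# Closed-form least squares for a single coefficient.
a = np.dot(x, y) / np.dot(x, x)
mse_no_intercept = np.mean((y - a * x) ** 2)

# Hypothesis with an intercept: h(x) = theta0 + theta1 * x.
# Least squares via a design matrix with a column of ones.
X = np.column_stack([np.ones_like(x), x])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
mse_with_intercept = np.mean((y - X @ theta) ** 2)

print("h(x) = a*x:               a =", round(a, 3), " MSE =", round(mse_no_intercept, 4))
print("h(x) = theta0 + theta1*x: theta =", np.round(theta, 3), " MSE =", round(mse_with_intercept, 4))
```

The a-only fit is forced through the origin, so its error stays large on this data, while θ0 in the second fit absorbs the constant (D.C.) component and the error drops to roughly the noise level.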

J_H
  • What if we could come up with a coefficient that doesn't leave the fraction tiny, but drives the evaluated expression to be very close to y (an accurate output)? –  Jan 10 '18 at 05:34
  • You're rejecting my hypothesis that, for my given function, y never comes close to the x-axis. You are certainly free to consider different hypothesis families, including ones restricted to functions that go through the origin. That is the beauty of a "model": it is a low-fidelity representation of the thing you are studying. Sometimes, going through the origin will be "good enough" for the task at hand. Though in general, you may want your linear model to robustly handle non-zero intercepts. – J_H Jan 10 '18 at 05:56
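Building on that last comment, here is a short sketch (using scikit-learn, which the thread never mentions, on made-up data) of when restricting the hypothesis family to lines through the origin is good enough and when it is not:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50).reshape(-1, 1)

for true_intercept in (0.0, 5.0):  # data through the origin vs. data with an offset
    y = true_intercept + 2.0 * x.ravel() + rng.normal(0.0, 0.1, 50)

    through_origin = LinearRegression(fit_intercept=False).fit(x, y)
    with_intercept = LinearRegression(fit_intercept=True).fit(x, y)

    print(f"true intercept {true_intercept}: "
          f"R^2 through origin = {through_origin.score(x, y):.3f}, "
          f"R^2 with intercept = {with_intercept.score(x, y):.3f}")
```

When the data really do pass through the origin, both families score about the same; once the data have a non-zero offset, only the model with θ0 can follow it, which is the "robustly handle non-zero intercepts" point.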