Trying to plot a simple function - python

Question

I implemented a simple linear regression and want to try it out by fitting a nonlinear model.

Specifically, I am trying to fit a model for the function y = x^3 + 5, for example.

This is my code:

import numpy as np
import numpy.matlib
import matplotlib.pyplot as plt

def predict(X,W):
    return np.dot(X,W)

def gradient(X, Y, W, regTerm=0):
    return (-np.dot(X.T, Y) + np.dot(np.dot(X.T,X),W))/(m*k) + regTerm * W /(n*k)

def cost(X, Y, W, regTerm=0):
    m, k = Y.shape
    n, k = W.shape
    Yhat = predict(X, W)
    return np.trace(np.dot(Y-Yhat,(Y-Yhat).T))/(2*m*k) + regTerm * np.trace(np.dot(W,W.T)) / (2*n*k)

def Rsquared(X, Y, W):
    m, k = Y.shape
    SSres = cost(X, Y, W)
    Ybar = np.mean(Y,axis=0)
    Ybar = np.matlib.repmat(Ybar, m, 1)
    SStot = np.trace(np.dot(Y-Ybar,(Y-Ybar).T))

    return 1-SSres/SStot

m = 10
n = 200
k = 1

trX = np.random.rand(m, n)
trX[:, 0] = 1

for i in range(2, n):
    trX[:, i] = trX[:, 1] ** i

trY = trX[:, 1] ** 3 + 5
trY = np.reshape(trY, (m, k))

W = np.random.rand(n, k)

numIter = 10000
learningRate = 0.5

for i in range(0, numIter):
    W = W - learningRate * gradient(trX, trY, W)

domain = np.linspace(0,1,100000)
powerDomain = np.copy(domain)
m = powerDomain.shape[0]
powerDomain = np.reshape(powerDomain, (m, 1))
powerDomain = np.matlib.repmat(powerDomain, 1, n)

for i in range(1, n):
    powerDomain[:, i] = powerDomain[:, 0] ** i

print(Rsquared(trX, trY, W))
plt.plot(trX[:, 1],trY,'o', domain, predict(powerDomain, W),'r')
plt.show()

The R^2 I'm getting is very close to 1, meaning I found a very good fit to the training data, but the plot doesn't show it. When I plot the data, it usually looks like this:

It looks as if I'm underfitting the data, but with such a complex hypothesis, with 200 features (meaning I allow polynomials up to x^199) and only 10 training examples, I should very clearly be overfitting the data, so I expect the red line to pass through all the blue points and go wild between them.

This isn't what I'm getting, which is confusing to me. What's wrong?

First things first, fix this: `trX = np.random.rand(m, 1) ** np.arange(n)` - Julien, Aug 04 '16 at 20:14
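
The one-liner suggested in this comment relies on NumPy broadcasting. A minimal sketch of the idea, assuming the sample points are kept as a column vector of shape (m, 1) so they broadcast against the (n,) exponent array (a flat (m,) array would not broadcast against (n,) when m and n differ); the variable `x` is introduced here for illustration only:

```python
import numpy as np

m, n = 10, 200  # same sizes as in the question

# Sample points as a column vector, shape (m, 1).
x = np.random.rand(m, 1)

# Broadcasting (m, 1) ** (n,) gives an (m, n) matrix whose column i is x**i,
# so column 0 is all ones (the bias term) and column 1 is x itself.
trX = x ** np.arange(n)

# Targets for y = x^3 + 5, already shaped (m, 1), i.e. (m, k) with k = 1.
trY = x ** 3 + 5
```

This replaces both the `trX[:, 0] = 1` assignment and the loop over columns in the question's code.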

score 0 · Answer 1 · answered Aug 04 '16 at 20:24

You forgot to set `powerDomain[:, 0] = 1` (after the loop that fills in the powers, since that loop reads column 0 as x); that's why your plot goes wrong at 0. And yes, you are overfitting: look how quickly your plot shoots up as soon as you leave the training domain.
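
One way to avoid this kind of mismatch is to build the training features and the plotting features with the same function. A rough sketch, assuming a helper named `poly_features` (a hypothetical name, not part of the question's code) that puts x**i in column i, so column 0 is always the constant 1:

```python
import numpy as np

def poly_features(x, n):
    """Return a (len(x), n) matrix whose column i holds x**i.

    Hypothetical helper for illustration; column 0 is the bias column of ones.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)  # column vector, shape (len(x), 1)
    return x ** np.arange(n)                       # broadcasts to (len(x), n)

n = 200
domain = np.linspace(0, 1, 1000)

# Same column layout as the training matrix: [1, x, x^2, ..., x^(n-1)].
powerDomain = poly_features(domain, n)
```

With this layout, `predict(powerDomain, W)` evaluates the same polynomial that the weights were fitted to on the training matrix.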

Julien
