I need to calculate the parameters theta0 and theta1 using linear regression. My data frame (data.1) has two columns: the first is a date-time and the second is a result that depends on that date-time.
For example:

        data.1[[1]]  data.1[[2]]
2004-07-08 14:30:00        12.41

I have the following code, which iterates a number of times to calculate the parameters theta0 and theta1:

x=as.vector(data.1[[1]])
y=as.vector(data.1[[2]])

plot(x,y)

theta0=10
theta1=10
alpha=0.0001
initialJ=100000
learningIterations=200000

# Cost function: half the mean squared error of the line theta0+theta1*x
J=function(x,y,theta0,theta1){
  m=length(x)
  sum=0
  for(i in 1:m){
    sum=sum+((theta0+theta1*x[i]-y[i])^2)
  }
  sum=sum/(2*m)
  return(sum)
}

# One gradient-descent step; alpha is read from the global environment
updateTheta=function(x,y,theta0,theta1){
  sum0=0
  sum1=0
  m=length(x)
  for(i in 1:m){
    sum0=sum0+(theta0+theta1*x[i]-y[i])
    sum1=sum1+((theta0+theta1*x[i]-y[i])*x[i])
  }
  sum0=sum0/m
  sum1=sum1/m
  theta0=theta0-(alpha*sum0)
  theta1=theta1-(alpha*sum1)
  return(c(theta0,theta1))
}

for(i in 1:learningIterations){
  thetas=updateTheta(x,y,theta0,theta1)
  # Cost at the current (not yet updated) parameters
  tempSoln=J(x,y,theta0,theta1)
  if(tempSoln<initialJ){
    initialJ=tempSoln
  }
  # Stop as soon as the cost starts to increase
  if(tempSoln>initialJ){
    break
  }
  theta0=thetas[1]
  theta1=thetas[2]
  #print(thetas)
  #print(initialJ)
  plot(x,y)
  lines(x,(theta0+theta1*x), col="red")
}
lines(x,(theta0+theta1*x), col="green")

Now I want to calculate theta0 and theta1 for the following scenarios:

  1. y=data.1[[2]] and x=the date (day and month), treated as the same regardless of the year
  2. y=data.1[[2]] and x=the month, treated as the same regardless of the year

Any suggestions?

Mohit Goyal

1 Answer

As @Nicola said, you need to use the lm function for linear regression in R.
If you'd like to learn more about linear regression check out this or follow this tutorial

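First you would have to determine your formula. In an R formula the response goes on the left of ~ and the predictor on the right. Theta0 and Theta1 are not part of the formula; they are the intercept and slope that lm estimates. Here the response is data.1[[2]] and the predictor is the dates (or months).

Your first formula would be something along the lines of:

formula <- data.1[[2]] ~ dates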

Then you would create the linear model

variablename <- lm(formula, dataset)
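
For the two scenarios in the question, a minimal sketch could look like the following. It assumes the first column can be parsed with as.POSIXct in the format shown in the question. The sample values, the column names `when` and `value`, and the derived columns `day_of_year` and `month` are made up for illustration and are not from the original data.

```r
# Hypothetical sample data standing in for data.1 (values are made up)
data.1 <- data.frame(
  when  = c("2004-07-08 14:30:00", "2004-09-15 09:00:00",
            "2005-01-20 11:15:00", "2005-07-08 16:45:00",
            "2006-03-02 08:00:00"),
  value = c(12.41, 10.80, 7.35, 12.90, 8.60),
  stringsAsFactors = FALSE
)

# Parse the date-time column (assumed format: "YYYY-MM-DD HH:MM:SS")
when <- as.POSIXct(data.1[[1]], tz = "UTC", format = "%Y-%m-%d %H:%M:%S")

# Scenario 1: day of the year (1-366), ignoring the year.
# Note: in leap years, dates after February are shifted by one day.
data.1$day_of_year <- as.integer(format(when, "%j"))

# Scenario 2: month (1-12), ignoring the year
data.1$month <- as.integer(format(when, "%m"))

# One simple linear model per scenario
fit_day   <- lm(value ~ day_of_year, data = data.1)
fit_month <- lm(value ~ month, data = data.1)
```

Whether a straight line in day-of-year or month is a sensible model depends on the data; seasonal data often is not linear in these variables.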

After this you can use the output for various calculations.
For example you can calculate anova, or just print the summary:

anova(variablename)
summary(variablename)
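
In the question's notation, theta0 and theta1 correspond to the intercept and slope of the fitted model, which coef() returns. A small sketch with made-up numbers (`x`, `y`, `fit`, `slope` and `intercept` are illustrative names), compared against the closed-form least-squares solution:

```r
# Made-up data for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)

# coef() returns a named vector with entries "(Intercept)" and "x"
theta0 <- unname(coef(fit)[1])  # intercept
theta1 <- unname(coef(fit)[2])  # slope

# Closed-form least-squares estimates for comparison
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Compare the two results up to numerical tolerance
all.equal(c(theta0, theta1), c(intercept, slope))
```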

Side note:
I noticed you're assigning variables using =. This is not the recommended practice. For more information, check out Google's R Style Guide.
In R, <- is generally preferred for assignment.
Taking the first bit of your code, it would become:

x <- as.vector(data.1[[1]])
y <- as.vector(data.1[[2]])

plot(x,y)

theta0 <- 10
theta1 <- 10
alpha <- 0.0001
initialJ <- 100000
learningIterations <- 200000
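
If you would rather keep the gradient-descent approach than switch to lm, the explicit loops in J and updateTheta can be replaced by vectorised arithmetic. A rough sketch, assuming numeric x and y; the function name `gradient_descent`, its default arguments and the sample values are illustrative and not from the question:

```r
# Batch gradient descent for the line theta0 + theta1 * x.
# Hypothetical helper; defaults are chosen for the small example below.
gradient_descent <- function(x, y, theta0 = 0, theta1 = 0,
                             alpha = 0.01, iterations = 20000) {
  for (i in seq_len(iterations)) {
    residual <- theta0 + theta1 * x - y
    # Both gradients use the same residuals (simultaneous update)
    grad0 <- mean(residual)
    grad1 <- mean(residual * x)
    theta0 <- theta0 - alpha * grad0
    theta1 <- theta1 - alpha * grad1
  }
  c(theta0 = theta0, theta1 = theta1)
}

# Made-up data for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
thetas <- gradient_descent(x, y)
```

With large x values, such as raw timestamps in seconds, a fixed step size like this tends to diverge unless alpha is made much smaller or x is rescaled first.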
Bas
    `=` assignment is not "wrong"; it is a style choice and may even be part of the style guide where Mohit works. – hrbrmstr Oct 18 '15 at 11:47