I have a data set with 5 independent variables and 1 dependent variable. Can I apply a polynomial regression model to it? If so, how do I fit a polynomial regression model with multiple independent variables in R when I have no prior information about the relationship between them?

Also, how do I use the `predict` function in this scenario?

Assume that the columns in my data are

ind1 ind2 ind3 ind4 ind5 dep

  • If you're asking whether it is possible to fit a polynomial regression, then the answer is yes. You can use `lm` for a model with normally distributed errors. If you're asking whether it's relevant, then we don't have enough information to help you – ekstroem Jul 10 '17 at 14:01
  • Thanks, I was asking about the possibilities. – Ali Zain UL Yasoob Jul 11 '17 at 04:57
  • Is there an easy way to fit a polynomial regression using `lm` without adding extra columns for the squared terms, cubed terms, etc.? – Ali Zain UL Yasoob Jul 11 '17 at 05:02
  • Throw in some example data and I'll write you an example – ekstroem Jul 11 '17 at 06:42
  • In multiple linear regression we use `lm(formula = dep ~ .)` to say that the dependent variable depends on all the other variables. Can I do the same here? For example, my columns are ind1 ind2 ind3 ind4 ind5 dep. How can I apply polynomial regression up to degree 3 using these columns? – Ali Zain UL Yasoob Jul 11 '17 at 11:47

1 Answer

Here are some examples that generate your polynomials.

# Simulate some data
ind1 <- rnorm(100)
ind2 <- rnorm(100)
ind3 <- rnorm(100)
ind4 <- rnorm(100)
ind5 <- rnorm(100)
dep <- rnorm(100, mean=ind1)

Polynomials can be defined manually using the `I()` function. For example, a polynomial of degree 3 in ind1 is

lm(dep ~ ind1 + I(ind1^2) + I(ind1^3))

You can also use the poly function to generate the polynomials for you, e.g.,

lm(dep ~ poly(ind1, degree=3, raw=TRUE))

The argument `raw=TRUE` is needed to get raw rather than orthogonal polynomials. It does not change the fit or the predictions, but it ensures that the parameter estimates are directly comparable to those from the manual `I()` specification.
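As a quick check of that claim, here is a minimal sketch on simulated data (the seed and the object names `fit_raw`, `fit_orth` and `fit_I` are chosen here for illustration). It fits the same cubic three ways and compares fitted values and coefficients.

```r
# Simulated data, same shape as in the answer (seed added for reproducibility)
set.seed(1)
ind1 <- rnorm(100)
dep  <- rnorm(100, mean = ind1)

fit_raw  <- lm(dep ~ poly(ind1, degree = 3, raw = TRUE))  # raw polynomial basis
fit_orth <- lm(dep ~ poly(ind1, degree = 3))              # orthogonal basis (the default)
fit_I    <- lm(dep ~ ind1 + I(ind1^2) + I(ind1^3))        # manual specification

# The fitted values agree up to floating-point error
same_fit <- isTRUE(all.equal(fitted(fit_raw), fitted(fit_orth)))

# The raw-basis coefficients match the manual I() coefficients
same_coef <- isTRUE(all.equal(unname(coef(fit_raw)), unname(coef(fit_I))))

# The orthogonal-basis coefficients are on a different scale
coef(fit_orth)
```

Both `same_fit` and `same_coef` should be `TRUE`; only the interpretation of the individual coefficients changes between the two bases.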

Thus, you can fit your desired model with

lm(dep ~ poly(ind1, degree=3, raw=TRUE) +
         poly(ind2, degree=3, raw=TRUE) +
         poly(ind3, degree=3, raw=TRUE) +
         poly(ind4, degree=3, raw=TRUE) +
         poly(ind5, degree=3, raw=TRUE))
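The question also asks about `predict`. The following is a minimal sketch, assuming the variables are stored in a data frame (called `dat` here, a name chosen for illustration) so that `predict` can match columns in `newdata` by name. The new observations in `new_obs` are made up.

```r
# Simulated data in a data frame (seed added for reproducibility)
set.seed(42)
n <- 100
dat <- data.frame(ind1 = rnorm(n), ind2 = rnorm(n), ind3 = rnorm(n),
                  ind4 = rnorm(n), ind5 = rnorm(n))
dat$dep <- rnorm(n, mean = dat$ind1)

fit <- lm(dep ~ poly(ind1, degree = 3, raw = TRUE) +
                poly(ind2, degree = 3, raw = TRUE) +
                poly(ind3, degree = 3, raw = TRUE) +
                poly(ind4, degree = 3, raw = TRUE) +
                poly(ind5, degree = 3, raw = TRUE),
          data = dat)

# newdata must contain every predictor under its original column name;
# the polynomial terms are rebuilt automatically from these columns
new_obs <- data.frame(ind1 = c(0, 1), ind2 = c(0.5, -0.5),
                      ind3 = 0, ind4 = 0, ind5 = 0)

pred     <- predict(fit, newdata = new_obs)                           # point predictions
pred_int <- predict(fit, newdata = new_obs, interval = "prediction")  # with prediction intervals
```

`pred` is a numeric vector with one value per row of `new_obs`, and `pred_int` is a matrix with columns `fit`, `lwr` and `upr`.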

Note that it may be necessary to scale your predictors. If a predictor takes large values, then ind^3 can become extremely large and cause numerical problems.
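One way to do the scaling is sketched below, assuming a hypothetical predictor `big` measured on a large scale. The predictor is centred and divided by its standard deviation before fitting, and the same centre and scale are reused when predicting for new values.

```r
# Hypothetical predictor with large values (seed added for reproducibility)
set.seed(7)
big <- rnorm(100, mean = 5000, sd = 300)
dep <- rnorm(100, mean = (big - 5000) / 300)

# Store the centre and scale so they can be reused at prediction time
mu <- mean(big)
s  <- sd(big)
dat <- data.frame(dep = dep, big_z = (big - mu) / s)

fit <- lm(dep ~ poly(big_z, degree = 3, raw = TRUE), data = dat)

# New observations must be transformed with the SAME mu and s
new_big <- c(4800, 5200)
pred <- predict(fit, newdata = data.frame(big_z = (new_big - mu) / s))
```

Using the default orthogonal basis of `poly` (that is, leaving out `raw=TRUE`) is another way to reduce such numerical problems, at the cost of coefficients that are harder to interpret.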

ekstroem
  • @ekstroem Thank you, this is very useful. If there are many more independent variables (say 30), is there a simple way to include all of them without writing each one out? – R starter May 05 '19 at 11:17
  • If you want to include all variables from a data frame, then I believe you can use a period `.` on the right-hand side of a model formula. – ekstroem May 05 '19 at 20:32
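On its own, `dep ~ .` adds each column as a linear term only. To get a degree-3 polynomial in every predictor without typing each term, one option is to build the formula as a string. This is a minimal sketch, assuming the data are in a data frame (called `dat` here) whose response column is named `dep`; the names `predictors`, `poly_terms` and `form` are chosen for illustration.

```r
# Simulated data frame with columns ind1..ind5 and dep (seed added for reproducibility)
set.seed(3)
dat <- as.data.frame(matrix(rnorm(100 * 5), ncol = 5))
names(dat) <- paste0("ind", 1:5)
dat$dep <- rnorm(100, mean = dat$ind1)

# Every column except the response is treated as a predictor
predictors <- setdiff(names(dat), "dep")

# One poly() term per predictor, joined with "+"
poly_terms <- sprintf("poly(%s, degree = 3, raw = TRUE)", predictors)
form <- as.formula(paste("dep ~", paste(poly_terms, collapse = " + ")))

fit <- lm(form, data = dat)
length(coef(fit))  # intercept + 5 predictors * 3 degrees = 16
```

The same code works unchanged for 30 predictors, although a cubic in each of them means 91 coefficients, so the number of observations needs to be large enough to support that.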