
I have a data set with the following columns:

x1 x2 x3 x4 x5 y

All of them hold integer or float values, and the y values range from 98,000 to 1,10,000.

I want to find the relationship between x1 and y, x2 and y, ..., x5 and y, and come up with an equation of the form

y = A.x1 + c

How should I do it?

I tried plotting graphs and also tried the lm() and fit() functions in R:

library(MASS)  # stepAIC() comes from the MASS package
fit <- lm(Y ~ X1 + X2 + X3 + X4 + X5, data = data)
step <- stepAIC(fit, direction = "both")

Kindly help.

1 Answer


I think you should use a specialised package that finds the best linear relation between y and the variables xi. See, for example, the leaps package.
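As a rough illustration of what that could look like, here is a minimal sketch using `regsubsets()` from leaps. It assumes the package is installed, and the data are simulated purely for the example (they are not the asker's data). Picking the model by lowest BIC is just one possible criterion.

```r
# Sketch only: assumes the leaps package is installed; data are simulated.
library(leaps)

set.seed(1)
dat <- as.data.frame(matrix(rnorm(6 * 50), ncol = 6))
colnames(dat) <- c(paste0("x", 1:5), "y")

# Exhaustive search: best model of each size from 1 to 5 predictors
subsets <- regsubsets(y ~ ., data = dat, nvmax = 5)
res <- summary(subsets)

res$which   # logical matrix: which terms enter the best model of each size
res$bic     # BIC of each of those models (lower is better)

# Coefficients of the model with the lowest BIC
best_size <- which.min(res$bic)
coef(subsets, best_size)
```

`res$adjr2` and `res$cp` are available in the same summary if you prefer adjusted R-squared or Mallows' Cp.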

You can also find the relation by looping over all your xi. Here is one way to do it. First I wrap your code in a function, using the dot formula notation.

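library(MASS)  # provides stepAIC()

# Fit y against the single column `var` of `data`, then run stepwise AIC selection
lm_col <- function(var, data) {
  fit <- lm(y ~ ., subset(data, select = c('y', var)))
  stepAIC(fit, direction = "both")
}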

Then you loop over all your variables using lapply:

 lapply(paste0('x',seq(5)),lm_col,data=dat)

You can test this using this data:

dat <- as.data.frame(matrix(rnorm(6*10),ncol=6))
colnames(dat) <- c(paste0('x',seq(5)),'y')
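If all you need is the slope A and the intercept c of each single-variable fit y = A.xi + c, a shorter sketch is to fit one `lm()` per column and collect the coefficients in a table. The data are simulated as in the test example, and the column names `intercept`, `slope` and `r.squared` are my own labels.

```r
# Simulated data with the same layout as the question (values are made up)
set.seed(42)
dat <- as.data.frame(matrix(rnorm(6 * 10), ncol = 6))
colnames(dat) <- c(paste0("x", 1:5), "y")

vars <- paste0("x", 1:5)

# Fit y ~ xi separately for each predictor; keep intercept (c), slope (A) and R^2
coefs <- t(sapply(vars, function(v) {
  fit <- lm(reformulate(v, response = "y"), data = dat)
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r.squared = summary(fit)$r.squared)
}))

coefs  # one row per predictor
```

Each row then gives y = slope * xi + intercept for that predictor, with R-squared as a rough measure of how well the line fits.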

But as I said at the beginning, I don't think this is the best way, statistically speaking, to do what you want (which is not very clear).

agstudy