I am trying to run a multiple linear regression, but my predictor variables have slightly different lengths: SVI_1 (N=133), SVI_2 (N=131), SVI_3 (N=135), and SVI_4 (N=132). I want to assess the effect of these four variables on my dependent variable, Diff_Anx (the difference in anxiety scores).

My full data set is a data frame called "Cohort" with many columns. I created the subsets below to pull out the column "svi_avg" for each subgroup (SVI 1-4 correspond to quartiles):

SVI_1 = subset(Cohort, svi_avg >= 0.61175, select=c(svi_avg))
SVI_2 = subset(Cohort, svi_avg >= 0.32600 & svi_avg < 0.61175, select=c(svi_avg))
SVI_3 = subset(Cohort, svi_avg >= 0.11650 & svi_avg < 0.32600, select=c(svi_avg))
SVI_4 = subset(Cohort, svi_avg <= 0.11650, select=c(svi_avg))

This was the model I tried to use (I used the unlist() function because I realized this needs to be a vector):

SVI_model <- lm(Diff_Anx ~ unlist(SVI_1 + SVI_2 + SVI_3 + SVI_4), data = Cohort)

But I get this error message:

Error in Ops.data.frame(SVI_1, SVI_2) : ‘+’ only defined for equally-sized data frames
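
The error seems to be reproducible without my data. Here is a minimal sketch, assuming two made-up single-column data frames (`a` and `b` are hypothetical stand-ins for SVI_1 and SVI_2) with different numbers of rows:

```r
# Hypothetical stand-ins for SVI_1 and SVI_2: one column each, different row counts
a <- data.frame(svi_avg = runif(133))
b <- data.frame(svi_avg = runif(131))

# `+` on two data frames works element by element, so their dimensions must match.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    a + b
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the failure appears to come from adding the data frames inside unlist(), before lm() ever sees them.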

I don't know how to set up a multiple linear regression model for variables of differing lengths, and I assume that is the problem. I am fairly new to R, and I am also worried that I subsetted my data incorrectly and that this is the real issue. Any help is appreciated.
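
One possible way to set this up, sketched with simulated data since the real Cohort file is not shown here: keep a single data frame and encode the quartile as one categorical column, rather than four separate subsets. The names `Cohort_demo`, `SVI_group`, and `SVI_model_demo` are made up for illustration, and the cut points are the ones from the subset() calls above. This is only a sketch under those assumptions, not necessarily the right model for the real data.

```r
set.seed(1)

# Simulated stand-in for Cohort (assumption: the real values are not available here)
Cohort_demo <- data.frame(
  svi_avg  = runif(531),
  Diff_Anx = rnorm(531)
)

# Cut points taken from the subset() calls in the question
breaks <- c(-Inf, 0.11650, 0.32600, 0.61175, Inf)

# right = FALSE makes each interval [lower, upper), so every row lands in
# exactly one group (a value of exactly 0.11650 goes to SVI_3 only)
Cohort_demo$SVI_group <- cut(
  Cohort_demo$svi_avg,
  breaks = breaks,
  labels = c("SVI_4", "SVI_3", "SVI_2", "SVI_1"),
  right  = FALSE
)

# One categorical predictor; lm() expands the factor into dummy variables,
# using the first level (SVI_4) as the reference group
SVI_model_demo <- lm(Diff_Anx ~ SVI_group, data = Cohort_demo)

print(table(Cohort_demo$SVI_group))
print(coef(SVI_model_demo))
```

With this layout every row keeps its own Diff_Anx value next to its group label, so the unequal group sizes do not matter to lm().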
