I am trying to get R to run the same code for every question in a dataset. The data has 50 questions with yes (1) / no (0) answers and about 500 respondents, each identified as male (1) or female (0). Each respondent also has a "Score", the number of yes (1) answers they gave. I have already produced the plot for one question, but I want to produce it for all 50 questions without editing and rerunning the code 50 times. The code I am using is below. dataset is the Excel file I made, with Gender, Q001-Q052 points, and Score as columns, and about 500 rows holding each respondent's answers and gender.

>LRmod01<-glm(Q001points~Score+Gender,data=dataset,family=binomial(link="logit"))

>summary(LRmod01)

>LRodds01<-cbind("Odds-Ratio"=exp(LRmod01$coefficients),exp(confint(LRmod01)))

>View(LRodds01)

>LR.pred.probs01<-predict(LRmod01,type="response")

>View(LR.pred.probs01)

>scatter.smooth(dataset$Score,qlogis(LR.pred.probs01))

>scatter.smooth(dataset$Score,LR.pred.probs01,main="Logistic Regression for Question 001", xlab="Number of Questions Yes on Exam", ylab="Predicted Probability for Question 001",ylim=c(0,1))

I want to run the code above for all 50 questions. Right now it only runs for Q001, because the column name is hard-coded in the dataset$'Q001points' part. Should I use a loop for this, and if so, how?

1 Answer

Suppose we are using the mtcars dataset:

> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Now we want to fit several linear models like

lm(mpg ~ cyl + disp, data=mtcars)

but with mpg replaced in turn by each of the remaining columns (everything except mpg, cyl and disp).

lst_model <- lapply(colnames(mtcars)[-3:-1], 
                    function(x) lm(get(x) ~ cyl + disp, data=mtcars))
lst_model <- setNames(lst_model, colnames(mtcars)[-3:-1])

gives a list of models

$hp

Call:
lm(formula = get(x) ~ cyl + disp, data = mtcars)

Coefficients:
(Intercept)          cyl         disp  
   -32.4317      24.5145       0.1189  


$drat

Call:
lm(formula = get(x) ~ cyl + disp, data = mtcars)

Coefficients:
(Intercept)          cyl         disp  
   4.607278    -0.095280    -0.001825  

[...]

Every element of the list lst_model is named after its left-hand-side variable, so you get the model for hp ~ cyl + disp with lst_model[["hp"]].

> lst_model[["hp"]]

Call:
lm(formula = get(x) ~ cyl + disp, data = mtcars)

Coefficients:
(Intercept)          cyl         disp  
   -32.4317      24.5145       0.1189  

is the same as

> lm(hp ~ cyl + disp, data=mtcars)

Call:
lm(formula = hp ~ cyl + disp, data = mtcars)

Coefficients:
(Intercept)          cyl         disp  
   -32.4317      24.5145       0.1189  
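The one visible difference is the Call line, which shows get(x) instead of the real response name. A minimal sketch of an alternative, assuming you would rather have each model store its actual formula: build the formula with reformulate() from the stats package. The names responses and lst_model2 are made up for this illustration.

```r
# Response columns: everything except mpg, cyl and disp
responses <- setdiff(colnames(mtcars), c("mpg", "cyl", "disp"))

# Build each formula explicitly (e.g. hp ~ cyl + disp) and fit the model
lst_model2 <- lapply(responses, function(x) {
  lm(reformulate(c("cyl", "disp"), response = x), data = mtcars)
})
names(lst_model2) <- responses

# The formula stored in the model now names the real response variable
formula(lst_model2[["hp"]])
```

The printed Call will still show the reformulate() expression, but formula() on each model returns the real formula, and the coefficients are the same as with get(x).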

So for example if you want to get the fitted values for a model

model <-  lm(hp ~ cyl + disp, data=mtcars)

you type model$fitted.values.

In the case of lst_model, use lst_model[["hp"]][["fitted.values"]] to get the same result. Because [[ indexes a list recursively when it is given a vector, lst_model[[c("hp", "fitted.values")]] is equivalent.
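The same pattern should carry over to the logistic regressions in the question. What follows is only a sketch under assumptions: the data frame is simulated (the values are invented stand-ins, with three question columns instead of 50), the column names Q001points, Score and Gender are taken from the question, and the formula is built with reformulate() from stats rather than get(x). The names questions, lst_glm and odds_ratios are hypothetical.

```r
# Simulated stand-in for the real data (hypothetical values)
set.seed(1)
n <- 500
dataset <- data.frame(Gender = rbinom(n, 1, 0.5),
                      Score  = rbinom(n, 50, 0.5))
for (q in c("Q001points", "Q002points", "Q003points")) {
  dataset[[q]] <- rbinom(n, 1, plogis(-5 + 0.2 * dataset$Score))
}

# Pick up every question column by its naming pattern
questions <- grep("^Q[0-9]+points$", colnames(dataset), value = TRUE)

# Fit one logistic regression per question
lst_glm <- lapply(questions, function(q) {
  glm(reformulate(c("Score", "Gender"), response = q),
      data = dataset, family = binomial(link = "logit"))
})
names(lst_glm) <- questions

# Odds ratios for all questions: one row per model
odds_ratios <- t(sapply(lst_glm, function(m) exp(coef(m))))

# One plot per question, with the title built from the column name
for (q in questions) {
  scatter.smooth(dataset$Score, predict(lst_glm[[q]], type = "response"),
                 main = paste("Logistic Regression for", q),
                 xlab = "Number of Questions Yes on Exam",
                 ylab = "Predicted Probability", ylim = c(0, 1))
}
```

With the real data, only the simulation block at the top would be dropped; summary(lst_glm[["Q001points"]]) then gives the summary for a single question.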

Martin Gal
  • What does the [-3:-1] signify in " lst_model <- lapply(colnames(mtcars)[-3:-1], function(x) lm(get(x) ~ cyl + disp, data=mtcars)) "? – GVSU student Jun 17 '20 at 23:49
  • That's just a fancy way to remove the first three columns. ;-) I didn't want to use `mpg`, `cyl` and `disp` as the left-hand side of ` ~ cyl + disp`, so I removed them using `[-3:-1]`. – Martin Gal Jun 18 '20 at 08:55
  • So if I want to use all of the columns, can I just take out "[-3:-1]"? – GVSU student Jun 18 '20 at 15:33
  • Sorry, I just tested it and yes, it gave me all of the columns. Can you help me understand the different pieces of data it gives me? Like what are the coefficients, residuals, effects, fitted values, and so on? – GVSU student Jun 18 '20 at 15:35
  • See my new answer. – Martin Gal Jun 18 '20 at 15:50