-1

My apologies for any errors; I only recently began learning SAS. I was given this SAS code (the code below is a reprex, not the exact code) that uses proc glm to assumedly make a random effects model. Instead of using color, the SAS code uses contrasts and idnumber to indirectly map onto color.

I would like to know how to replicate this in R. Several attempts using lme4 for random effects and contrasts using MASS::ginv were unsuccessful, so I may need to use a package I am unfamiliar with.

I would also like to know the difference between red-blue and red-blue2 and why the output is different. Thank you for your help.

data df1;
input idnumber color $ value1;
datalines;
1001 red 189
1002 red 145
1003 red 210
1004 red 194
1005 red 127
1006 red 189
1007 blue 145
1008 red 210
1009 red 194
1010 red 127.
;

proc glm data=df1;
class idnumber;
    model value1=idnumber/noint solution clparm;
    contrast 'red vs. blue' idnumber 1 1 1 1 1 1 -9 1 1 1;
    estimate 'red-blue' idnumber 1 1 1 1 1 1 -9 1 1 1/ divisor=10;
    estimate 'red-blue2' idnumber .111 .111 .111 .111 .111 .111 -.999 .111 .111 .111;
run;

Below are a few attempts at replication.

idnumber <- c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010)
color <- c('red', 'red', 'red', 'red', 'red', 'red', 'blue', 'red', 'red', 'red')
value1 <- c(189, 145, 210, 194, 127, 189, 145, 210, 194, 127)
df1 <- data.frame(idnumber, color, value1)

library(lme4)
library(MASS)
library(tidyverse)
options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))

# attempt 1
mod1 <- lme4::lmer(value1 ~ (1|idnumber), data = df1) # error
#> Error: number of levels of each grouping factor must be < number of observations (problems: idnumber)

# attempt 2
mod2 <- lme4::lmer(value1 ~ (1|color), data = df1) # singular
#> boundary (singular) fit: see help('isSingular')
summary(mod2)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: value1 ~ (1 | color)
#>    Data: df1
#> 
#> REML criterion at convergence: 90.9
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.3847 -0.8429  0.4816  0.6321  1.1138 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  color    (Intercept)    0      0.00   
#>  Residual             1104     33.22   
#> Number of obs: 10, groups:  color, 2
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)   173.00      10.51   16.47
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')

# attempt 3
mat1 <- rbind(c(-0.5, 0.5))
cMat1 <- MASS::ginv(mat1)
mod3 <- lm(value1 ~ color, data = df1, contrasts = list(color = cMat1))
summary(mod3)
#> 
#> Call:
#> lm(formula = value1 ~ color, data = df1, contrasts = list(color = cMat1))
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -49.11 -23.33  12.89  17.89  33.89 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   160.56      17.74   9.052 1.78e-05 ***
#> color1         15.56      17.74   0.877    0.406    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 33.65 on 8 degrees of freedom
#> Multiple R-squared:  0.08771,    Adjusted R-squared:  -0.02633 
#> F-statistic: 0.7691 on 1 and 8 DF,  p-value: 0.4061

# attempt 4
con <- c(.1, .1, .1, .1, .1, .1, -.9, .1, .1, .1)
mod4 <- lm(value1 ~ idnumber, data = df1, contrasts = list(idnumber = con)) # error, but unsure how to fix
#> Error in `contrasts<-`(`*tmp*`, value = contrasts.arg[[nn]]): contrasts apply only to factors
Created on 2022-02-08 by the reprex package (v2.0.1)
jrcalabrese
  • 2,184
  • 3
  • 10
  • 30
  • 1
    Hi - unfortunately "reproduce this code in another language" is not considered on topic here. What is on topic is posting your (R) code and asking how to make it do what is needed. If you can post the best attempt you have, that will make this a more well-received question. – Joe Feb 09 '22 at 00:31
  • Thank you for your comment, I have added some of the attempts I've tried with R. – jrcalabrese Feb 09 '22 at 00:49

1 Answers1

2

Answering the part of this that is answerable: what is going on with the two different estimates.

The estimate statement includes a list of coefficients. Those are multiplied by the values, and then summed - giving the result. The reason they're different is, well, they're different... the first one is (after the division) 0.1 / -0.9 and the second is 0.111 (one ninth) / -0.999, effectively the same as the first one with a divisor of 9 instead of 10. Hence, the math is different.

I'm also not sure about your reprex, it doesn't really make any sense to use idnumber as the class variable... seems more likely you'd use color as the class variable. Is it possible this is just bad SAS code? I'm not a GLM expert, but it seems odd to me to try to use GLM with the classification variable being the ID number (assuming it's a unique ID, anyway).

Joe
  • 62,789
  • 6
  • 49
  • 67
  • Thank you for your response. I haven't spent enough time with SAS to know what is good/bad code, but I understand now how I am getting different results with two `estimate` lines now. I also do not understand why idnumber is being used and not color. – jrcalabrese Feb 09 '22 at 00:53
  • Color is not in the MODEL. IDNUMBER and COLOR are confounded. – data _null_ Feb 09 '22 at 15:24