0

I realized this question has been asked before, but having looked at all the answers, they are for the specific questions, and I could not find an answer for my unique situation.

I entered the following into R, and it worked for the first example, but not the second, and I cannot understand why.

Setting up the Data for the glm :

setwd("P:/STAT319")
ucb2<-read.table('Berkeley.PoissonTwo.txt',header=TRUE)
attach(ucb2)

ucb2 is the following :

Count   Admit Department    Gender     
313 FALSE     A     Female     
512 TRUE      A     Female     
19  FALSE     A     Male       
89  TRUE      A     Male       
207 FALSE     B     Female     
353 TRUE      B     Female     
8   FALSE     B     Male       
17  TRUE      B     Male       
205 FALSE     C     Female     
120 TRUE      C     Female     
391 FALSE     C     Male       
202 TRUE      C     Male       
279 FALSE     D     Female     
138 TRUE      D     Female     
244 FALSE     D     Male       
131 TRUE      D     Male       
138 FALSE     E     Female     
53  TRUE      E     Female     
299 FALSE     E     Male       
94  TRUE      E     Male       
351 FALSE   F       Female     
22  TRUE      F     Female     
317 FALSE     F     Male       
24  TRUE      F     Male

using a factor variable, TRUE and FALSE for Admit and NotAdmit :

Admit<-c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)
fAdmit<-factor(Admit)
rAdmit<-factor(Admit,labels=c("FALSE","TRUE"))
glm2<-glm(Count~Admit+Department+Gender,family=poisson)
glm2

preparing the way for a leave one out Cross Validation

library(car)
vif(glm2)
# GVIF Df GVIF^(1/(2*Df))
# Admit         1  1               1
# Department    1  5               1
# Gender        1  1               1
step(glm2)
# Start:  AIC=2272.73
# Count ~ Admit + Department + Gender
# 
# Df Deviance    AIC
# <none>            2097.7 2272.7
# - Department  5   2257.2 2422.2
# - Gender      1   2260.6 2433.6
# - Admit       1   2327.7 2500.8
# 
# Call:  glm(formula = Count ~ Admit + Department + Gender, family = poisson)
# 
# Coefficients:
#   (Intercept)        Admit  DepartmentB  DepartmentC  
# 5.82785     -0.45674     -0.46679     -0.01621  
# DepartmentD  DepartmentE  DepartmentF   GenderMale  
# -0.16384     -0.46850     -0.26752     -0.38287  

# Degrees of Freedom: 23 Total (i.e. Null);  16 Residual
# Null Deviance:        2650 
# Residual Deviance: 2098   AIC: 2273

library(ipred)
errorest(Count~Admit+Department+Gender,data=ucb2,model=glm,est.para=control.errorest(k=24))

# Call:
#   errorest.data.frame(formula = Count ~ Admit + Department + Gender, 
#                       data = ucb2, model = glm, est.para = control.errorest(k = # 24))
# 
# 24-fold cross-validation estimator of root mean squared error
# 
# Root mean squared error:  180.5741 

so the first one worked with the Data as shown. Now to do with the same Study, we had to rearrange the Data, and perform a Logistic Regression :

ucb1<-read.table('Monday.Late.txt',header=TRUE)
attach(ucb1)
# The following object is masked _by_ .GlobalEnv:
#   
#   Admit

# The following objects are masked from ucb2:
#   
#   Admit, Department, Gender

y<-cbind(ucb1[,1],ucb1[,2])
glm1<-glm(y~Gender+Department,family=binomial)

the Data for this is as follows :

Admit   NotAdmit    Gender  Department     
512 313 female  a      
353 207 female  b      
120 205 female  c      
138 279 female  d      
53  138 female  e      
22  351 female  f      
89  19  male    a      
17  8   male    b      
202 391 male    c      
131 244 male    d      
94  299 male    e      
24  317 male    f    

Setting this new Data up for Leave One Out :

vif(glm1)
# GVIF Df GVIF^(1/(2*Df))
# Gender     1.384903  1        1.176819
# Department 1.384903  5        1.033099
step(glm1)
# Start:  AIC=103.14
# y ~ Gender + Department

# Df Deviance    AIC
# - Gender      1    21.74 102.68
# <none>             20.20 103.14
# - Department  5   783.61 856.55
# 
# Step:  AIC=102.68
# y ~ Department
# 
# Df Deviance    AIC
# <none>             21.74 102.68
# - Department  5   877.06 948.00
# 
# Call:  glm(formula = y ~ Department, family = binomial)
# 
# Coefficients:
#   (Intercept)  Departmentb  Departmentc  Departmentd  
# 0.59346     -0.05059     -1.20915     -1.25833  
# Departmente  Departmentf  
# -1.68296     -3.26911  
# 
# Degrees of Freedom: 11 Total (i.e. Null);  6 Residual
# Null Deviance:        877.1 
# Residual Deviance: 21.74  AIC: 102.7

so far, so good, but now the problem arises :

errorest(y~Gender+Department,data=ucb1,model=glm,est.para=control.errorest(k=12))
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long

so why does this happen ? I tried other values for k, not sure what value k is # meant to take - I assume it was meant to be that of the number of rows

I then try the same Data, arranged a different way :

ucb1a<-read.table('Berkeley.Rearranged.txt',header=TRUE)
attach(ucb1a)
ucb1a

This is the rearrangement of the Data from before

Admitted Not_Admit Depart Genders
1       512       313      A  Female
2        89        19      A    Male
3       353       207      B  Female
4        17         8      B    Male
5       120       205      C  Female
6       202       391      C    Male
7       138       279      D  Female
8       131       244      D    Male
9        53       138      E  Female
10       94       299      E    Male
11       22       351      F  Female
12       24       317      F    Male

and then

y<-cbind(ucb1[,1],ucb1[,2])
glm1a<-glm(y~Genders+Depart,family=binomial)
vif(glm1a)
# GVIF Df GVIF^(1/(2*Df))
# Gender     1.384903  1        1.176819
# Department 1.384903  5        1.033099

step(glm1a)
# Start:  AIC=103.14
# y ~ Gender + Department
# 
# Df Deviance    AIC
# - Gender      1    21.74 102.68
# <none>             20.20 103.14
# - Department  5   783.61 856.55
# 
# Step:  AIC=102.68
# y ~ Department
# 
# Df Deviance    AIC
# <none>             21.74 102.68
# - Department  5   877.06 948.00
# 
# Call:  glm(formula = y ~ Department, family = binomial)
# 
# Coefficients:
#   (Intercept)  Departmentb  Departmentc  Departmentd  
# 0.59346     -0.05059     -1.20915     -1.25833  
# Departmente  Departmentf  
# -1.68296     -3.26911  
# 
# Degrees of Freedom: 11 Total (i.e. Null);  6 Residual
# Null Deviance:        877.1 
# Residual Deviance: 21.74  AIC: 102.7

Again, so far so good, but once more, this occurs :

errorest(y~Gender+Department,data=ucb1a,model=glm,est.para=control.errorest(k=12))
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long

And believe me, I tried other numbers again for k, and I cannot understand why this one is going wrong. So if Anyone has any ideas, for this specific example of the (subscript) logical subscript being too long, please reply to this.

Machavity
  • 30,841
  • 27
  • 92
  • 100

1 Answers1

0

This issue arise when your objects are of different size. I think your problem comes from attach() but i'm not certain.. Try the code without it, or you could try with(). You should check why you have to use attach() first before using it as nicola pointed out. Also, I'm not certain what you are trying to achieve with it.

you can see in the help section of the function the following : Good practice

attach has the side effect of altering the search path and this can easily lead to the wrong object of a particular name being found. People do often forget to detach databases.

In interactive use, with is usually preferable to the use of attach/detach, unless what is a save()-produced file in which case attach() is a (safety) wrapper for load().

Pinceloup
  • 11
  • 5