-1

I'm trying to generate estimates of the percent of Catholics within a given municipality in a country and I'm using multilevel regression and post-stratification of survey data.

The approach fits a multilevel logit and generates predicted probabilities of the dependent variable. It then weights the probabilities using poststratification of the sample to census data.

I can generate the initial estimates (which are essentially just the predicted probability of being Catholic for a given individual in the survey data.) However, when I try to take the average with the last line of code below it only returns NA's for each of the municipalities. The initial cell predictions have some missing values but nowhere near a majority.

I don't understand why I can't generate municipal weighted averages as I've followed the procedure using different data. Any help would be greatly appreciated.

rm(list=ls(all=TRUE))

library("arm")
library("foreign")

#read in megapoll and attach
ES.data <- read.dta("ES4.dta", convert.underscore = TRUE) 

#read in municipal-level dataset

munilevel <- read.dta("election.dta",convert.underscore = TRUE)
munilevel <- munilevel[order(munilevel$municode),]

#read in Census data
Census <- read.dta("poststratification4.dta",convert.underscore = TRUE)
Census <- Census[order(Census$municode),]
Census$municode <-  match(Census$municode, munilevel$municode)

#Create index variables

#At level of megapoll

ES.data$ur.female <- (ES.data$female *2) + ES.data$ur
ES.data$age.edr <- 6 * (ES.data$age -1) + ES.data$edr

#At census level (same coding as above for all variables)
Census$cur.cfemale <- (Census$cfemale *2) + Census$cur
Census$cage.cedr <- 6 * (Census$cage -1) + Census$cedr

##Municipal level variables 
Census$c.arena<-  munilevel$c.arena[Census$municode]
Census$c.fmln <- munilevel$c.fmln[Census$municode]



#run individual-level opinion model

individual.model1 <- glmer(formula = catholic ~ (1|ur.female) + (1|age) 
+ (1|edr) + (1|age.edr) + (1|municode) + p.arena +p.fmln
 ,data=ES.data, family=binomial(link="logit"))
display(individual.model1)



#examine random effects and standard errors for urban-female
ranef(individual.model1)$ur.female
se.ranef(individual.model1)$ur.female

#create vector of state ranefs and then fill in missing ones
muni.ranefs <- array(NA,c(66,1))
dimnames(muni.ranefs) <- list(c(munilevel$municode),"effect")
for(i in munilevel$municode){
muni.ranefs[i,1] <- ranef(individual.model1)$municode[i,1]
}
muni.ranefs[,1][is.na(muni.ranefs[,1])] <- 0 #set states with missing REs (b/c not in      data) to zero


#create a prediction for each cell in Census data
cellpred1 <- invlogit(fixef(individual.model1)["(Intercept)"]
    +ranef(individual.model1)$ur.female[Census$cur.cfemale,1]
    +ranef(individual.model1)$age[Census$cage,1]
    +ranef(individual.model1)$edr[Census$cedr,1]
    +ranef(individual.model1)$age.edr[Census$cage.cedr,1]
    +muni.ranefs[Census$municode,1]
    +(fixef(individual.model1)["p.fmln"] *Census$c.fmln) # municipal level 
    +(fixef(individual.model1)["p.arena"] *Census$c.arena)) # municipal level



#weights the prediction by the freq of cell                                       
cellpredweighted1 <- cellpred1 * Census$cpercent.muni

#calculates the percent within each municipality (weighted average of responses)
munipred <- 100* as.vector(tapply(cellpredweighted1, Census$municode, sum))
munipred

1 Answers1

1

The extensive amount of code is totally redundant without the data! I suppose you have NAs in the object cellpredweighted1 and by default sum() propagates NAs to the answer because if one or more elements of a vector is NA then by definition the summation of those elements is also NA.

If the above is the case here, then simply adding na.rm = TRUE to the tapply() call should solve the problem.

tapply(cellpredweighted1, Census$municode, sum, na.rm = TRUE)

You should be asking yourself why there are NAs at this stage and if these result from errors earlier on the process.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453