I am not sure where I can get help, since this exact post was considered off-topic on StackExchange.

I want to run some regressions based on a balanced panel with electoral data from Brazil focusing on 2 time periods. I want to understand if after a change in legislation that prohibited firm donations to candidates, those individuals that depended most on these resources had a lower probability of getting elected.

I have already run a regression like this in R:

model_continuous <- plm(percentage_of_votes ~ time + 
                        treatment + time*treatment, data = dataset, model = 'fd')

In this model I used a continuous variable (% of votes) as my dependent variable. My treatment units are those that in time = 0 had no campaign contributions coming from corporations.
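With only two periods and a treatment indicator that is constant within each individual, first-differencing leaves a regression of the change in the outcome on treatment: the intercept picks up the time effect and the slope picks up the time-by-treatment interaction. A minimal base-R sketch of that idea, on a simulated panel whose values and effect sizes are invented for illustration (it is not the plm call itself):

```r
set.seed(42)
n <- 200

# Hypothetical balanced two-period panel; all values are simulated
panel <- data.frame(
  individual = rep(seq_len(n), each = 2),
  time       = rep(c(0, 1), times = n),
  treatment  = rep(rbinom(n, 1, 0.4), each = 2)
)
panel$percentage_of_votes <- 10 + rep(rnorm(n, sd = 3), each = 2) +
  1.5 * panel$time - 2 * panel$time * panel$treatment + rnorm(2 * n)

# Split by period; both subsets keep the same individual ordering
before <- panel[panel$time == 0, ]
after  <- panel[panel$time == 1, ]

# One row per individual: change in vote share and the (time-invariant) treatment
diffs <- data.frame(
  treatment = before$treatment,
  d_votes   = after$percentage_of_votes - before$percentage_of_votes
)

# Intercept = time effect, slope = time-by-treatment (difference-in-differences) effect
fd_fit <- lm(d_votes ~ treatment, data = diffs)
coef(fd_fit)
```

The slope here is just the difference between the mean change for treated and untreated individuals, which is why the individual fixed effects drop out.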

Now I want to change my dependent variable to a binary variable indicating whether the candidate was elected in that year. All of my units were elected at time = 0. How can I estimate a logit or probit model using fixed effects? I have tried using the pglm package in R.

model_binary <- pglm(dummy_elected ~ time + treatment + time*treatment, 
                           data = dataset, 
                           effects = 'twoways',
                           model = 'within',
                           family = 'binomial',
                           start = NULL)

However, I got this error:

Error in maxRoutine(fn = logLik, grad = grad, hess = hess, start = start,  : 
  argument "start" is missing, with no default

Why is that happening? What is wrong with my model? Is it conceptually correct? I want the second regression to be as similar as possible to the first one.

I have read that the clogit function from the survival package could do the job, but I don't know how to use it.
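For reference, a minimal sketch of what a conditional (fixed effects) logit with clogit might look like. The panel below is simulated and purely hypothetical; unlike the data described above, its outcome varies in both periods, which matters because the conditional likelihood only uses individuals whose outcome changes between periods. The individual fixed effect enters through strata(individual), so a treatment indicator that is constant within individual cannot get its own coefficient and only the interaction is identified.

```r
library(survival)  # provides clogit() and strata()

set.seed(1)
n <- 500

# Hypothetical two-period panel; names mirror the question, values are simulated
panel <- data.frame(
  individual = rep(seq_len(n), each = 2),
  time       = rep(c(0, 1), times = n),
  treatment  = rep(rbinom(n, 1, 0.5), each = 2)
)
alpha  <- rep(rnorm(n), each = 2)  # unobserved individual effect
linear <- alpha + 0.3 * panel$time - 0.8 * panel$time * panel$treatment
panel$dummy_elected <- rbinom(2 * n, 1, plogis(linear))

# Conditional logit: each individual is a stratum, so the individual effect
# is conditioned out; treatment's main effect is absorbed by the strata
fit <- clogit(dummy_elected ~ time + time:treatment + strata(individual),
              data = panel)
summary(fit)
```

If every unit has the same outcome at time = 0, the time coefficient in this specification is not estimable, so this is a sketch of the syntax rather than a ready-made solution.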

Edit:

this is what a sample dataset could look like:

dataset <- data.frame(individual = c(1,1,2,2,3,3,4,4,5,5),
                      time = c(0,1,0,1,0,1,0,1,0,1),
                      treatment = c(0,0,1,1,0,0,1,1,0,0),
                      corporate = c(0,0,0.1,0,0,0,0.5,0,0,0))
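In this sample, treatment is 1 in both periods exactly for the individuals with corporate > 0 at time = 0. A short base-R check of that coding and of the panel being balanced (the helper names had_corporate and derived_treatment are made up for this sketch):

```r
dataset <- data.frame(individual = c(1,1,2,2,3,3,4,4,5,5),
                      time = c(0,1,0,1,0,1,0,1,0,1),
                      treatment = c(0,0,1,1,0,0,1,1,0,0),
                      corporate = c(0,0,0.1,0,0,0,0.5,0,0,0))

# Corporate money at time 0 only, then spread the per-individual maximum to both rows
had_corporate <- with(dataset,
                      ave(corporate * (time == 0), individual, FUN = max) > 0)
derived_treatment <- as.numeric(had_corporate)

# Does the derived flag match the treatment column?
identical(derived_treatment, dataset$treatment)

# Balanced panel: every individual should appear exactly twice
table(dataset$individual)
```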
  • This link mentions crashes with fixed effect logit models with PGLM and suggests an alternative (function glmmboot from glmmML package): http://www.polsci.ucsb.edu/faculty/glasgow/ps206/ps206_panel.r – dmb Sep 16 '17 at 21:51
  • @dmb thank you, but I did not understand the syntax of the function. How do I specify which variables the individual and time fixed effects apply to? – Arthur Carvalho Brito Sep 16 '17 at 22:51
  • In thinking about this a little more, there are various modeling options but they are contingent on your data characteristics. It sounds like time is just 0 or 1 to represent before and after election? How about treatment - it sounds like it could be 1 or 0? Or is your treatment each candidate (and if so, many candidates or not so many)? – dmb Sep 17 '17 at 21:56
  • Yes, time = 0 represents the 2012 elections, before the legislation change. time = 1 is for the 2016 elections, after the change. Treatment is binary as well. Every candidate that received any kind of corporate money when it was allowed gets treatment = 1. There is a large number of candidates, around 35k. – Arthur Carvalho Brito Sep 17 '17 at 21:59
  • Did you end up finding a best working solution for estimating fixed effects logistic models? – Jeremy K. Feb 16 '20 at 03:01

1 Answer

Based on the comments, I believe the logistic regression reduces to treatment and dummy_elected. Accordingly I have fabricated the following dataset:

dataset <- data.frame("treatment" = c(rep(1,1000),rep(0,1000)),
         "dummy_elected" = c(rep(1, 700), rep(0, 300), rep(1, 500), rep(0, 500)))

I then ran the GLM model:

library(MASS)
model_binary <- glm(dummy_elected ~ treatment, family = binomial(), data = dataset)
summary(model_binary)

The summary reports the coefficient estimates, and the treatment coefficient is significant. The resulting probabilities are thus

P(dummy_elected = 1) = 1 / (1 + exp(-(1.37674342264577e-16 + 0.847297860386033 * treatment)))
P(dummy_elected = 0) = 1 - 1 / (1 + exp(-(1.37674342264577e-16 + 0.847297860386033 * treatment)))

Note that these probabilities are consistent with the frequencies I used to generate the data.

So for each row, take whichever of the two outcomes above has the higher probability, and that is the predicted value for dummy_elected.
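The consistency between the fitted probabilities and the generating frequencies can be checked directly. A minimal sketch using the fabricated dataset from this answer (p_hat and observed are names introduced here for illustration):

```r
dataset <- data.frame(treatment = c(rep(1, 1000), rep(0, 1000)),
                      dummy_elected = c(rep(1, 700), rep(0, 300),
                                        rep(1, 500), rep(0, 500)))

model_binary <- glm(dummy_elected ~ treatment, family = binomial(), data = dataset)

# Fitted probability of being elected for treatment = 0 and treatment = 1
p_hat <- predict(model_binary,
                 newdata = data.frame(treatment = c(0, 1)),
                 type = "response")

# Observed share elected within each treatment group
observed <- tapply(dataset$dummy_elected, dataset$treatment, mean)

rbind(fitted = unname(p_hat), observed = unname(observed))
```

Because the model has one binary regressor, it is saturated, so the fitted probabilities reproduce the group shares (0.5 and 0.7).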

dmb
  • Sorry, I think I have not been clear about the structure of my dataset. Treatment is not always 0 in time = 1. I follow the same individuals across the 2 time periods. An individual has treatment = 1 if that individual had corporate contributions in t = 0. If that is true, the same individual will also have treatment = 1 at time = 1 – Arthur Carvalho Brito Sep 18 '17 at 01:51
  • So treatment is (1,1) if the individual received a contribution in time 0 and (0,0) otherwise? – dmb Sep 19 '17 at 00:16
  • Yes, if (1,1) means the treatment value for the individual in each time period – Arthur Carvalho Brito Sep 19 '17 at 00:23
  • I appreciate your answer, but what about the time and individual fixed effects? Because glm does not work with fixed effects, does it? – Arthur Carvalho Brito Sep 19 '17 at 03:00
  • For your first case with % of votes I can see that you would need time 0 and time 1. However, in the 2nd case, the time 0 elected is always 1 and treatment is constant by individual type (donation or no donation) across time. For instance, treatment = 1 means donation in time 0 or 1 otherwise. Each row represents an individual candidate – dmb Sep 19 '17 at 03:27