
I'm trying to fit a Cox proportional hazards (PH) model to analyze the impact of the number of protest events on the survival of political regimes in different countries.

My dataset looks similar to this:

Country    year  sdate        edate      time  evercollapsed protest GDPgrowth
Country A  2003  1996-11-24   2012-12-31 5881  0             78      14.78
Country A  2004  NA           NA         NA    0             99       8.56
Country A  2005  NA           NA         NA    0             25       3.56
Country B  2003  2000-10-26   2011-05-21 3859  1             13       2.33   
Country B  2004  NA           NA         NA    1             28       5.43
Country B  2005  NA           NA         NA    1             7        1.89  

So, basically, my dataset provides yearly information on a number of variables, but the start and end dates of the regime (sdate, edate) and its survival time in days (time) are only recorded in the first row of each political regime.
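A minimal sketch in R of filling in the missing regime-level fields, assuming the rows are sorted by country and year and that a new regime starts exactly on the rows where `sdate` is not missing. The data frame just re-enters the six example rows; `regime_id` and `first_row` are hypothetical names. It copies only the dates and `time`; how the event is counted is a separate question.

```r
# The six example rows from the table above (the real data has 48 regimes)
mydata <- data.frame(
  Country       = rep(c("Country A", "Country B"), each = 3),
  year          = rep(2003:2005, times = 2),
  sdate         = as.Date(c("1996-11-24", NA, NA, "2000-10-26", NA, NA)),
  edate         = as.Date(c("2012-12-31", NA, NA, "2011-05-21", NA, NA)),
  time          = c(5881, NA, NA, 3859, NA, NA),
  evercollapsed = c(0, 0, 0, 1, 1, 1),
  protest       = c(78, 99, 25, 13, 28, 7),
  GDPgrowth     = c(14.78, 8.56, 3.56, 2.33, 5.43, 1.89)
)

# Assumption: sorted rows, and a regime begins wherever sdate is non-missing,
# so a running count of non-missing sdate values numbers the regimes
mydata$regime_id <- cumsum(!is.na(mydata$sdate))

# Position of the first row of each row's regime
first_row <- match(mydata$regime_id, mydata$regime_id)

# Copy the regime-level fields from that first row to every country-year
for (v in c("sdate", "edate", "time")) {
  mydata[[v]] <- mydata[[v]][first_row]
}
```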

My data includes information for 48 different political regimes and 15 of them collapse in the time span I am looking at.

I fitted a Cox PH model with the survival package:

library(survival)
myCPH <- coxph(Surv(time, evercollapsed) ~ protest + GDPgrowth, data = mydata)

This gives me the following result:

Call:
coxph(formula = Surv(time, evercollapsed) ~ protest + GDPgrowth, 
    data = mydata)

              coef exp(coef) se(coef)     z     p
protest    0.01630   1.01644  0.00722  2.26 0.024
GDPgrowth -0.03447   0.96612  0.01523 -2.26 0.024

Likelihood ratio test=9.26  on 2 df, p=0.00977
n= 48, number of events= 15 
   (556 observations deleted due to missingness)

So, these results imply that I'm losing 556 country-years, because those rows do not contain the survival time of the regime.

My question now is: how can I include the country-years that lack sdate, edate and time in the analysis?

I assume that if I simply copied this information into every country-year, it would inflate the number of regime collapses?
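That intuition can be checked on the example rows. A small sketch, with `regime` as a hypothetical per-row regime ID: since `evercollapsed` is already repeated on every country-year, treating each row as its own regime with the full `time` would count Country B's single collapse three times (and give each row the whole survival time).

```r
# evercollapsed as in the example table: repeated on every country-year
evercollapsed <- c(0, 0, 0, 1, 1, 1)
regime        <- c(1, 1, 1, 2, 2, 2)  # hypothetical regime ID for each row

# Events coxph would count if every row were treated as a separate regime
events_if_copied <- sum(evercollapsed)

# Regimes that actually collapsed: at most one event per regime
events_actual <- sum(tapply(evercollapsed, regime, max))

cat("rows flagged:", events_if_copied, "| regimes collapsed:", events_actual, "\n")
```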

I assume I have to give a unique ID to every political regime so that R can distinguish the different cases. Then, how do I fit a Cox PH model that includes the information from the different country-years?
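One common way to use all country-years is the counting-process layout that `Surv(start, stop, event)` in the survival package accepts: each country-year becomes an interval (start, stop] on the regime's own clock (days since `sdate`), carries that year's covariates, and is flagged as an event only if it ends with the regime's collapse. This is a sketch under assumptions: the regime-level fields have already been copied to every row, each year's values apply from 1 January, and the names `regime_id`, `start`, `stop`, `event` and `cp` are hypothetical.

```r
library(survival)

# Example rows with the regime-level fields already copied to every country-year
mydata <- data.frame(
  regime_id     = rep(1:2, each = 3),
  year          = rep(2003:2005, times = 2),
  sdate         = as.Date(rep(c("1996-11-24", "2000-10-26"), each = 3)),
  time          = rep(c(5881, 3859), each = 3),
  evercollapsed = rep(c(0, 1), each = 3),
  protest       = c(78, 99, 25, 13, 28, 7),
  GDPgrowth     = c(14.78, 8.56, 3.56, 2.33, 5.43, 1.89)
)

# Days from regime start to 1 January of this year and of the following year
year_start <- as.numeric(as.Date(paste0(mydata$year, "-01-01")) - mydata$sdate)
year_end   <- as.numeric(as.Date(paste0(mydata$year + 1, "-01-01")) - mydata$sdate)

cp <- mydata
cp$start <- pmax(0, year_start)       # a regime may begin mid-year
cp$stop  <- pmin(year_end, cp$time)   # and end (collapse or censoring) mid-year
cp <- cp[cp$start < cp$stop, ]        # drop years outside the regime's lifetime

# The collapse is recorded once: on the interval that ends at the regime's end date
cp$event <- as.integer(cp$evercollapsed == 1 & cp$stop == cp$time)

# Counting-process response for coxph
y <- Surv(cp$start, cp$stop, cp$event)

# On the full data (this excerpt ends before Country B's 2011 collapse, so it has no events):
# fit <- coxph(Surv(start, stop, event) ~ protest + GDPgrowth + cluster(regime_id), data = cp)
```

Because Country A's 2003 interval starts at day 2229 rather than 0, the regime only enters the risk set once it is observed (delayed entry), and each regime contributes at most one event, so the collapses are not multiplied by the number of country-years.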

Many thanks in advance!

  • In my opinion you should model your data using a Cox model with time-varying covariates (`protest` and `GDPgrowth`): https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf – Marco Sandri Jun 18 '17 at 22:48
  • Guess: You fail to sum up the times for the "missing years" and aggregate them all in one line of data. The `coxph` function does not sum up events the way that `glm` does when estimating Poisson models. If you post a larger amount of data I can show you how to do it. I suspect your current model is somewhat similar to a logistic regression model in which the only GDP information that is being considered is that of the last year. That's obviously "begging the question". You will need to define a precise hypothesis regarding the relationship of GDP to the probability of regime collapse. – IRTFM Jun 19 '17 at 06:52
  • Thank you @MarcoSandri I think this is a first approach towards what I'm looking for. – Jonas Stenger Jun 21 '17 at 11:00
  • @42 Thank you so much, but no, I don't want to aggregate the missing years in one line. Actually, I want to do something similar to Hollyer, Rosendorff and Vreeland (2015) in their APSR article "Transparency, Protest, and Autocratic Instability" [link](https://www.cambridge.org/core/services/aop-cambridge-core/content/view/S0003055415000428). I constructed my dataset similarly to theirs, but unfortunately they did the analysis in Stata and my Stata data cleaning/prepping skills are limited. – Jonas Stenger Jun 21 '17 at 11:12
  • That's not a very helpful link: "Unfortunately you do not have access to this content, please use the Get access link below for information on how to access this content" – IRTFM Jun 21 '17 at 15:09
  • It's more helpful to search on the first author's name; you can find all the data and code at his website. He used a clustered call to the stcox function and had lagged variables as his predictors. Therneau suggests using the `coxme` package (which he also wrote and maintains) rather than using `coxph` with the `cluster` or `strata` functions. – IRTFM Jun 21 '17 at 16:06
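The `tmerge()` approach from the timedep vignette in the first comment, combined with the lagged predictors and clustering mentioned in the last one, might look roughly like this in R. It is a sketch under several assumptions, not a reproduction of the article's Stata code: one base row per regime, one covariate row per country-year whose values take effect on 1 January of that year, and data sorted by year within regime. The names `base`, `covs`, `day`, `collapse`, `lag1` and the `_lag` columns are hypothetical, the regime identifier is called `id` as in the vignette's examples, and the `coxme` call assumes that package is installed.

```r
library(survival)

# One row per regime: survival time in days and collapse indicator (example rows)
base <- data.frame(id = 1:2, time = c(5881, 3859), evercollapsed = c(0, 1))

# One row per country-year, sorted by year within regime
covs <- data.frame(
  id        = rep(1:2, each = 3),
  year      = rep(2003:2005, times = 2),
  sdate     = as.Date(rep(c("1996-11-24", "2000-10-26"), each = 3)),
  protest   = c(78, 99, 25, 13, 28, 7),
  GDPgrowth = c(14.78, 8.56, 3.56, 2.33, 5.43, 1.89)
)

# Day (since regime start) on which each year's values take effect: 1 January
covs$day <- as.numeric(as.Date(paste0(covs$year, "-01-01")) - covs$sdate)

# One-year lags within each regime; the first observed year of a regime gets NA
lag1 <- function(x) c(NA, head(x, -1))
covs$protest_lag   <- ave(covs$protest,   covs$id, FUN = lag1)
covs$GDPgrowth_lag <- ave(covs$GDPgrowth, covs$id, FUN = lag1)

# Keep only years with an available lag; earlier stretches stay NA in the result
covs <- covs[!is.na(covs$protest_lag), ]

# First call: follow-up runs from 0 to time, with the collapse as the event
cp <- tmerge(base, base, id = id, collapse = event(time, evercollapsed))

# Second call: time-dependent covariates; each value holds until the next change
cp <- tmerge(cp, covs, id = id,
             protest_lag   = tdc(day, protest_lag),
             GDPgrowth_lag = tdc(day, GDPgrowth_lag))

# Model calls for the full data (the six-row excerpt is too small to fit):
# clustered (robust) standard errors by regime
#   coxph(Surv(tstart, tstop, collapse) ~ protest_lag + GDPgrowth_lag + cluster(id), data = cp)
# regime-level random effect with the coxme package
#   coxme::coxme(Surv(tstart, tstop, collapse) ~ protest_lag + GDPgrowth_lag + (1 | id), data = cp)
```

Because `tdc()` carries a value forward until the next change, this six-row excerpt would apply the lagged 2004 values up to Country B's 2011 collapse; with the complete yearly series each year gets its own interval. Intervals before a regime's first lagged value have missing covariates, so `coxph` drops them and the regime enters the risk set only from that point.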
