I am using a Bayesian latent variable model to develop a measure. The goal is then to use the measure to replicate some existing studies to see if the results hold. I am replicating a study that uses survival analysis. Note that the original results of this study were generated in Stata. Here is my code to reproduce these results in R. This code works fine and I get the exact same results without any problems.
# load packages
library(dplyr)
library(haven)     # provides read_stata()
library(survival)  # provides Surv() and coxph()
library(stargazer)
# load original data
data = read_stata("leaders, institutions, covariates, updated tvc.dta")
# By default, R's survival functions treat each row as one independent sampling
# unit, while Stata's stset time, id(leadid) failure(c_coup) declares an ID (leadid)
# with multiple observations. I thus create a start-stop type Surv object.
# set a t0 for each row
data = mutate(data,t0 = lag(t,default=0), .by=leadid)
# coup survival object original
survobj_coup = Surv(data[["t0"]], data[["_t"]], data$c_coup)
# coup model original
coups_original <- coxph(survobj_coup ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age,
data = data, ties = "breslow")
# Define a function to exponentiate coefficients
exp_coef <- function(x) { round(exp(x), 3) }
# Create the table using stargazer
stargazer(coups_original, apply.coef = exp_coef, digits = 3)
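To make the start-stop construction concrete, here is a minimal sketch on a made-up toy panel. The values are hypothetical and not taken from the original data, and it uses base R's ave() instead of dplyr so that it runs on its own.

```r
# Toy leader-period panel (hypothetical values, not the original data)
toy <- data.frame(
  leadid = c("A", "A", "A", "B", "B"),
  t      = c(1, 2, 3, 1, 2),  # time in office at the end of each row
  c_coup = c(0, 0, 1, 0, 0)   # 1 if the spell ends in a coup in that row
)

# Start of each interval = previous stop time for the same leader,
# and 0 for each leader's first row (same idea as lag(t, default = 0))
toy$t0 <- ave(toy$t, toy$leadid, FUN = function(x) c(0, head(x, -1)))

print(toy)
```

Each row then covers the interval (t0, t], so the rows for one leader chain together into a single spell, which is what the id() option does on the Stata side.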
The next step is to see if these results hold when I replace one of the original independent variables, legislature, with my latent measure, dyn.estimates. To do this, I first need to merge the original dataset, data, with my latent measure dataset, latent. Here is the code to do that:
# Import latent dataset
latent = read.csv("estimates.csv")
# Merge the data frames, keeping only rows with an exact match in both (inner join)
merged_data = merge(latent, data, by = c("COWcode", "year"), all = FALSE)
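Because all = FALSE silently drops unmatched rows, it may be worth checking how many rows the merge loses and whether the keys are unique. The sketch below uses hypothetical toy stand-ins (data_toy, latent_toy) rather than the real files.

```r
# Hypothetical toy stand-ins for `data` and `latent`
data_toy <- data.frame(
  COWcode = c(2, 2, 2, 20),
  year    = c(1990, 1991, 1992, 1990),
  leadid  = c("A", "A", "A", "B")
)
latent_toy <- data.frame(
  COWcode       = c(2, 2, 20),
  year          = c(1990, 1992, 1990),
  dyn.estimates = c(0.1, 0.3, -0.2)
)

merged_toy <- merge(latent_toy, data_toy, by = c("COWcode", "year"), all = FALSE)

# How many rows of the original data were lost in the merge
n_dropped <- nrow(data_toy) - nrow(merged_toy)

# Which rows of the original data have no latent estimate
keys_data   <- paste(data_toy$COWcode, data_toy$year)
keys_latent <- paste(latent_toy$COWcode, latent_toy$year)
unmatched   <- data_toy[!keys_data %in% keys_latent, ]

# Duplicated keys in the latent data would multiply rows instead of dropping them
any_dup <- anyDuplicated(keys_latent) > 0

print(n_dropped)
print(unmatched)
```

If rows are dropped in the middle of a leader's spell, that matters for how t0 is built afterwards.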
The next step is to re-run the original analysis, but this time substituting my latent measure, dyn.estimates, for legislature.
# set a t0 for each row
merged_data = mutate(merged_data,t0 = lag(t,default=0), .by=leadid)
# coup survival object new
survobj_coup_new = Surv(merged_data[["t0"]], merged_data[["_t"]], merged_data$c_coup)
# coup model new
coups_new <- coxph(survobj_coup_new ~ dyn.estimates + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age,
data = merged_data, ties = "breslow")
This code runs, but here is my question: is it still necessary to create the start-stop survival object when producing the new results (i.e., the ones with my latent measure)? I just want to make sure that my new results, produced in R, are comparable to the original ones, produced in Stata.
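One thing that may affect comparability is when t0 is computed. The sketch below uses hypothetical toy data and assumes the merge drops a row in the middle of one leader's spell. It compares t0 built on the full data with t0 rebuilt by lagging after the drop. Note also that merge() sorts its result by the key columns by default, and a lag depends on row order.

```r
library(survival)

# One hypothetical leader observed for four periods
toy <- data.frame(
  leadid = c("A", "A", "A", "A"),
  year   = c(1990, 1991, 1992, 1993),
  t      = c(1, 2, 3, 4),
  c_coup = c(0, 0, 0, 1)
)

# t0 computed on the full data, before any rows are dropped
toy$t0_full <- ave(toy$t, toy$leadid, FUN = function(x) c(0, head(x, -1)))

# Assume the merge drops 1991 (no latent estimate for that year)
kept <- toy[toy$year != 1991, ]

# t0 recomputed by lagging t after the drop
kept$t0_lagged <- ave(kept$t, kept$leadid, FUN = function(x) c(0, head(x, -1)))

# Rows where the two definitions disagree
gap_rows <- kept$t0_full != kept$t0_lagged
print(kept[, c("year", "t0_full", "t0_lagged", "t")])

# Start-stop object that keeps the original interval boundaries
surv_kept <- Surv(kept$t0_full, kept$t, kept$c_coup)
```

In this toy case the 1992 row runs from 2 to 3 under t0_full but from 1 to 3 under t0_lagged, so the lagged version counts the dropped period as time at risk. Whether that happens in the real merged data depends on which rows the merge removes.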
For reference, here is the original dataframe (i.e., the one without my latent measure). Here is the new dataframe (i.e., the one with my latent measure).