Ensuring comparability of results when reproducing Stata survival analysis in R

Question

I am using a Bayesian latent variable model to develop a measure. The goal is then to use the measure to replicate some existing studies to see if the results hold. I am replicating a study that uses survival analysis. Note that the original results of this study were generated in Stata. Here is my code to reproduce these results in R. This code works fine and I get the exact same results without any problems.

# load packages
library(dplyr)
library(haven)      # read_stata() comes from haven, not foreign
library(survival)   # Surv() and coxph()
library(stargazer)

# load original data 
data = read_stata("leaders, institutions, covariates, updated tvc.dta")

# By default, R's coxph() treats each row as its own observation, while Stata's
# stset time, id(leadid) failure(c_coup) declares an ID (leadid) with multiple
# rows per subject. I therefore create a start-stop (counting-process) Surv object.

# set a t0 for each row 
data = mutate(data,t0 = lag(t,default=0), .by=leadid)

# coup survival object original
survobj_coup = Surv(data[["t0"]], data[["_t"]], data$c_coup)

# coup model original
coups_original <- coxph(survobj_coup ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age,
                        data = data, ties = "breslow")

# Define a function to exponentiate coefficients
exp_coef <- function(x) { round(exp(x), 3) }

# Create the table using stargazer
stargazer(coups_original, apply.coef = exp_coef, digits = 3)

The next step is to see if these results hold when I replace one of the original independent variables, legislature, with my latent measure, dyn.estimates. To do this, I first need to merge the original dataset, data, with my latent measure dataset, latent. Here is the code to do that:

# Import latent dataset
latent = read.csv("estimates.csv")

# Merge the data frames, keeping only rows whose COWcode and year match in both
merged_data = merge(latent, data, by = c("COWcode", "year"), all = FALSE)
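
Because `all = FALSE` silently drops unmatched rows, it may be worth counting what the merge removed before comparing models. The sketch below uses small made-up stand-ins (`data_toy`, `latent_toy`) rather than the real files, so the values are purely illustrative; only the column names `COWcode`, `year`, `leadid`, `c_coup`, and `dyn.estimates` come from the question.

```r
# Toy stand-ins for `data` and `latent`; all values are invented for illustration.
data_toy <- data.frame(
  COWcode = c(2, 2, 2, 20, 20),
  year    = c(1990, 1991, 1992, 1990, 1991),
  leadid  = c("A", "A", "A", "B", "B"),
  c_coup  = c(0, 0, 1, 0, 0)
)
latent_toy <- data.frame(
  COWcode       = c(2, 2, 20, 20),
  year          = c(1990, 1992, 1990, 1991),
  dyn.estimates = c(0.1, 0.3, -0.2, -0.1)
)

# Same merge call as in the question
merged_toy <- merge(latent_toy, data_toy, by = c("COWcode", "year"), all = FALSE)

# Rows of the original data that found no match and were dropped
dropped <- nrow(data_toy) - nrow(merged_toy)

# Failure events lost along with those rows
events_lost <- sum(data_toy$c_coup) - sum(merged_toy$c_coup)

# Duplicate country-years in the latent data would multiply rows in the merge
dup_keys <- sum(duplicated(latent_toy[c("COWcode", "year")]))
```

If `dropped` or `events_lost` is nonzero on the real data, the new model is fitted on a different sample from the original, which is a separate source of differences from the substituted covariate.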

The next step is to re-run the original analysis, this time substituting my latent measure, dyn.estimates, for legislature.

# set a t0 for each row 
merged_data = mutate(merged_data,t0 = lag(t,default=0), .by=leadid)

# coup survival object new
survobj_coup_new = Surv(merged_data[["t0"]], merged_data[["_t"]], merged_data$c_coup)

# coup model new
coups_new <- coxph(survobj_coup_new ~ dyn.estimates + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age,
                   data = merged_data, ties = "breslow")

This code runs, but here is my question. Is it still necessary to create the start-stop survival object when running the new results (i.e., the ones with my latent measure)? I just want to make sure that my new results--those produced in R--are comparable to the original ones--those produced in Stata.
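
One point that seems relevant here: swapping a covariate does not change the fact that each leader contributes several rows, so the start-stop structure is presumably still needed. What can change is the start time itself. If the merge drops a row in the middle of a leader's spell, recomputing `t0` with `lag()` afterwards stretches the next interval back over the missing period. The sketch below illustrates this with invented values and a hypothetical base-R helper `lag0` that mimics `lag(t, default = 0)`. I have not seen the real file, so whether any mid-spell rows are actually dropped is an assumption to check.

```r
# Toy leader spell; values are invented for illustration only.
spell <- data.frame(
  leadid = c("A", "A", "A"),
  t      = c(1, 2, 3),
  c_coup = c(0, 0, 1)
)

# Hypothetical helper: previous stop time within a group, 0 for the first row
lag0 <- function(v) c(0, head(v, -1))

# Start times computed on the full data, before any merge
spell$t0 <- ave(spell$t, spell$leadid, FUN = lag0)

# Suppose the merge drops the middle row (no latent estimate for that year)
kept <- spell[-2, ]

# Recomputing t0 after the merge, as the question's code does
kept$t0_recomputed <- ave(kept$t, kept$leadid, FUN = lag0)

# Interval lengths under each choice of start time
kept$len_original   <- kept$t - kept$t0
kept$len_recomputed <- kept$t - kept$t0_recomputed

kept
```

With the pre-merge `t0`, the leader simply leaves the risk set for the missing period. With the recomputed `t0`, the last row claims the leader was at risk with that row's covariate values over a longer interval. Keeping the `t0` created before the merge is one way to avoid this. If the .dta was saved after `stset`, it may also already contain Stata's own `_t0`, though that is an assumption about the file.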

For reference, here is the original dataframe (i.e., the one without my latent measure). Here is the new dataframe (i.e., the one with my latent measure).

Hard to tell without seeing the actual dataset. I'd say it wouldn't hurt, to ensure the observations in `Surv()` are lining up with your analysis dataset, which is the key bit to worry about. Note: declaring `Surv()` within the `coxph()` call makes redeclaring an easy no-brainer. It's just the `data` argument that needs to change across your `coxph()` calls: e.g., `coxph(Surv(t0, `_t`, c_coup) ~ dyn.estimates + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age, data=merged_data, ties="breslow")` — SMzgr356, Jul 07 '23 at 12:51
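
A minimal sketch of the suggestion in the comment above, on invented toy data (the data frame `toy` and the covariate `x` are hypothetical). It fits the same Breslow-ties Cox model twice, once with a `Surv` object built outside the call and once with `Surv()` declared inside the formula, where the Stata-style column `_t` needs backticks.

```r
library(survival)

# Toy counting-process data; values are invented for illustration only.
toy <- data.frame(
  leadid = rep(1:6, each = 2),
  t0     = rep(c(0, 1), times = 6),
  t      = rep(c(1, 2), times = 6),
  c_coup = c(0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0),
  x      = c(0.5, 0.7, -0.3, 0.5, 1.2, -0.2, -1.0, -0.1, 0.1, 0.4, -0.6, 0.2)
)
names(toy)[names(toy) == "t"] <- "_t"   # mimic the Stata-style column name

# Approach in the question: Surv object built outside the model call
surv_outside <- Surv(toy[["t0"]], toy[["_t"]], toy$c_coup)
fit_outside  <- coxph(surv_outside ~ x, data = toy, ties = "breslow")

# Approach in the comment: Surv() declared inside the formula
fit_inside <- coxph(Surv(t0, `_t`, c_coup) ~ x, data = toy, ties = "breslow")

# Both fits should give the same coefficient
all.equal(coef(fit_outside), coef(fit_inside))

# Hazard ratio for the covariate
exp(coef(fit_inside))
```

With `Surv()` inside the formula, the response is always evaluated in whatever data frame is passed as `data`, so switching from `data` to `merged_data` cannot leave a stale survival object from the other data set in the model.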
