
Hi, I have data on student dropouts, and I am aiming to conduct what I believe is a survival analysis to examine or predict the probability of dropping out at a given grade. The challenge, however, is that I want to group grades together, for example (7,8), (9,10), (11,12).

Here is a reproducible example. This is the data I have now:

data <- data.frame(STUDENT=c(1,1,1,1,2,2,2,2,3,3,3,3),
                  GRADE=c(9,10,11,12,7,8,9,10,9,10,11,12),
                  DROPOUT=c(0,0,0,0,0,0,1,1,0,0,0,1))

I put the data in long format, so, for example, STUDENT=1 never dropped out, STUDENT=2 dropped out in 9th grade, and STUDENT=3 dropped out in 12th grade.
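The rows as listed do not quite match that description: student 2 has DROPOUT=1 at both grade 9 and grade 10. A minimal base-R check counts the DROPOUT=1 rows per student (the name `n_events` is just illustrative). Because a dropout happens only once, any count above 1 points to a duplicate row that should be removed before running a survival analysis.

```r
# Example data from the question (long format: one row per student-grade)
data <- data.frame(STUDENT = c(1,1,1,1,2,2,2,2,3,3,3,3),
                   GRADE   = c(9,10,11,12,7,8,9,10,9,10,11,12),
                   DROPOUT = c(0,0,0,0,0,0,1,1,0,0,0,1))

# Number of rows flagged DROPOUT = 1 for each student;
# a dropout is a single event, so any count above 1 is a duplicate
n_events <- tapply(data$DROPOUT, data$STUDENT, sum)
n_events

# Students with more than one dropout row
names(n_events)[n_events > 1]
```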

Here is my basic survival-analysis approach:

library(survival)
attach(data)
survivalmodel <- Surv(time=GRADE,event=DROPOUT)

Do I need `time2 =`? Could you explain how important it is and how it would be measured? I am self-taught and still reading.

So my question is: how do I get dropout probabilities for the GRADE bands (7,8), (9,10), and (11,12), so that I ultimately have one dropout probability for grades 7 and 8, a separate one for grades 9 and 10, and another for grades 11 and 12?

IRTFM
bvowe
  • Can you replace the values, e.g., with `2*ceiling(data$GRADE/2)`? (Realize that you don't have to store this back in `GRADE`; you can call it `GRADE2` and use `GRADE` only in reporting.) – r2evans Sep 15 '18 at 23:14
  • @r2evans I am not 100 percent sure I understand what you are saying. Based on what I have read, I want to estimate DROPOUT probabilities for the grouped GRADES, but this troubles me because, for example, STUDENT 1 was in grades 9, 10, 11, and 12. I read that this is called exposure, but how to calculate it is a mystery to me. – bvowe Sep 15 '18 at 23:32
  • Are you looking to find the total dropout rate across the grade 7-8 period? In that case, I'd think you could first remove the rows for odd-numbered years and then run the analysis. – Jon Spring Sep 15 '18 at 23:58
  • Basically the drop-out rate for the GRADE groups. – bvowe Sep 16 '18 at 00:32
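One way to get a dropout rate per GRADE group that accounts for exposure is a simple life-table (actuarial) estimate. The following is a minimal base-R sketch, not the answer's method. It applies r2evans' `2*ceiling(GRADE/2)` recode and then, for each band, divides the number of dropouts by the number of students enrolled in that band. Assumptions: each student's first DROPOUT=1 row is the real event, and a student observed in any grade of a band counts as fully at risk in that band. The names `BAND`, `prior_events`, `exposure`, `p_dropout`, and `p_stay` are illustrative.

```r
# Example data from the question (one row per student-grade)
data <- data.frame(STUDENT = c(1,1,1,1,2,2,2,2,3,3,3,3),
                   GRADE   = c(9,10,11,12,7,8,9,10,9,10,11,12),
                   DROPOUT = c(0,0,0,0,0,0,1,1,0,0,0,1))

# Keep rows only up to and including each student's first dropout
# (this removes the duplicate grade-10 row for student 2)
data <- data[order(data$STUDENT, data$GRADE), ]
prior_events <- ave(data$DROPOUT, data$STUDENT, FUN = function(d) cumsum(d) - d)
data <- data[prior_events == 0, ]

# r2evans' recode: 2 * ceiling(GRADE / 2) maps 7,8 -> 8; 9,10 -> 10; 11,12 -> 12
data$GRADE2 <- 2 * ceiling(data$GRADE / 2)
data$BAND <- factor(data$GRADE2, levels = c(8, 10, 12),
                    labels = c("7-8", "9-10", "11-12"))

# Exposure: each student counts once in every band in which they were enrolled
exposure <- unique(data[, c("STUDENT", "BAND")])
at_risk  <- table(exposure$BAND)

# Events: dropouts that occurred within each band
dropouts <- table(data$BAND[data$DROPOUT == 1])

# Conditional probability of dropping out in a band, given enrollment in it
p_dropout <- dropouts / at_risk

# Probability of still being enrolled at the end of each band (life-table product)
p_stay <- cumprod(1 - as.vector(p_dropout))

data.frame(BAND = names(at_risk), at_risk = as.vector(at_risk),
           dropouts = as.vector(dropouts), p_dropout = as.vector(p_dropout),
           p_stay = p_stay)
```

With only three students the numbers are not meaningful, but the structure is the same for real data. `p_dropout` answers "what share of students enrolled in this band dropped out during it," while `p_stay` chains those shares across bands.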

1 Answer

`time` (what you were calling time1) should be the first grade at which the student is observed. (I'm assuming that any given school would have new students transferring in.) `time2` should be either the grade at which the dropout occurs or 12. `event` should be as you have it, except that you should not have duplicates: row 8 of your data (student 2, grade 10) should be deleted. You should construct a new data frame that has 4 columns and 3 rows (one for each student):

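library(survival)

# One row per student: first grade observed, last grade (dropout grade or 12), event flag
sdat <- read.table(text="STUDENT start GRADE DROPOUT
1 9 12 0
2 7 9 1
3 9 12 1", header=TRUE)
sdat
# NEVER use attach, but especially never with survival pkg functions

# sdat[-1] drops STUDENT; with no covariates left, `~ .` fits a null model
coxph( Surv(time=start, time2=GRADE, event=DROPOUT)~. , data=sdat[-1])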
Call:  coxph(formula = Surv(time = start, time2 = GRADE, event = DROPOUT) ~ 
    ., data = sdat[-1])

Null model
  log likelihood= -0.6931472 
  n= 3 
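Rather than typing `sdat` by hand, the one-row-per-student frame can be derived from the long data in the question. This is a base-R sketch that assumes the first DROPOUT=1 row per student is the real event, and that a student without a dropout is followed to their last observed grade (12 for everyone in this example). The names `sdat2` and `prior_events` are illustrative.

```r
# Example data from the question (one row per student-grade)
data <- data.frame(STUDENT = c(1,1,1,1,2,2,2,2,3,3,3,3),
                   GRADE   = c(9,10,11,12,7,8,9,10,9,10,11,12),
                   DROPOUT = c(0,0,0,0,0,0,1,1,0,0,0,1))

# Drop rows after each student's first dropout (e.g. row 8: student 2, grade 10)
data <- data[order(data$STUDENT, data$GRADE), ]
prior_events <- ave(data$DROPOUT, data$STUDENT, FUN = function(d) cumsum(d) - d)
data <- data[prior_events == 0, ]

# One row per student: first grade observed (start), last grade observed
# (GRADE: the dropout grade, or the final grade if no dropout), and the event flag
sdat2 <- data.frame(
  STUDENT = sort(unique(data$STUDENT)),
  start   = as.vector(tapply(data$GRADE,   data$STUDENT, min)),
  GRADE   = as.vector(tapply(data$GRADE,   data$STUDENT, max)),
  DROPOUT = as.vector(tapply(data$DROPOUT, data$STUDENT, max))
)
sdat2
```

`sdat2` has the same columns and values as `sdat`, so it can be used in its place in the `coxph()` call.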
IRTFM