
I have a long (vertically stacked) dataset containing 10 imputations (variable "imputation" identifies imputation number). The imputation was done in SAS but I would like to calculate some c-statistics using R.

I know how to calculate c-statistics with the cindex and FGR functions for a single imputed dataset, but I am not sure how to repeat this across the vertically stacked dataset. I tried the "with" function, without success.

Here is my code:

library(riskRegression)  # FGR
library(pec)             # cindex
library(prodlim)         # Hist

fgr.model <- FGR(Hist(time, outcome) ~ x1 + x2 + x3, data=mydata1, cause=1)

cscore <- cindex(list(fgr.model), formula=Hist(time, outcome)~1,
          cens.model="marginal", data=mydata1, eval.times=c(1826), cause=1)

How can I calculate c-statistics with the cindex and FGR functions for each imputation in the vertically stacked dataset?

Alex_P
  • Hi, the "shape" of your dataset is not clear to me (the "imputation" concept could also be described). Do you have 10 columns per row, one per imputation, with each column representing an instance of the same variable? If so, that is quite a strange way to design a dataset; usually values of the same variable stay in a single column. Or would you like to transpose each row into a single column vector and then stack all the resulting vectors into a single one-column dataset? In that case I suggest reformulating the question a bit. Welcome to SO. – Fabiano Tarlao Jan 24 '19 at 22:24
  • Columns: id, imputation, age, male, rural. Rows as (id, imputation): (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (1, 2), (2, 2), (3, 2), (4, 2), (5, 2) – Tharshny Jan 25 '19 at 15:26
  • I know this is not easy in English, but you should reformulate a bit. I asked three distinct questions; I suggest you reply in more detail. – Fabiano Tarlao Jan 25 '19 at 17:54
  • Thank you for trying to be so helpful! I haven't had a chance to figure out how to use this site properly just yet. Let me try one more time - Say, I have the following variables: ImputationNo, ID, age and BMI. Age and BMI are being imputed. There are 100 subjects and 5 imputations. There are 4 rows (to represent each variable) and 500 rows. First 100 rows are for the 1st imputation, the next 100 rows are for the 2nd imputation and so on. I would like to calculate c-index for each imputed dataset (and later pool them using Rubin's rule - which I have to still figure out). – Tharshny Jan 28 '19 at 12:49
  • OK, perhaps I got it. You can edit your comment: you wrote 4 rows, but I suppose you meant 4 columns, am I right? I also suggest editing your original question to add these details; you should always try to make a question as complete and correct as possible, so that we leave a valid resource for others. I will prepare an answer. – Fabiano Tarlao Jan 28 '19 at 17:57
  • Since you are new here, a reminder: you can vote +1 on correct or valid answers, and when you are satisfied you can accept one answer as the right answer to your question. That is very useful feedback. But please keep working on making the question more complete. Regards – Fabiano Tarlao Jan 28 '19 at 18:22

1 Answer

Based on the details in the comments, I understand that you need to calculate the statistics on a partition of the original data frame mydata1, i.e., you need to select only the rows that belong to one "imputation" (this is the name you use; are you perhaps referring to something like an "input session"? Just curious).

First, create a new data frame containing only the data for one "imputation". The following examples extract imputation number 4. There are several ways to do this.

First way, which works if the data frame has the column names given in the comments:

mydata1portion = mydata1[mydata1$ImputationNo==4,]

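Second way, which works if the data frame was read without a header, so that columns carry R's default names V1, V2, ... and the imputation number is in the first column: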

mydata1portion = mydata1[mydata1$V1==4,]

Third way, which works if the rows are ordered by imputation and each imputation has exactly 100 rows:

mydata1portion = mydata1[(100*(4-1)+1):(100*4),]

The first two filter the data frame on the value of the ImputationNo column; the last one slices the data frame by row position.
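As a quick check of the approaches above, here is a minimal sketch on a made-up stacked data frame. The name `toy`, the column names and the sizes (5 imputations of 100 subjects) are assumptions taken from the comments, not the real data. The positional variant uses `toy[[1]]` rather than `V1`, because this toy data frame has named columns.

```r
# Toy stacked dataset: 5 imputations x 100 subjects (sizes assumed from the comments).
set.seed(1)
n_subjects <- 100
n_imputations <- 5
toy <- data.frame(
  ImputationNo = rep(1:n_imputations, each = n_subjects),
  ID           = rep(1:n_subjects, times = n_imputations),
  age          = round(rnorm(n_subjects * n_imputations, mean = 50, sd = 10)),
  BMI          = round(rnorm(n_subjects * n_imputations, mean = 25, sd = 4), 1)
)

k <- 4  # imputation to extract

# 1) Filter by column name.
by_name <- toy[toy$ImputationNo == k, ]

# 2) Filter by column position (the imputation number is the first column here).
by_position <- toy[toy[[1]] == k, ]

# 3) Slice by row position; valid only if rows are sorted by imputation
#    and every imputation has exactly n_subjects rows.
by_rows <- toy[(n_subjects * (k - 1) + 1):(n_subjects * k), ]

# Check that all three approaches select the same rows.
same <- isTRUE(all.equal(by_name, by_position)) &&
  isTRUE(all.equal(by_name, by_rows))
```

Filtering on the imputation column is the safer choice, since it does not depend on row order or on every imputation having the same number of rows.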

Finally, calculate the statistics on the resulting mydata1portion and NOT on the full mydata1, repeating the same steps for each imputation number.
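To repeat the calculation for every imputation rather than only number 4, one option is to split the stacked data frame and apply the same function to each piece. The sketch below is only an illustration: `per_imputation` and `c_index_one` are hypothetical helper names, the demo data is made up, and the way the c-index is extracted from the cindex result (`AppCindex`) is an assumption that should be checked with `str(cscore)`. Only the stand-in statistic (mean age) is actually run here.

```r
# Hypothetical helper: apply `stat_fun` to each imputed dataset in a stacked data frame.
# `imp_col` names the column holding the imputation number.
per_imputation <- function(data, imp_col, stat_fun) {
  pieces <- split(data, data[[imp_col]])   # one data frame per imputation
  vapply(pieces, stat_fun, numeric(1))     # one number per imputation
}

# Sketch of a statistic built from the question's own calls (defined, not run here).
# Assumes library(riskRegression), library(pec) and library(prodlim) are loaded,
# and that the value of interest sits in the AppCindex element of the result.
c_index_one <- function(d) {
  fit <- FGR(Hist(time, outcome) ~ x1 + x2 + x3, data = d, cause = 1)
  cscore <- cindex(list(fit), formula = Hist(time, outcome) ~ 1,
                   cens.model = "marginal", data = d,
                   eval.times = 1826, cause = 1)
  as.numeric(unlist(cscore$AppCindex))[1]
}

# Runnable demonstration with a stand-in statistic (mean age) on made-up data.
demo <- data.frame(
  ImputationNo = rep(1:3, each = 4),
  age = c(40, 50, 60, 70,  42, 52, 60, 68,  41, 49, 61, 73)
)
est <- per_imputation(demo, "ImputationNo", function(d) mean(d$age))

# Point-estimate part of Rubin's rules: the average of the per-imputation estimates.
pooled <- mean(est)
```

With the real data the call would look like `per_imputation(mydata1, "imputation", c_index_one)`. The variance part of Rubin's rules additionally needs a within-imputation standard error for each estimate, which this sketch does not compute.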

Fabiano Tarlao