How to do a multivariate survival analysis of gene mutations in R?

Question

I've been working on a gene-mutation survival analysis. The data, downloaded and merged from the TCGA somatic mutation files (MAF), look like this:

         barcode stage_group gender fustat futime  SRCAP  ZFHX4  AMER1 PCDHB8 AHNAK2
1   TCGA-CA-6719   StageI-II   MALE      0     41     WT     WT     WT     WT     WT
2   TCGA-A6-2685   StageI-II FEMALE      0    464     WT     WT     WT     WT     WT
3   TCGA-CK-6751   StageI-II FEMALE      0    518     WT     WT     WT Mutate     WT
4   TCGA-DY-A1H8 StageIII-IV FEMALE      1    992     WT     WT     WT     WT     WT
5   TCGA-AG-3887   StageI-II   MALE      0     28     WT     WT     WT     WT     WT
6   TCGA-DM-A28M   StageI-II   MALE      0   2775     WT     WT     WT     WT Mutate
7   TCGA-CM-6675 StageIII-IV   MALE      0    153     WT     WT     WT     WT     WT
8   TCGA-D5-6533     Missing FEMALE      0     40     WT     WT     WT Mutate     WT
9   TCGA-SS-A7HO   StageI-II FEMALE      0   1829     WT     WT     WT     WT     WT
10  TCGA-AY-A8YK StageIII-IV   MALE      0    209     WT     WT     WT     WT     WT
11  TCGA-AA-A02Y   StageI-II   MALE      0     31     WT     WT     WT     WT     WT
12  TCGA-AD-5900   StageI-II   MALE      0      2     WT     WT     WT     WT Mutate

SRCAP, ZFHX4, AMER1, PCDHB8, AHNAK2, ... are genes selected by univariate Kaplan-Meier (KM) analysis with a log-rank test: for each gene I divided the patients into WT and Mutate groups by mutation status, ordered the genes by p-value, and used p = 0.05 as the threshold. Now I need to take all the clinical features into account along with these genes:

Surv(futime, fustat)~ gender+age+project+subtype+race_group+stage_group+SRCAP+ZFHX4+AMER1+PCDHB8+AHNAK2+DNAH5+NALCN+PAPPA+PCDH17+RELN+UGGT2+HYDIN

and the result:

                      coef  exp(coef)   se(coef)  robust se      z Pr(>|z|)    
genderMALE       9.020e-01  2.465e+00  3.819e-01  3.696e-01  2.441 0.014659 *  
subtypeMissing   4.793e-01  1.615e+00  8.825e-01  1.045e+00  0.459 0.646364    
subtypeMucinous  1.354e+00  3.874e+00  5.972e-01  6.053e-01  2.238 0.025250 *  
race_groupWhite -6.223e-01  5.367e-01  3.921e-01  3.903e-01 -1.594 0.110878    
SRCAPWT         -1.233e+00  2.914e-01  5.177e-01  6.516e-01 -1.892 0.058474 .  
ZFHX4WT         -1.577e+00  2.065e-01  4.996e-01  5.621e-01 -2.806 0.005014 ** 
AMER1WT         -2.932e+00  5.332e-02  6.121e-01  5.547e-01 -5.285 1.26e-07 ***
AHNAK2WT         2.190e+00  8.932e+00  1.063e+00  9.183e-01  2.385 0.017097 *  
DNAH5WT          2.011e+00  7.474e+00  7.732e-01  6.077e-01  3.310 0.000932 ***
NALCNWT         -8.528e-01  4.262e-01  4.790e-01  4.151e-01 -2.055 0.039905 *  
RELNWT           2.063e+01  9.155e+08  5.425e+03  1.659e+00 12.435  < 2e-16 ***
UGGT2WT         -2.783e+00  6.185e-02  7.052e-01  5.688e-01 -4.893 9.95e-07 ***
HYDINWT          1.864e+00  6.450e+00  7.435e-01  7.284e-01  2.559 0.010499 *
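One detail in this output is worth checking: the gene terms are named `SRCAPWT`, `ZFHX4WT`, and so on. R orders the levels of a character variable alphabetically, so "Mutate" became the reference level and each exp(coef) is the hazard ratio of WT relative to Mutate, the inverse of the usual reading. The minimal sketch below uses the `survival` package on hypothetical simulated data (not the TCGA data, and with only a few of the question's columns) and relevels the factors so that WT and StageI-II are the references, making the gene terms read as `SRCAPMutate` (Mutate vs WT).

```r
library(survival)

# Hypothetical simulated data with a subset of the question's columns.
set.seed(2)
n <- 300
dat <- data.frame(
  futime      = rexp(n, rate = 1 / 800),
  fustat      = rbinom(n, 1, 0.35),
  gender      = sample(c("FEMALE", "MALE"), n, replace = TRUE),
  stage_group = sample(c("StageI-II", "StageIII-IV", "Missing"), n,
                       replace = TRUE, prob = c(0.55, 0.40, 0.05)),
  SRCAP       = sample(c("WT", "Mutate"), n, replace = TRUE, prob = c(0.85, 0.15)),
  ZFHX4       = sample(c("WT", "Mutate"), n, replace = TRUE, prob = c(0.80, 0.20)),
  stringsAsFactors = FALSE
)

# Alphabetical order would make "Mutate" the reference; force WT instead.
for (g in c("SRCAP", "ZFHX4")) {
  dat[[g]] <- relevel(factor(dat[[g]]), ref = "WT")
}
# Early stage as the reference; "Missing" stays as its own level,
# as in the question's data.
dat$stage_group <- relevel(factor(dat$stage_group), ref = "StageI-II")

fit <- coxph(Surv(futime, fustat) ~ gender + stage_group + SRCAP + ZFHX4,
             data = dat)
summary(fit)  # exp(coef) for SRCAPMutate = hazard ratio, Mutate vs WT
```

With this coding, an exp(coef) above 1 for a gene means higher hazard in mutated patients. Releveling a two-level factor flips the sign of its coefficient in an otherwise identical model but leaves its standard error and p-value unchanged.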

I'm not convinced by the whole procedure or the result. How can the stage factor not matter for survival? Besides, some genes' hazard ratios are incredibly high (RELNWT: 9.155e+08). I'm not sure whether the reason is the sparse, binary nature of mutation data.
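The RELN row (coef 20.63, se(coef) 5.425e+03, robust se 1.659) fits the usual signature of a monotone likelihood: when, for example, one of the two mutation groups has no events, the Cox partial likelihood has no finite maximum, the coefficient drifts toward infinity and the model-based SE explodes, while the much smaller robust SE then yields a misleading z of 12.4. This is a likely explanation to verify in the real data, not a certainty. The minimal sketch below, assuming the `survival` package and hypothetical simulated data, forces zero events among RELN-mutated patients, shows the cross-tabulation that exposes the problem, and then tests `stage_group` as a whole with a likelihood-ratio test between nested models via `anova()`. A joint test matters here because a multi-level factor is spread over several Wald rows, and the output shown above lists no `stage_group` rows at all.

```r
library(survival)

# Hypothetical simulated data; RELN is mutated in roughly 5% of patients.
set.seed(3)
n <- 300
dat <- data.frame(
  futime      = rexp(n, rate = 1 / 800),
  fustat      = rbinom(n, 1, 0.35),
  stage_group = factor(sample(c("StageI-II", "StageIII-IV"), n, replace = TRUE)),
  RELN        = factor(sample(c("WT", "Mutate"), n, replace = TRUE,
                              prob = c(0.95, 0.05)),
                       levels = c("WT", "Mutate"))
)
# Force the sparse pattern: no RELN-mutated patient has an event.
dat$fustat[dat$RELN == "Mutate"] <- 0

# Check 1: events per mutation group. A zero cell means the partial
# likelihood has no finite maximum for this gene.
with(dat, table(RELN, fustat))

# coxph() typically warns that the coefficient may be infinite; the
# estimate reaches a large magnitude with a huge model-based SE.
fit_full <- coxph(Surv(futime, fustat) ~ stage_group + RELN, data = dat)
coef(fit_full)
sqrt(diag(vcov(fit_full)))

# Check 2: likelihood-ratio test for stage_group as a whole,
# comparing nested models fitted on the same patients.
fit_nostage <- coxph(Surv(futime, fustat) ~ RELN, data = dat)
lrt <- anova(fit_nostage, fit_full)
lrt
```

If a gene is worth keeping despite such a pattern, Firth's penalized partial likelihood (available in the CRAN package `coxphf`) is one established remedy; excluding genes with too few events in either group before fitting is a simpler option.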

What is the proper way to perform survival analysis based on mutation data? I'd really appreciate an explanation. Thanks.
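For reference, the univariate screening step from the question (a log-rank test per gene comparing WT and Mutate patients, keeping genes with p < 0.05) can be sketched with the `survival` package as below. The data frame `dat` is hypothetical simulated data with the question's column names, not the TCGA data, and the helper `logrank_p()` is an illustrative name.

```r
library(survival)

# Hypothetical simulated stand-in for the merged TCGA table:
# futime = follow-up time, fustat = event indicator (1 = event),
# one WT/Mutate column per gene.
set.seed(1)
n <- 200
dat <- data.frame(
  futime = rexp(n, rate = 1 / 800),
  fustat = rbinom(n, 1, 0.3),
  SRCAP  = sample(c("WT", "Mutate"), n, replace = TRUE, prob = c(0.90, 0.10)),
  ZFHX4  = sample(c("WT", "Mutate"), n, replace = TRUE, prob = c(0.85, 0.15))
)
genes <- c("SRCAP", "ZFHX4")

# Hypothetical helper: log-rank test of WT vs Mutate for one gene.
logrank_p <- function(gene, data) {
  f <- as.formula(paste("Surv(futime, fustat) ~", gene))
  fit <- survdiff(f, data = data)
  # survdiff() returns the chi-square statistic; df = number of groups - 1.
  pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
}

pvals <- sapply(genes, logrank_p, data = dat)
sort(pvals)                              # genes ordered by p-value
selected <- names(pvals)[pvals < 0.05]   # the p = 0.05 threshold
```

Because many genes are screened this way, `p.adjust(pvals, method = "BH")` could be applied before thresholding to control the false discovery rate.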

This seems like more of an analysis or data interpretation question than a coding question. You may have better luck getting an answer on a different stack exchange site (biology maybe?) — Jan Boyer, Dec 12 '19 at 15:36
Thanks, I'll check the other Stack Exchange sites. Actually, I've already asked this question on Biostars, but no one replied, so I posted it here. — Roy, Dec 13 '19 at 03:59
