By using R, how can one develop an index score for predicting patient overall survival (OS)?

I have a shortlist of 4 candidate predictors that were shown to be associated with OS. They came out of a multivariable Cox regression (run with coxph()). The predictors are protein levels, so they are all continuous variables.

The data table looks something like this (showing only n=10 here):

          days Status    Prot1   Prot13    Prot7   Prot21
Subj_1  115.69      0 2.284498 6.319168 6.070115 8.457412
Subj_2   72.30      1 2.473034 6.066573 6.140178 8.225987
Subj_3    1.08      1 2.662481 6.212845 6.971018 8.128949
Subj_4   69.63      1 2.761391 5.902610 6.433883 7.876319
Subj_5   78.41      1 3.038122 6.355257 6.852981 7.500973
Subj_6   42.90      1 2.058549 6.020681 7.231307 8.164025
Subj_7   31.00      1 2.305096 5.415107 8.126941 8.566320
Subj_8   51.12      1 2.931978 5.574601 7.503275 7.529957
Subj_9   11.01      1 2.218814 6.270222 6.710297 8.193895
Subj_10  27.68      1 2.821947 6.132379 6.911071 8.428218

The question is: how can I create a formula capable of classifying these patients into 2 groups: one where the estimated survival is < 60% over a 1-year period, and another with estimated survival > 60% over the same period?

Would there be any function() in R that deals with that?

Thanks a lot in advance.

Douglas

1 Answer

I think you should post this question here

https://stats.stackexchange.com

since it is a matter of statistics. Anyway, you could try a binomial regression to start, but there are many other models you could try. How many subjects do you have?
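As an alternative to a separate binary classifier, the two groups can also be read directly off the Cox model the question already fits. The following is only a minimal sketch under stated assumptions: the data are simulated (the effect sizes, baseline rate and censoring range are made up for illustration), the column names are copied from the question's table, `days` is assumed to be in days so that 1 year is taken as 365, and `score`, `surv_1yr` and `group` are hypothetical names. It uses the linear predictor from `predict()` as the index score and `survfit()` with `newdata` to get each subject's predicted survival at 1 year.

```r
library(survival)

# Simulated stand-in for the real table (assumption: same column names
# as in the question; all numbers below are invented for illustration).
set.seed(1)
n <- 200
dat <- data.frame(
  Prot1  = rnorm(n, 2.5, 0.3),
  Prot13 = rnorm(n, 6.0, 0.3),
  Prot7  = rnorm(n, 6.9, 0.6),
  Prot21 = rnorm(n, 8.1, 0.4)
)

# Hypothetical "true" effects, used only to generate toy survival times.
lp_true     <- 0.8 * dat$Prot1 - 0.5 * dat$Prot21
event_time  <- rexp(n, rate = exp(lp_true - mean(lp_true)) / 700)
censor_time <- runif(n, 200, 1000)
dat$days    <- pmin(event_time, censor_time)
dat$Status  <- as.integer(event_time <= censor_time)  # 1 = event, 0 = censored

# Cox model with the four candidate predictors.
fit <- coxph(Surv(days, Status) ~ Prot1 + Prot13 + Prot7 + Prot21, data = dat)

# Index score: the linear predictor (sum of coefficient * protein level,
# centered by coxph). A higher score means a higher estimated hazard.
dat$score <- predict(fit, type = "lp")

# Predicted survival probability at 365 days for every subject.
sf <- survfit(fit, newdata = dat)
dat$surv_1yr <- as.numeric(summary(sf, times = 365)$surv)

# Two groups split at 60% estimated 1-year survival.
dat$group <- ifelse(dat$surv_1yr < 0.60, "below_60", "60_or_above")
table(dat$group)
```

Because the Cox model gives S(t) = S0(t)^exp(score), the predicted 1-year survival decreases monotonically with the score, so the 60% threshold corresponds to a single cutoff on the score. With real data, the grouping would still need validating on subjects not used to fit the model.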

Carbo
  • Thank you, @Carbo. I will post that question there. Any variation of binomial regression in particular? Also, what would be the binomial response you are thinking about? There are 200 subjects. – Douglas Dec 03 '19 at 12:47
  • Oh, I had posted a variation of that question [there](https://stats.stackexchange.com/questions/439074/defining-an-applicable-score-index-for-patient-survival-prediction), which was more focused on the formula itself. I posted here to learn whether there is an R function that solves this problem. – Douglas Dec 03 '19 at 12:53
  • You're welcome. You could try a Naive Bayes classifier or binomial regression (logit and probit). Here you can find some others you can apply in R: https://data-flair.training/blogs/classification-in-r/ or https://boostedml.com/2019/06/binary-classification-in-r-logistic-regression-probit-regression-and-more.html . I would suggest you google "binary classification in R" and look through some of the results. There are a lot of sources and pages online that might help you build your classifier. – Carbo Dec 03 '19 at 13:04