I have the Titanic dataset and want to find the probability of survival conditional on three variables: whether the fare exceeds 200, passenger class, and sex. The following table gives these probabilities as percentages.

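library(PASWR2)
library(magrittr)  # provides the %>% pipe used below

# row-wise survival percentages for each fare > 200 / pclass / sex combination
tab = with(TITANIC3, ftable(fare = fare > 200, pclass, sex, survived)) %>% prop.table(1) %>% round(3) * 100
tab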

Is there an easy way to add the probabilities from tab to the TITANIC3 dataset as a new column?

Thanks!

Saurabh

1 Answer

This can be achieved by using the package data.table. The object TITANIC3 is of class data.frame. First you need to convert it to class data.table. When using data.table you can define new columns based on aggregations and a grouping clause directly in one line. Just run the code below.

The new column with the conditional probability of survival, expressed as a percentage, is survival_prob. I generally recommend data.table because it is typically very fast for manipulating data in R. However, if you want to continue your analysis with a data.frame, just use the command setDF(titanic3) to convert the object back to class data.frame.

library(PASWR2)
library(magrittr)
library(data.table)

# convert dataset from data frame to data table 
titanic3 <- copy(TITANIC3)
setDT(titanic3)

# define new column survival_prob using by-option
titanic3[, survival_prob := round(100*mean(survived), 1), 
         by = .(fare > 200, pclass, sex)]
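
To see what the grouped assignment does without loading PASWR2, here is a minimal sketch on a small made-up table. The data frame toy and its values are hypothetical stand-ins, not the real TITANIC3 data; only the column names are borrowed from it.

```r
library(data.table)

# Toy stand-in for TITANIC3 (hypothetical values, not the real data)
toy <- data.frame(
  fare     = c(250, 300, 50, 60, 20, 25),
  pclass   = c("1st", "1st", "1st", "1st", "3rd", "3rd"),
  sex      = c("female", "female", "male", "male", "male", "male"),
  survived = c(1L, 1L, 0L, 1L, 0L, 0L)
)

# as.data.table() makes a copy, so the original data.frame stays untouched
toy_dt <- as.data.table(toy)

# Grouped assignment: each row receives the survival percentage of its group,
# and the row order of the table is left unchanged
toy_dt[, survival_prob := round(100 * mean(survived), 1),
       by = .(fare > 200, pclass, sex)]

# Convert back to a plain data.frame by reference
setDF(toy_dt)
print(toy_dt)
```

Rows that share the same fare > 200, pclass and sex combination end up with the same value, which should match the corresponding "survived" percentage in a table built like tab, up to rounding.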
stats_guy
  • I have a follow-up question on the same topic and would appreciate it if you could take a look: https://stackoverflow.com/questions/64680211/survival-probability-based-on-continuous-variable-in-r – Saurabh Nov 04 '20 at 12:37
  • I posted an answer to your question. – stats_guy Nov 05 '20 at 09:35
  • Thanks, stats_guy. I have accepted your answer. I really appreciate you taking the time to answer my query. – Saurabh Nov 09 '20 at 16:33