
I have a data set of cancer patients with different outcomes:

TypeofOutcome        DateStageIV

NA                   01.04.2014
Died from melanoma   01.06.2011
Died from melanoma   01.11.2013

I want a new column called "Outcome" with all patients who are still alive coded as 1 and all dead patients coded as 0. Based on a previous exercise, I wrote this code:

mergedData$Outcome <- 1* (mergedData$TypeofOutcome = c ("Alive with stable disease", "Alive with progressive disease", "Alive with complete response"))

I suspected that this would not work, and I got this error message:

Error in 1 * (mergedData$TypeofOutcome = c("Alive with stable disease", :
non-numeric argument to binary operator

I am sure that there is a simple solution for my problem.

989
Feli
  • 1. I don't think your previous code does what you think it does (you need %in% here). 2. Is using regular expressions an option (search for 'died' in outcome)? – Heroka Feb 12 '16 at 14:36
  • Well coding it into dead/alive would be possible, too. I am kind of looking for something like (in words): Outcome = 1 if TypeofOutcome is Alive with stable disease or Alive with progressive disease...., Outcome = 0 if TypeofOutcome is Died from Melanoma, Died from other causes.... – Feli Feb 12 '16 at 14:40
  • How do you want NA-values handled? You could do something like dat$outcome <- grepl("Died", dat$TypeOfOutcome) – Heroka Feb 12 '16 at 14:46
  • NA-values should also be coded as 0. I will try your suggestion asap :) – Feli Feb 12 '16 at 14:51
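
The pattern-matching idea from the comments could be sketched roughly as follows. This is only an illustration: `dat` is a made-up three-row data frame, and the sketch assumes that every "alive" label starts with the word "Alive", as in the examples given above. Because `grepl` returns FALSE for NA input, missing outcomes end up coded as 0, which matches the requested handling.

```r
# Hypothetical toy data; only the column name TypeofOutcome comes from the question
dat <- data.frame(
  TypeofOutcome = c("Alive with stable disease", NA, "Died from melanoma"),
  stringsAsFactors = FALSE
)

# grepl() returns FALSE for NA input, so NA is coded 0 together with the deaths
dat$Outcome <- as.integer(grepl("^Alive", dat$TypeofOutcome))
dat$Outcome
# [1] 1 0 0
```

Matching on "^Alive" rather than on "Died" gives 1 for living patients directly, so no extra inversion step is needed.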

1 Answer


If I understand you correctly, you want to create a dichotomous variable based on the value of a string variable. For example, if TypeOfOutcome matches any of "Alive with stable disease", "Alive with progressive disease" or "Alive with complete response", Outcome would be 1, otherwise 0. I assume your dataset looks similar to this:

# Example data; sample() draws random dates, so DateStageIV will differ between runs
mergedData <- data.frame(
  TypeOfOutcome = c("Alive with stable disease", "Alive with progressive disease", "Alive with complete response", NA, "Died from melanoma"),
  DateStageIV = sample(seq(as.Date('2011/01/01'), as.Date('2015/01/01'), by="day"), 5))


#                    TypeOfOutcome DateStageIV
# 1      Alive with stable disease  2013-05-09
# 2 Alive with progressive disease  2014-08-08
# 3   Alive with complete response  2013-02-10
# 4                           <NA>  2014-05-23
# 5             Died from melanoma  2012-08-08

The function ifelse is suitable for this form of recoding. The basic syntax is:

ifelse(test, yes, no)

If the statement in test is true, ifelse returns the value of yes; otherwise it returns the value of no. In this case, test should be true for all cases where the patient is still alive, which is indicated by the string in TypeOfOutcome being "Alive with stable disease", "Alive with progressive disease" or "Alive with complete response". A test for this would be:

test <- mergedData$TypeOfOutcome %in% c("Alive with stable disease", "Alive with progressive disease", "Alive with complete response")
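
A short illustration of why `%in%` is a good fit here compared with `==` (the vectors `x` and `alive` are made up for this sketch): a comparison against NA with `==` yields NA, whereas `%in%` always returns TRUE or FALSE, so missing outcomes simply count as "not alive".

```r
# Made-up example vectors, not the real data
x <- c("Alive with stable disease", NA, "Died from melanoma")
alive <- c("Alive with stable disease", "Alive with progressive disease")

eq_result <- x == "Alive with stable disease"  # TRUE NA FALSE: the NA propagates
in_result <- x %in% alive                      # TRUE FALSE FALSE: NA counts as no match

eq_result
in_result
```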

test is TRUE wherever the value in TypeOfOutcome matches any of the strings after the %in% operator and FALSE otherwise, including for NA. yes is then 1 and no is 0. To create the new variable:

mergedData$Outcome <- ifelse(test, 1, 0)

mergedData

#                    TypeOfOutcome DateStageIV Outcome
# 1      Alive with stable disease  2013-05-09       1
# 2 Alive with progressive disease  2014-08-08       1
# 3   Alive with complete response  2013-02-10       1
# 4                           <NA>  2014-05-23       0
# 5             Died from melanoma  2012-08-08       0
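
As a side note on the error in the question, here is a small sketch (with toy vectors, not the original data) of what probably went wrong. Inside the parentheses, `=` assigns rather than compares, so the expression evaluates to a character vector, and multiplying that by 1 fails. With `%in%` the result is a logical vector, which `as.integer()` (or `1 *`) converts to 1/0, so the original one-liner can be kept in spirit. The names `outcome`, `alive`, `tmp`, `msg` and `coded` are only illustrative.

```r
outcome <- c("Alive with stable disease", NA, "Died from melanoma")  # toy data
alive <- c("Alive with stable disease", "Alive with progressive disease",
           "Alive with complete response")

# `=` inside parentheses is an assignment; the value is a character vector,
# so the multiplication raises an error, which is captured here as text
msg <- tryCatch(1 * (tmp = alive), error = function(e) conditionMessage(e))
msg

# %in% gives TRUE/FALSE, which converts cleanly to 1/0
coded <- as.integer(outcome %in% alive)
coded
# [1] 1 0 0
```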
junkka