0

For survival analysis I want to create a variable that selects the lowest value in a row (time to first event).

stid <- 1:5
event1 <- c(26.03, 0.39, 11.26, 0.03, 8.00)
event2 <- c(13.43, 1.68, NA, 5.87, NA)
event3 <- c(17.2, NA, NA, 9.09, NA)
event4 <- c(NA, NA, NA, 1.18, NA)

df <- data.frame(stid, event1,event2,event3,event4)
df

What I tried to achieve with which.min or with dplyr::mutate and min (without success) is to create this:

event_first <- c(13.43, 0.39, 11.26, 0.03, 8.00)
df <- data.frame(df, event_first)
df

So the NA values should be excluded.

That would be very helpful! I think there may be a tidy solution, but I have not found it yet.

I hope someone can help me :)

deschen

3 Answers

1

To ignore the NAs, you can turn them into Inf (infinity):

df[is.na(df)] <- Inf

Run your calculations and then turn them back into NA:

event_first <- apply(df[, 2:5], 1, min)  # columns 2:5 are event1 to event4
df <- data.frame(df, event_first)
df[df==Inf] <- NA
  • Works! I was wondering what the reasoning behind turning NA to Inf was? – Julius Heemelaar Nov 11 '20 at 12:40
  • Using `NA` in calculations always returns `NA`. Turning them into a number guarantees that you'll be able to run the code, but the number should be one that doesn't influence the result. As you're calculating `min`, we choose infinity (which will never be the minimal value). So this is a solution to this specific case only, and Karthik S's answer is better :) – Ricardo Semião e Castro Nov 11 '20 at 13:58
  • I would never manually turn NAs into a number if the function specifically offers NA handling, i.e. why don't you simply add the na.rm = TRUE parameter to the apply function? – deschen Nov 12 '20 at 09:57
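
The `na.rm = TRUE` route suggested in the last comment could look like the sketch below. It assumes the original `df` from the question (before any `NA` to `Inf` replacement) and selects the event columns by name rather than by position; `event_cols` is a name introduced here for illustration only.

```r
# Data from the question
df <- data.frame(
  stid   = 1:5,
  event1 = c(26.03, 0.39, 11.26, 0.03, 8.00),
  event2 = c(13.43, 1.68, NA, 5.87, NA),
  event3 = c(17.2, NA, NA, 9.09, NA),
  event4 = c(NA, NA, NA, 1.18, NA)
)

# NA propagates through min() unless it is dropped explicitly
min(c(8.00, NA))                # NA
min(c(8.00, NA), na.rm = TRUE)  # 8

# event_cols is an illustrative name, not part of the original post
event_cols <- paste0("event", 1:4)

# Extra arguments given to apply() are passed on to min()
df$event_first <- apply(df[, event_cols], 1, min, na.rm = TRUE)
df$event_first
```

One caveat: for a row in which every event column is `NA`, `min(..., na.rm = TRUE)` returns `Inf` with a warning, so such rows would still need to be set back to `NA` afterwards.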
1

Does this work:

library(dplyr)
df %>% rowwise() %>% mutate(event_first = min(c_across(event1:event4), na.rm = TRUE))
# A tibble: 5 x 6
# Rowwise: 
   stid event1 event2 event3 event4 event_first
  <int>  <dbl>  <dbl>  <dbl>  <dbl>       <dbl>
1     1  26.0   13.4   17.2   NA          13.4 
2     2   0.39   1.68  NA     NA           0.39
3     3  11.3   NA     NA     NA          11.3 
4     4   0.03   5.87   9.09   1.18        0.03
5     5   8     NA     NA     NA           8   
Karthik S
1

You can achieve this with either of the following snippets. The first relies entirely on the tidyverse approach, whereas the second is a hybrid solution that uses apply, which runs much faster than rowwise on large datasets.

# First solution
df %>%
  rowwise() %>%
  mutate(event_first = min(c_across(starts_with("event")), na.rm = TRUE)) %>%
  ungroup()

# Second solution
df %>%
  mutate(event_first = apply(across(starts_with("event")), 1, min, na.rm = TRUE))

The first snippet returns a tibble, the second a data frame.

Here's the data frame result:

  stid event1 event2 event3 event4  event_first
1    1  26.03  13.43  17.20     NA        13.43
2    2   0.39   1.68     NA     NA         0.39
3    3  11.26     NA     NA     NA        11.26
4    4   0.03   5.87   9.09   1.18         0.03
5    5   8.00     NA     NA     NA         8.00
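
If the `apply` version is still too slow, base R's `pmin()` may be worth trying as well. The following is only a sketch and not one of the two solutions above: it takes the element-wise minimum across the event columns in a single vectorised call. `event_cols` is a name introduced here for illustration, and `df` is assumed to be the data frame from the question.

```r
# Data from the question
df <- data.frame(
  stid   = 1:5,
  event1 = c(26.03, 0.39, 11.26, 0.03, 8.00),
  event2 = c(13.43, 1.68, NA, 5.87, NA),
  event3 = c(17.2, NA, NA, 9.09, NA),
  event4 = c(NA, NA, NA, 1.18, NA)
)

# Pick event1, event2, ... by name; the pattern deliberately
# does not match an existing event_first column
event_cols <- grep("^event[0-9]+$", names(df), value = TRUE)

# pmin() compares the columns position by position;
# do.call() passes each column as a separate argument plus na.rm = TRUE
df$event_first <- do.call(pmin, c(df[event_cols], na.rm = TRUE))
df$event_first
```

Unlike `min(..., na.rm = TRUE)`, `pmin(..., na.rm = TRUE)` yields `NA` rather than `Inf` at a position where all inputs are `NA`, which can be convenient for subjects without any recorded event.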
deschen
  • If you find one of the solutions helpful, please check them as the accepted answer by clicking on the little checkmark symbol. – deschen Nov 11 '20 at 12:41