0

I am trying to find the first occurrence of a FALSE in a dataframe for each row value. My rows are specific occurrences and the columns are dates. I would like to be able to find the date of first FALSE so that I can use that value to find a return date.

An example structure of my dataframe:

df <- data.frame(ID = c(1,2,3), '2001' = c(TRUE, TRUE, TRUE), 
                 '2002' = c(FALSE, TRUE, FALSE), '2003' = c(TRUE, FALSE, TRUE))

I want to end up with a second dataframe or list that contains the ID and the column name that identifies the first instance of a FALSE.

For example :

ID | Date
1  | 2002
2  | 2003
3  | 2002

I do not know the mechanism to find such a result.

The actual dataframe contains a couple thousand rows so I unfortunately can't do it by hand.

I am a new R user so please don't refrain from suggesting things you might expect a more experienced R user to have already thought about.

Thanks in advance

JMade
  • 37
  • 5

2 Answers2

1

Try this using tidyverse functions. You can reshape data to long and then filter for F values. If there are some duplicated rows the second filter can avoid them. Here the code:

library(dplyr)
library(tidyr)
#Code
newdf <- df %>% pivot_longer(-ID) %>%
  group_by(ID) %>%
  filter(value==F) %>%
  filter(!duplicated(value)) %>% select(-value) %>%
  rename(Myname=name)

Output:

# A tibble: 3 x 2
# Groups:   ID [3]
     ID Myname 
  <dbl> <chr>
1     1 2002 
2     2 2003 
3     3 2002 

Another option without duplicated values can be using the row_number() to extract the first value (row_number()==1):

library(dplyr)
library(tidyr)
#Code 2
newdf <- df %>% pivot_longer(-ID) %>%
  group_by(ID) %>%
  filter(value==F) %>%
  mutate(V=ifelse(row_number()==1,1,0)) %>%
  filter(V==1) %>%
  select(-c(value,V)) %>% rename(Myname=name)

Output:

# A tibble: 3 x 2
# Groups:   ID [3]
     ID Myname 
  <dbl> <chr>
1     1 2002 
2     2 2003 
3     3 2002 

Or using base R with apply() and a generic function:

#Code 3
out <- data.frame(df[,1,drop=F],Res=apply(df[,-1],1,function(x) names(x)[min(which(x==F))]))

Output:

  ID  Res
1  1 2002
2  2 2003
3  3 2002
Duck
  • 39,058
  • 13
  • 42
  • 84
  • Thank you so much! A true lifesaver. How would I incorporate changing the name of the resulting column in the new dataframe? It defaults to "name". (I know how to change the column name after the fact but I'm curious how you would incorporate it into the code you provided. – JMade Nov 17 '20 at 22:23
  • 1
    @JMade I have updated the solution. You have to use `rename()` and assign the desired name. Hoping that helps you! – Duck Nov 17 '20 at 22:36
  • I posted another question that is a continuation of this that you probably have insight to: [link](https://stackoverflow.com/q/64886149/14266849) @Duck – JMade Nov 18 '20 at 02:43
0

We can use max.col with ties.method = 'first' after inverting the logical values.

cbind(df[1], Date = names(df[-1])[max.col(!df[-1], ties.method = 'first')])
#  ID Date
#1  1 2002
#2  2 2003
#3  3 2002
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213