
I know it looks like this question has already been asked and answered here: "(R) [] / subset() returns an empty data frame", but that thread didn't have the solution I was looking for. (My columns do not have padded white space.)

So here is my original data:

head(d)
County    ID     event1       event2         row1           row2  
Rogers    1      Hearing      Application    Plea           Trial
Rogers    2      Arrest       Hearing        Application    Plea
Rogers    3      Arrest       Hearing        Plea           Disposal
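For anyone trying to reproduce this, here is a minimal sketch that rebuilds the three rows shown above as a data frame. It assumes ID is an integer and every other column is character; the question doesn't say what types `d` actually has.

```r
# Rebuild the sample rows of d shown above.
# Assumption: ID is integer and the remaining columns are character.
d <- data.frame(
  County = c("Rogers", "Rogers", "Rogers"),
  ID     = 1:3,
  event1 = c("Hearing", "Arrest", "Arrest"),
  event2 = c("Application", "Hearing", "Hearing"),
  row1   = c("Plea", "Application", "Plea"),
  row2   = c("Trial", "Plea", "Disposal"),
  stringsAsFactors = FALSE  # keep text as character, not factor, on older R
)
print(d)
```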

First, I needed the event columns stacked into a single event column.

events <- d %>%
  select(County, ID, contains("event"), contains("row")) %>%
  gather(m, event, contains("event")) %>%
  filter(!is.na(event)) %>%
  select(-m)

 head(events)
 County     ID     event        row1         row2
 Rogers     1      Hearing      Plea         Trial
 Rogers     1      Application  Plea         Trial
 Rogers     2      Arrest       Application  Plea
 Rogers     2      Hearing      Application  Plea

I still needed the row columns stacked into that same event column.

events2 <- events %>%
  select(County, ID, event, contains("row")) %>%
  gather(m, event, contains("row")) %>%
  filter(!is.na(event)) %>%
  select(-m)

I hoped it would look like this:

head(events2)
County      ID        event
Rogers      1         Hearing
Rogers      1         Application
Rogers      1         Plea
Rogers      1         Trial

But instead it returned an empty data frame with 0 observations.

events2
NULL

What am I doing wrong? Thank you!

1 Answer


Like those commenting, I also cannot reproduce your problem, even when I just copy and paste your code: I get the expected output. But I do have a solution that may help.

Perhaps you can avoid the problem by doing a single round of piping and, instead of contains(), using its regular-expression counterpart matches() to match 'row' OR 'event'. That selects every column whose name contains either string, so the event columns and the row columns are handled together. This eliminates having to run the piping sequence twice, where mistakes can creep in with the copy-paste-change approach (I know I make them all the time).
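To see the difference concretely, here is a small sketch on the sample data (rebuilt by hand, with column types assumed). contains() looks for one literal substring, while matches() takes a regular expression, so a single alternation like 'event|row' picks up both groups of columns at once.

```r
library(dplyr)

# Sample data rebuilt by hand (column types are an assumption).
d <- data.frame(
  County = "Rogers", ID = 1:3,
  event1 = c("Hearing", "Arrest", "Arrest"),
  event2 = c("Application", "Hearing", "Hearing"),
  row1   = c("Plea", "Application", "Plea"),
  row2   = c("Trial", "Plea", "Disposal"),
  stringsAsFactors = FALSE
)

# contains() matches a single literal substring...
only_events <- names(select(d, contains("event")))
# ...while matches() takes a regex, so 'event|row' selects both groups.
both <- names(select(d, County, ID, matches("event|row")))

print(only_events)
print(both)
```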

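library(tidyverse)

events <- d %>%
  select(County, ID, matches('event|row')) %>%       # IDs plus every event*/row* column
  gather(m, event, matches('row|event[0-9]+')) %>%   # stack those columns into one 'event' column
  select(-m) %>%                                     # drop the key column (old column names)
  filter(!is.na(event))                              # drop empty cells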

Briefly, the call to matches() inside gather() says: match 'row', or match 'event' when it is followed by at least one digit (0-9). See this neat graphic for more info: Regular Expressions in R.
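A quick way to check what that pattern accepts is base R's grepl(), which also uses R's regular expressions. The names below are the question's columns plus a bare `event` added for comparison:

```r
# Column names from the question, plus a bare "event" for comparison.
cols <- c("County", "ID", "event1", "event2", "row1", "row2", "event")

# 'row' anywhere in the name, or 'event' followed by one or more digits.
hits <- grepl("row|event[0-9]+", cols)
print(cols[hits])
```

A bare `event` column is not picked up, because `[0-9]+` requires at least one digit after it.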

I had to sort the data frame afterwards, and then I get:

> head(events)
  County ID       event
  Rogers  1     Hearing
  Rogers  1 Application
  Rogers  1        Plea
  Rogers  1       Trial
  Rogers  2      Arrest
  Rogers  2     Hearing
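The sort step isn't shown in the code above. Here is a self-contained sketch of the whole pipeline, assuming the sort was done with dplyr's arrange() on County and ID (the answer doesn't say exactly how it was sorted), with the sample data rebuilt by hand:

```r
library(dplyr)
library(tidyr)

# Sample data from the question (column types assumed).
d <- data.frame(
  County = "Rogers", ID = 1:3,
  event1 = c("Hearing", "Arrest", "Arrest"),
  event2 = c("Application", "Hearing", "Hearing"),
  row1   = c("Plea", "Application", "Plea"),
  row2   = c("Trial", "Plea", "Disposal"),
  stringsAsFactors = FALSE
)

events <- d %>%
  select(County, ID, matches("event|row")) %>%
  gather(m, event, matches("row|event[0-9]+")) %>%
  select(-m) %>%
  filter(!is.na(event)) %>%
  arrange(County, ID)   # assumed sort step: group each ID's events together

print(head(events))
```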

I am using tidyverse v1.2.1. Hope that helps!

Khlick