
I have a dataset with daily observations from 1990 to 2017. The columns start and end (below) mark the first and last day of a political demonstration. How can I create a dummy variable that takes the value 1 on every day the event was ongoing, as illustrated in the dummy column?

 date       cc  country start  end  dummy
 9/6/1991   20  Canada  0      0    
 9/7/1991   20  Canada  0      0    
 9/8/1991   20  Canada  0      0    
 9/9/1991   20  Canada  0      0    
 9/10/1991  20  Canada  1      0    1
 9/11/1991  20  Canada  0      0    1
 9/12/1991  20  Canada  0      0    1
 9/13/1991  20  Canada  0      0    1
 9/14/1991  20  Canada  0      0    1
 9/15/1991  20  Canada  0      0    1
 9/16/1991  20  Canada  0      0    1
 9/17/1991  20  Canada  0      1    1
 9/18/1991  20  Canada  0      0    
 9/19/1991  20  Canada  0      0    
 9/20/1991  20  Canada  0      0    
 9/21/1991  20  Canada  0      0    
 9/22/1991  20  Canada  0      0    
 9/23/1991  20  Canada  0      0    
 9/24/1991  20  Canada  0      0    
 9/25/1991  20  Canada  0      0    
 9/26/1991  20  Canada  0      0    
 9/27/1991  20  Canada  0      0    
 9/28/1991  20  Canada  1      0    1
 9/29/1991  20  Canada  0      0    1
 9/30/1991  20  Canada  0      0    1
 10/1/1991  20  Canada  0      0    1
 10/2/1991  20  Canada  0      1    1
 10/3/1991  20  Canada  0      0    
 10/4/1991  20  Canada  0      0    
 10/5/1991  20  Canada  0      0    
 10/6/1991  20  Canada  0      0    
 10/7/1991  20  Canada  0      0    

Any help is much appreciated. Thank you!

Clemens

1 Answer


Try this (I'm assuming your data frame is called df):

df$dummy <- cumsum(df$start - df$end) + df$end
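A minimal sketch to sanity-check the one-liner, assuming the rows are sorted by date and start and end are numeric 0/1 columns. The data frame below is a hypothetical excerpt modelled on the first Canada event in the question (9/10/1991 to 9/17/1991), keeping only the columns the formula needs.

```r
# Hypothetical excerpt: one event from 1991-09-10 to 1991-09-17,
# padded with non-event days on either side.
df <- data.frame(
  date  = seq(as.Date("1991-09-08"), as.Date("1991-09-19"), by = "day"),
  start = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  end   = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0)
)

# Running count of open events: +1 on a start day, -1 on an end day.
# On the end day the count has already dropped back to 0,
# so df$end is added to keep that day flagged as 1.
df$dummy <- cumsum(df$start - df$end) + df$end

print(df)
```

Because the running sum depends on row order, rows that are not already in date order could be sorted first, e.g. `df <- df[order(as.Date(df$date, format = "%m/%d/%Y")), ]` for dates stored as strings like 9/6/1991.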

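Edit: to handle overlapping events (for example, a second event starting before the first one ends, or one event starting on the same day another ends), which would otherwise produce values of 2, you can use the following, slightly harder to read version. It converts any positive count to 1: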

df$dummy <- as.numeric((cumsum(df$start - df$end) + df$end) > 0)
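A sketch comparing the two versions on hypothetical data with overlapping events, assuming rows are sorted by date within each country. It also shows a per-country variant using base R's `ave()`; grouping by cc is an addition not in the original answer, which may be useful when the full dataset stacks several countries one after another.

```r
# Hypothetical data: in cc 20 a second event starts (row 2) before the
# first one ends, so two events overlap; cc 30 has one ordinary event.
# Rows are assumed to be sorted by cc, then by date.
df <- data.frame(
  cc    = c(20, 20, 20, 20, 20, 20, 30, 30, 30, 30),
  start = c( 1,  1,  0,  0,  0,  0,  0,  1,  0,  0),
  end   = c( 0,  0,  0,  1,  1,  0,  0,  0,  0,  1)
)

# First version: the count reaches 2 while both events are open.
plain <- cumsum(df$start - df$end) + df$end

# Edited version: any positive count becomes 1.
df$dummy <- as.numeric((cumsum(df$start - df$end) + df$end) > 0)

# Per-country variant: ave() restarts the running count for each cc,
# so an event with no recorded end in one country cannot spill over
# into the next country's rows.
run <- ave(df$start - df$end, df$cc, FUN = cumsum)
df$dummy_by_cc <- as.numeric((run + df$end) > 0)

print(cbind(df, plain))
```

For this data both the pooled and per-country versions give the same dummy; they only differ when a country's rows end while an event is still open.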
neerajt
  • Great answer! One issue is that @Clemens wants `dummy == 1` on the end day too. You can fix that by adding `df$end` onto the end: `df$dummy <- cumsum(df$start - df$end) + df$end` – divibisan Apr 05 '18 at 15:10
  • @divibisan Just wanted to say the same: thank you. There is still a small problem, though. In some cases the events in the data overlap, so there are two start dates in a row before the first event ends. Then the code returns a 2 instead of a 1. Can we fix that? – Clemens Apr 05 '18 at 16:05
  • `x$dummy <- as.numeric(cumsum(x$start - x$end) + x$end > 0)` This replaces the value generated by `cumsum` with 1 if it is greater than 0, and with 0 otherwise. – divibisan Apr 05 '18 at 16:59