I have a long df
that looks like this:
df <- data.frame(id=as.integer(c(123,123,123,124,124,124,125,125,126,126,126)),
date=as.Date(c("2014-03-12", "2015-05-02", "2015-09-16", "2015-10-24", "2016-12-11", "2017-10-17", "2017-08-06", "2018-01-29", "2015-09-16", "2015-11-12", "2015-12-03")),
event=as.character(c("A", "C", "E", "A", "B", "D", "A", "E", "A", "B", "E")),
order=as.integer(c(1,2,3,1,2,3,1,2,1,2,3)),
diff=as.integer(c(0,416,553,0,414,724,0,176,0,57,78)))
df
id date event order diff
1 123 2014-03-12 A 1 0
2 123 2015-05-02 C 2 416
3 123 2015-09-16 E 3 553
4 124 2015-10-24 A 1 0
5 124 2016-12-11 B 2 414
6 124 2017-10-17 D 3 724
7 125 2017-08-06 A 1 0
8 125 2018-01-29 E 2 176
9 126 2015-09-16 A 1 0
10 126 2015-11-12 B 2 57
11 126 2015-12-03 E 3 78
Each id
will always have an initial event A
and a final event, either D
or E
(mutually exclusive). Events B
and C
might or might not occur. diff
is the difference in days
between the date
of each event and the date
of the initial event A
per id
.
I want to obtain a wide df
where each event will be a column
(e.g. A_status
) where 0= absent
; 1= present
. Similarly, each correspondent diff
will be a column
(e.g. A_time
). But when event B
or C
is absent (e.g B= 0
or C= 0
), I want their time
to be filled with either D_time
or E_time
, whichever is present.
I need to create two columns
based on the values
of D
and E
:
- a
column
D.E_status
, where0=D
;1=E
, and - a
column
D.E_time
that will receive whichevertime
is recorded (ofD
orE
).
This is the desired output:
id A_status A_time B_status B_time C_status C_time D.E_status D.E_time
123 1 0 0 553 1 416 1 553
124 1 0 1 414 0 724 0 724
125 1 0 0 176 0 176 1 176
126 1 0 1 57 0 78 1 78
I really appreciate your help on this given my very basic R skills.