I have data that I want to transpose to get visualization of the status of a single id at any point in time.
I have been trying to follow @Joe's answer from Aggregating multiple observations depending on validity ranges, but I struggle with the case of multiple modalities attributes.
This is the event-based data I have:
data have;
infile datalines delimiter="|";
input attrib :$30. multiple_attr :$1. id :$30. attrib_id :8. member_value :$100. type :$5. dt_event :datetime18.;
format dt_event datetime20.;
datalines;
TYPE|N|ABC123|111|MEDIUM|Start|01DEC2014:00:00:00
TYPE|N|ABC123|111|MEDIUM|End|18APR2021:00:00:00
TYPE|N|ABC123|111|BIG|Start|19APR2021:00:00:00
TYPE|N|ABC123|111|BIG|End|31DEC2030:00:00:00
POSITION|N|ABC123|222|TOP|Start|01DEC2014:00:00:00
POSITION|N|ABC123|222|TOP|End|31DEC2030:00:00:00
IS_ACTIVE|N|ABC123|333|YES|Start|01DEC2014:00:00:00
IS_ACTIVE|N|ABC123|333|YES|End|31DEC2030:00:00:00
LEVELS|Y|ABC123|1|ALONE|Start|01DEC2014:00:00:00
LEVELS|Y|ABC123|1|BOTH|Start|01DEC2014:00:00:00
LEVELS|Y|ABC123|1|BOTH|End|18APR2021:00:00:00
LEVELS|Y|ABC123|1|ALONE|End|31DEC2030:00:00:00
TYPE|N|DEF456|111|MEDIUM|Start|01DEC2014:00:00:00
TYPE|N|DEF456|111|MEDIUM|End|31DEC2030:00:00:00
POSITION|N|DEF456|222|MID|Start|01DEC2014:00:00:00
POSITION|N|DEF456|222|MID|End|31DEC2030:00:00:00
IS_ACTIVE|N|DEF456|333|YES|Start|01MAR2014:00:00:00
IS_ACTIVE|N|DEF456|333|YES|End|31DEC2030:00:00:00
LEVELS|Y|DEF456|1|ALONE|Start|01MAR2014:00:00:00
LEVELS|Y|DEF456|1|BOTH|Start|01MAR2014:00:00:00
LEVELS|Y|DEF456|1|BOTH|End|31MAR2018:00:00:00
LEVELS|Y|DEF456|1|BOTH|Start|20AUG2018:00:00:00
LEVELS|Y|DEF456|1|ALONE|End|31DEC2030:00:00:00
LEVELS|Y|DEF456|1|BOTH|End|31DEC2030:00:00:00
;
Using @Joe's method:
proc sort data=have;
by id attrib_id dt_event member_value;
run;
data want;
set have(rename=member_value=in_value);
by id attrib_id dt_event;
retain start_date end_date member_value orig_value;
format member_value new_value $100.;
* First row per attrib_id is easy, just start it off with a START;
if first.attrib_id then do;
start_date = dt_event;
member_value = in_value;
end;
else do; *Now is the harder part;
* For ENDs, we want to remove the current member_value from the concatenated value string, always, and then if it is the last row for that dt_event, we want to output a new record;
if type='End' then do;
*remove the current (in_)value;
if first.dt_event then orig_value = member_value;
do _i = 1 to countw(member_value,';');
if scan(orig_value,_i,';') ne in_value then do;
if orig_value > scan(orig_value,_i,';') then new_value = catx('; ',scan(orig_value,_i,';'),new_value);
else new_value = catx('; ',new_value,scan(orig_value,_i,';'));
end;
end;
orig_value = new_value;
if last.dt_event then do;
end_date = dt_event;
output;
start_date = dt_event + 86400;
member_value = new_value;
orig_value = ' ';
end;
end;
else do;
* For START, we want to be more careful about outputting, as this will output lots of unwanted rows if we do not take care;
end_date = dt_event - 86400;
if start_date < end_date and not missing(member_value) then output;
if member_value > in_value then member_value = catx('; ',in_value,member_value);
else member_value = catx('; ',member_value,in_value);
start_date = dt_event;
end_date = .;
end;
end;
format start_date end_date datetime20.;
keep id multiple_attr attrib_id member_value start_date end_date;
run;
I end up with:
+---------------+--------+-----------+--------------------+--------------------+-------------------+
| multiple_attr | id | attrib_id | start_date | end_date | member_value |
+---------------+--------+-----------+--------------------+--------------------+-------------------+
| Y | ABC123 | 1 | 01DEC2014:00:00:00 | 18APR2021:00:00:00 | ALONE; BOTH |
| Y | ABC123 | 1 | 19APR2021:00:00:00 | 31DEC2030:00:00:00 | BOTH; ALONE |
| N | ABC123 | 111 | 01DEC2014:00:00:00 | 18APR2021:00:00:00 | MEDIUM |
| N | ABC123 | 111 | 19APR2021:00:00:00 | 31DEC2030:00:00:00 | BIG |
| N | ABC123 | 222 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | TOP |
| N | ABC123 | 333 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | YES |
| Y | DEF456 | 1 | 01MAR2014:00:00:00 | 31MAR2018:00:00:00 | ALONE; BOTH |
| Y | DEF456 | 1 | 01APR2018:00:00:00 | 19AUG2018:00:00:00 | BOTH; ALONE |
| Y | DEF456 | 1 | 20AUG2018:00:00:00 | 31DEC2030:00:00:00 | BOTH; BOTH; ALONE |
| N | DEF456 | 111 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | MEDIUM |
| N | DEF456 | 222 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | MID |
| N | DEF456 | 333 | 01MAR2014:00:00:00 | 31DEC2030:00:00:00 | YES |
+---------------+--------+-----------+--------------------+--------------------+-------------------+
You can see that multiple modalities attributes (where multiple_attr = "Y"
) are not handled properly.
The desired output should be like this:
+---------------+--------+-----------+--------------------+--------------------+--------------+
| multiple_attr | id | attrib_id | start_date | end_date | member_value |
+---------------+--------+-----------+--------------------+--------------------+--------------+
| Y | ABC123 | 1 | 01DEC2014:00:00:00 | 18APR2021:00:00:00 | ALONE; BOTH |
| Y | ABC123 | 1 | 19APR2021:00:00:00 | 31DEC2030:00:00:00 | ALONE |
| N | ABC123 | 111 | 01DEC2014:00:00:00 | 18APR2021:00:00:00 | MEDIUM |
| N | ABC123 | 111 | 19APR2021:00:00:00 | 31DEC2030:00:00:00 | BIG |
| N | ABC123 | 222 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | TOP |
| N | ABC123 | 333 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | YES |
| Y | DEF456 | 1 | 01MAR2014:00:00:00 | 31MAR2018:00:00:00 | ALONE; BOTH |
| Y | DEF456 | 1 | 01APR2018:00:00:00 | 19AUG2018:00:00:00 | ALONE |
| Y | DEF456 | 1 | 20AUG2018:00:00:00 | 31DEC2030:00:00:00 | ALONE; BOTH |
| N | DEF456 | 111 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | MEDIUM |
| N | DEF456 | 222 | 01DEC2014:00:00:00 | 31DEC2030:00:00:00 | MID |
| N | DEF456 | 333 | 01MAR2014:00:00:00 | 31DEC2030:00:00:00 | YES |
+---------------+--------+-----------+--------------------+--------------------+--------------+
Is there a way to handle multiple modalities attributes? I can't find a way to delete a member value once a modality of that attribute is ending (i.e. switching from ALONE; BOTH
to ALONE
after it ended).