1

I would like to use proc sql in sas to identify if a case or record is missing some information. I have two datasets. One is a record of an entire data collection, that shows what forms have been collected during a visit. The second is a specification of what forms should be collected during a visit. I have tried many options including data steps and sql code using not in to no avail...

Example data is below


*****  dataset crf is a listing of all forms that have been filled out at each visit ;
*****  cid is an identifier for a study center  ;
*****  pid is an identifier for a participant  ;

data crf;
  input visit cid pid form ;
cards;
1 10 101 10
1 10 101 11
1 10 101 12
1 10 102 10
1 10 102 11
2 10 101 11
2 10 101 13
2 10 102 11
2 10 102 12
2 10 102 13
;
run;


*****  dataset crfrule is a listing of all forms that should be filled out at each visit  ;
*****  so, visit 1 needs to have forms 10, 11, and 12 filled out  ;
*****  likewise, visit 2 needs to have forms 11 - 14 filled out  ;

data crfrule;
  input visit form ;
cards;
1 10
1 11
1 12
2 11
2 12
2 13
2 14
;
run;


*****  We can see from the two tables that participant 101 has a complete set of records for visit 1 ;
*****  However, participant 102 is missing form 12 for visit 1 ;
*****  For visit 2, 101 is missing forms 12 and 14, whereas 102 is missing form 14  ;


*****  I want to be able to know which forms were **NOT**  filled out by each person at each visit (i.e., which forms are missing for each visit)  ;


*****  extracting unique cases from crf ;
proc sql;
  create table visit_rec as
    select distinct cid, pid, visit
      from crf;
quit;



*****  building the list of expected forms by visit number ;
proc sql;
  create table expected as 
    select  x.*,
            y.*

    from visit_rec as x right join crfrule as y
      on x.visit = y.visit

    order by visit, cid, pid, form;
quit;


*****  so now I have a list of which forms that **SHOULD** have been filled out by each person  ;

*****  now, I just need to know if they were filled out or not...  ;

The strategy I have been trying is to merge expected back onto the crf table with some indicator of which forms are missing for each visit.

Optimally, I would like to produce a table that would have: visit, cid, pid, missing_form

Any guidance is greatly appreciated.

blue and grey
  • 393
  • 7
  • 21

3 Answers3

3

EXCEPT will do what you want. I don't necessarily know that this is the most efficient solution in general (and if you're doing this in SAS, it's almost certainly not), but given what you've done so far, this does work:

create table want as
    select cid,pid,visit,form from expected
    except select cid,pid,visit,form from crf
;

Just be careful with EXCEPT - it's very picky (note that select * doesn't work, as your tables are in different orders).

Joe
  • 62,789
  • 6
  • 49
  • 67
2

I suggest a nested query, which alternatively can be done in two steps. What about this one:

proc sql;
   create table temp as
   select distinct c.*
        , (d.visit is null and d.form is null and d.pid is null) as missing_form 
   from (
      select distinct a.pid, b.* from
      crf a, crfrule b
      ) c 
   left join crf d
   on     c.pid = d.pid 
      and c.form = d.form 
      and c.visit = d.visit
   order by c.pid, c.visit, c.form
   ;
quit;

It gives you a list with all possible (i.e. expected) combinations of pid, form, visit and a boolean indicating whether it was present or not.

BellevueBob
  • 9,498
  • 5
  • 29
  • 56
user1965813
  • 671
  • 5
  • 16
1

You could use a left join and use the where clause to filter out the records with missing records in the right table.

 select 
  e.* 
 from
  expected e left join 
  crf c on
   e.visit = c.visit and 
   e.cid = c.cid and
   e.pid = c.pid and
   e.form = c.form
 where c.visit is missing
;
Laurent de Walick
  • 2,154
  • 14
  • 11