1

I have a database of people with information that in which households they are included. How can I group in one statement households in which lives only woman or only a specific group of people?

Example:

enter image description here

So from this example I would like to get only one row (B) and the count that there lives two people.

To better understand I can do it when I want households where lives only people under 20 years old. The following code works:

proc sql;
    create table new_tab as
    select household, count(*) as freq
    from table
    group by household
    having max(gender) < 21;
quit;

But I couldnt find a similar code when it is not a numeric value. Can someone please give an advise how could I group only households in which lives only woman?

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Why complicate the question with age in example and no age in the data? Does gender hold the age value? For specific group of people, in your data are there any households with more than two people? – Richard Mar 15 '23 at 13:54

3 Answers3

0

If you want households with women only:

select household, count(*) as freq
from mytable
group by household
having min(gender) = 'woman' and max(gender) = min(gender)

The having clause ensures that the household contains women and nothing else.

We could also also use count and distinct:

select household, count(*) as freq
from mytable
group by household
having min(gender) = 'woman' and count(distinct gender) = 1

Or we could use a case expression within an aggregate function:

having min(case when gender = 'woman' then 1 else 0 end) = 1
GMB
  • 216,147
  • 25
  • 84
  • 135
0

You were almost there.

   data have;
        input household $ gender $;
        datalines;
    A man
    A woman
    B woman
    B woman
    C man
    C man
    C woman
    ;
    run;

If you want to have only one value of a variable in a group you use min function of boolean expression. If true the expression's value is 1, and if they are all true then minimum value is 1.

If you wanted any group that contained women you would use max function of the same boolean expression..

proc sql;
    create table want as
        select household, count(*) as freq
            from have
            group by household
            having min(gender='woman') = 1
    ;
quit;
Negdo
  • 507
  • 2
  • 8
0

Consider computing an aggregate categorical value and counts that can be used in later analysis.

Example:

select household, 
  case 
    when min(gender) ne max(gender) then 'mixed'
    else min(gender)
  end as gender
, sum (gender eq 'woman') as woman_count
, sum (gender eq 'man') as man_count
, sum (missing(gender)) as missingG_count
, sum (not missing(gender) and gender not in ('woman','man')) as otherG_count
from have
group by household
Richard
  • 25,390
  • 3
  • 25
  • 38