5

I'm a SQL/SAS beginner. I wanna make an animals color report, which specifies quantity for each class black/brown, white/grey, gold/silver. I think it's not efficient to write the same code every time but i'm lost how to merge it into one:

select sum(cats), sum(dogs), sum(parrots)
from animals
where animals_color in (black, brown)
_______

select sum(cats), sum(dogs), sum(parrots)
from animals
where animals_color in (white, grey)
_______

select sum(cats), sum(dogs), sum(parrots)
from animals
where animals_color in (gold, silver)

I would like something like this but it doesn't return desired table:

select sum(cats), sum(dogs), sum(parrots)
from animals
where 
(animals_color in (black, brown))
or
(animals_color in (white, grey))
or
(animals_color in (gold, silver))

I would very appreciate any tips or help! SAS's Data Step tips or alternative solution are welcome as well.

Tom
  • 339
  • 2
  • 9

5 Answers5

5

A SAS-y way to do this would be to create a custom format for the animal color type and then use a SAS procedure to create your desired output. My example uses PROC MEANS, but others like TABULATE or REPORT could easily be used.

data have;
format animals_color $8. cats dogs parrots best.;

do i=1 to 10;
    do animals_color='black', 'brown','white', 'grey','gold','silver';
        cats = floor(ranuni(1)*10);
        dogs = floor(ranuni(1)*10);
        parrots= floor(ranuni(1)*10);
        output;
    end;
end;
drop i;
run;

proc format ;
value  $animal_ctype 'black'='black/brown'
                   'brown'='black/brown'
                   'white'='white/grey'
                   'grey'='white/grey'
                   'gold'='gold/silver'
                   'silver'='gold/silver'
                   other='unknown';
run;

proc means data=have sum;
var cats dogs parrots;
format animals_color $animal_ctype.;
class animals_color;
run;
DomPazz
  • 12,415
  • 17
  • 23
  • Thank you for the answer! It explained a lot. Do you have an idea what to do when I switch the group 'black/brown' to 'dark' and I wanna put there around 50 or 500 colors. Can I apply somehow WHERE IN, or IF THEN? To be specific I'm thinking about the proc format. Making every value individual seems quite inefficient. – Tom Apr 24 '18 at 13:29
  • @Tom yeah, you can build a dataset that defines the format. Look at the documentation for PROC FORMAT to see how to do that. Code the IF/THEN/ELSE in the Data Step to create that data set and then define the format. – DomPazz Apr 24 '18 at 19:53
  • I've just found it. Perfect, thank you. I'd like to ask one more thing, if i wanna define a group I assign values to each group, what if I wanna say NOT IN or NON EQUAL? I can't find the answer in the documentation. – Tom Apr 25 '18 at 09:27
3
SELECT
  color_group,
  SUM(cats)     AS sum_cats,
  SUM(dogs)     AS sum_dogs,
  SUM(parrots)  AS sum_parrots
FROM
(
  SELECT
    CASE WHEN animals_color IN ('black', 'brown' ) THEN 'black or brown'
         WHEN animals_color IN ('white', 'grey'  ) THEN 'white or grey'
         WHEN animals_color IN ('gold',  'silver') THEN 'gold  or silver'
    END
       AS color_group,
    animals.*
  FROM
    animals
  WHERE
    animals_color IN ('black', 'brown', 'white', 'grey', 'gold',  'silver')
)
  color_grouped
GROUP BY
  color_group

Or...

SELECT
  color_group,
  SUM(cats)     AS sum_cats,
  SUM(dogs)     AS sum_dogs,
  SUM(parrots)  AS sum_parrots
FROM
(
  SELECT
    CASE animal_color
      WHEN 'black'  THEN 'black or brown'
      WHEN 'brown'  THEN 'black or brown'
      WHEN 'white'  THEN 'white or grey'
      WHEN 'grey'   THEN 'white or grey'
      WHEN 'gold'   THEN 'gold  or silver'
      WHEN 'silver' THEN 'gold  or silver'
    END
       AS color_group,
    animals.*
  FROM
    animals
  WHERE
    animals_color IN ('black', 'brown', 'white', 'grey', 'gold',  'silver')
)
  color_grouped
GROUP BY
  color_group

Or...

SELECT
  map.color_group,
  SUM(animals.cats)     AS sum_cats,
  SUM(animals.dogs)     AS sum_dogs,
  SUM(animals.parrots)  AS sum_parrots
FROM
  animals
INNER JOIN
(
   SELECT 'silver' AS color, 'gold  or silver' AS color_group
   UNION ALL
   SELECT 'gold'   AS color, 'gold  or silver' AS color_group
   UNION ALL
   SELECT 'grey'   AS color, 'white or grey'   AS color_group
   UNION ALL
   SELECT 'white'  AS color, 'white or grey'   AS color_group
   UNION ALL
   SELECT 'brown'  AS color, 'black or brown'  AS color_group
   UNION ALL
   SELECT 'black'  AS color, 'black or brown'  AS color_group
)
  AS map
    ON map.color = animals.animals_color
MatBailie
  • 83,401
  • 18
  • 103
  • 137
3

You can use GROUP BY to provide aggregate totals for different categories:

SELECT SUM(cats), SUM(dogs), SUM(parrots)
, CASE WHEN animals_color IN ('black','brown') THEN 'Black / Brown'
WHEN animals_color IN ('white','grey') THEN 'White / Grey'
ELSE 'Gold/Silver' END AS Animal_Color_Class
FROM animals
WHERE animals_color IN ('black', 'brown', 'white','grey','gold','silver')
GROUP BY CASE WHEN animals_color IN ('black','brown') THEN 'Black / Brown' 
WHEN animals_color IN ('white','grey') THEN 'White / Grey'
ELSE 'Gold/Silver' END
Greg Viers
  • 3,473
  • 3
  • 18
  • 36
3

An option not mentioned so far, uses the CALCULATED keyword in SAS that allows you to group the variables.

This is a SQL SAS solution that may not work in other flavours of SQL.

The SAS solution via @DomPazz is probably the one I'd go with if you have further needs for this categorization or different statistics. It's faster in the long run. I've also borrowed used his sample data (HAVE) in this query.

proc sql;
create table want as
SELECT SUM(cats) as N_Cats, 
        SUM(dogs) as N_Dogs,
        SUM(parrots) as N_Parrots,
        CASE WHEN animals_color IN ('black','brown') THEN 'Black / Brown'
                WHEN animals_color IN ('white','grey') THEN 'White / Grey'
                ELSE 'Gold/Silver' 
        END AS Animal_Color_Class
FROM have
WHERE animals_color IN ('black', 'brown', 'white','grey','gold','silver')
GROUP BY calculated animal_color_class;
quit;
Reeza
  • 20,510
  • 4
  • 21
  • 38
2

You should be able to use conditional aggregation. You haven't given example data or expected output- so I'm not sure how a row identifies a cat, dog, or parrot. In this example I've used NULL:

SELECT SUM(CASE WHEN animals_color in (black, brown) and cats IS NOT NULL
                 THEN 1
           END) AS [black/brown cats]
      ,SUM(CASE WHEN animals_color in (white, grey) and cats IS NOT NULL
                 THEN 1
           END) AS [white/grey cats]
      ,SUM(CASE WHEN animals_color in (gold, silver) and cats IS NOT NULL
                 THEN 1
           END) AS [gold/silver cats]

      --and do the same for dogs and parrots

  FROM animals
Zorkolot
  • 1,899
  • 1
  • 11
  • 8