3

I want to create a column in SQL similar to the flag below where I can identify the top 20th percent and bottom 20th percent of sales per block group in a given time period. I already have the sales aggregated to the block group but now I'm having trouble flagging them. For example, in bg2010 1, there were 215 sales which are roughly the top 20th percent (actually 21.5 percent, but that's okay) of all sales within that time period.

I've tried percentile_cont command and doesn't seem to be what I'm looking for, but that's also probably because I don't fully understand it so any help will be appreciated!

bg2010  sales16_17   flag
   1    215          top 20th 
   2    150 
   3    115 
   4    100 
   5    95  
   6    95  
   7    85  
   8    65           bottom 20th
   9    45           bottom 20th
   10   35           bottom 20th
Shawn
  • 4,758
  • 1
  • 20
  • 29

1 Answers1

4

Use ntile() to get numbers from 1 to 5

select t.*, ntile(5) over (order by sales16_17 desc) as tile

If you want this as a flag, you could do:

select t.*,
       (case ntile(5) over (order by sales16_17 desc)
            when 1 then 'top 20%'
            when 5 then 'bottom 20%'
        end) as flag
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • `NTILE()` is pretty awesome, but it arbitrarily splits ties. http://sqlfiddle.com/#!18/1cbbb/1 If you need precision, you'll have to roll your own. – Shawn Jul 23 '18 at 16:39
  • @shawn - how are you going to split ties without a tie-breaker? If you had 5 in a tie, none of them can truly be in any particular 5-ile - surely? – Cato Jul 23 '18 at 16:46
  • 1
    @Cato That's why I said OP might need to roll their own. Especially if it deals with financial info. I've had to do this before and I wish that `NTILE()` had been a bit more flexible. And technically, if all 5 were the same number, then they'd all be in both the Top and Bottom 25%. :-) – Shawn Jul 23 '18 at 17:01
  • And that's where deeper business rules come into play. You don't want to tell Jay he's in the 2nd quartile of sales when he sold the same as Bob, who's in the 1st quartile. – Shawn Jul 23 '18 at 17:01
  • using the same data above- let's say that you only want the ntile(5) to apply to sales over 50. I've tried something like this, but it doesn't seem to work: case when [sales16_17] > 50 then ntile(5) over (order by [sales16_17] desc) else null end as test1 – Alex S. Sandoval Aug 09 '18 at 18:39