0

Got the following query going.

SELECT
    customer_id,
    NTILE(5) OVER (ORDER BY MAX(oms_order_date)) AS r_score
FROM 
    mdwh.us_raw.l_dmw_order_report
WHERE 
    quantity_ordered > 0
    AND customer_id IS NOT NULL
    AND customer_id != ('')
    AND UPPER(line_status) NOT IN ('','RETURN', 'CANCELLED')
    AND UPPER(item_description_1) NOT IN ('','FREIGHT', 'RETURN LABEL FEE', 'VISIBLE STITCH')
    AND (quantity_ordered * unit_price_amount) > 0
    AND extended_amount < 1000 --NO BULK ORDERS
    AND oms_order_date BETWEEN '2020-01-01' AND '2020-01-01'
    AND SUBSTRING(upc,1,6) IN (SELECT item_code FROM item_master_zs WHERE new_division BETWEEN '11' AND '39')
GROUP BY
    customer_id
ORDER BY
    customer_id

All I'm doing here is, given some conditions, to give me the unique customer ID, then cluster their latest purchase date into quintiles and provide me with a score in the second column. But everytime I run the query, the r_score value keeps changing? What am I doing wrong..? Here's a snippet of what the table looks like (again, r_score value keeps changing):

enter image description here

Z41N
  • 97
  • 10

1 Answers1

1

The problem with ntile() is that it ensures that the groups are exactly the same size by putting the same value across different groups.

For this reason, I usually do the calculation manually, using rank():

ceil(rank() over (order by max(oms_order_date)) * 5.0 /
     count(*) over ()
    ) as r_score

If you use row_number(), you'll get the equivalent of ntile().

If you want to use ntile(), you can use additional order by keys so the sorting keys are unique.

===================

2/17/20 5:18PM EDIT

Here is the new code I'm using:

SELECT
    customer_id,
    CEIL(RANK() OVER (ORDER BY MAX(oms_order_date)) * 5 / COUNT(*) OVER ()) AS r_score,
    CEIL(RANK() OVER (ORDER BY COUNT(client_web_order_number)) * 5 / COUNT(*) OVER ()) AS f_score,
    CEIL(RANK() OVER (ORDER BY AVG(extended_amount)) * 5 / COUNT(*) OVER ()) AS m_score,
    (r_score || f_score || m_score) AS rfm_score
FROM 
    mdwh.us_raw.l_dmw_order_report t1
WHERE 
    quantity_ordered > 0
    AND customer_id IS NOT NULL
    AND customer_id != ('')
    AND oms_order_date IS NOT NULL
    AND UPPER(line_status) NOT IN ('','RETURN', 'CANCELLED')
    AND UPPER(item_description_1) NOT IN ('','FREIGHT', 'RETURN LABEL FEE', 'VISIBLE STITCH')
    AND (quantity_ordered * unit_price_amount) > 0
    AND extended_amount < 1000 --NO BULK ORDERS
    AND oms_order_date BETWEEN '2020-01-01' AND '2020-01-10'
    AND SUBSTRING(upc,1,6) IN (SELECT item_code FROM item_master_zs WHERE new_division BETWEEN '11' AND '39')
GROUP BY
    customer_id
ORDER BY
    customer_id

The issue now is that I'm getting some row's with a blank r_score, and the max value is 4 instead of 5..

Z41N
  • 97
  • 10
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for the quick respone, @GordonLinoff. Would you please explain a little more how the first query works (step by step). I'm still a bit of a noob with SQL, so trying to catch up. Guess I'm a bit overwhelmed because you mention 3 ways, if I'm not mistaken – Z41N Feb 17 '20 at 20:48
  • @zain . . . I have no idea what you are referring to. I think the replacement code i pretty clear. This only mentions one answer to your question, along with some additional information. – Gordon Linoff Feb 17 '20 at 22:02
  • I've updated my initial question with the rank() query you provided. Having some issues. @Gordon Linoff – Z41N Feb 17 '20 at 22:18
  • @zain . . . I don't see any updated. Your issue (as explained in the first paragraph) is that you have many rows with the same values. `ntile()` puts these in different tiles. The manual calculation ensures that all values go in the same tile, but the tiles are not the same size. – Gordon Linoff Feb 17 '20 at 22:45
  • 1
    @zain . . . My code has `5.0`. Yours has `5`. That makes a difference. – Gordon Linoff Feb 17 '20 at 23:03
  • Got it to work, thanks. Now how can I see what the actual quintile ranges are? In order to see where the cut-offs are for when clustering them from 1-5. @Gordon Linoff – Z41N Feb 19 '20 at 17:19