0

I am doing some data quality checks. Flag 2 needs COUNT(). How could I structure this query So that I can keep the other flags and get my agg fn flag?

WITH
  OrderTable AS(
SELECT
  order_id, product_id, country, bought_date, price
FROM clothes_items
UNION DISTINCT
SELECT
SELECT
  order_id, product_id, country, bought_date, price
FROM decor_items)
SELECT
  orderTable.order_id,
  IF(product.price < 0.000001, orderTable.product_id, NULL),
    AS PriceFlag, //Flag 1
  IF(COUNT orderTable.order_id > 1,order_id,NULL) orderDateFlag //Flag 2
FROM Ordertable
LEFT JOIN productTable
  ON orderTable.product_id = product.product_id;

will be a few agg fns and a few more non agg calcs so ideally a solution that would enable this. Thanks in advance. Even high level would be helpful, I want to write it in the most efficient way

Jade
  • 25
  • 5
  • add a group by clause – nbk Mar 06 '23 at 19:04
  • Hi @jade, which DB engine you use and what version are you using? – Junjie Mar 06 '23 at 19:13
  • Besides that, it is still a bit unclear how the output should look like? could you specify that? – Junjie Mar 06 '23 at 19:17
  • the closest is MySQL, the output should be the order ID and the quality flags so NULL if it shouldn't be flagged or the order ID if it should be flagged. For flag 2 we don't want orders that have multiple bought dates (there is multiple order X rows if there is multiple orders) – Jade Mar 06 '23 at 19:22
  • Thanks. For the output, you mean product_id or order_id here? in the provided query, it's currently [product_id,pricelag, orderdateflag]. so the expected output is [product_id, flag1, flag2, flag3,......], a.k.a. product_id, not order_id? – Junjie Mar 06 '23 at 19:28
  • I see. For "so NULL if it shouldn't be flagged or the order ID if it should be flagged", the flag output should be order_id for all flags? but based on provided query, your current output is [order_id, flag1: product_id, flag2: order_id]. Let's say order_id 1 have 3 bought dates (2023-01-01,2023-01-02,2023-01-03] and 3 products (product_id100,price1.20],(product_id101, price0.0000001),(product_id102,price2.0]. Then the output should be 3 rows [1, NULL, 1],[1, 101,1],[1, NULL,1] (current_output) or [1, NULL, 1],[1, 1,1],[1, NULL,1] (expected output)? – Junjie Mar 07 '23 at 07:35

0 Answers0