0

I am quite new to SQL in BigQuery. I am trying to predict customer status based on their order_id and date_start.

Rules

  1. customer_status must be same for the same order_id

  2. The customer_status must be predicted based on the next Order_ID (Transaction) or previous Order_ID

this is my code

select week_start, t.account_id, t.order_id, date_start,date_ended,

from unnest(generate_date_array(date('2018-12-31'), date('2019-2-11'), interval 1 week) )  week_start cross join
     (select distinct account_id
      from t
     ) a left join
     t
     on t.account_id = a.account_id and
        t.reporting_week_start_date = week_start
order by  a.account_id,week_start

Output from my code

week_start | account_id  | order_id  | date_start   | date_ended   |
2018-12-31 |    1        |    1001   |   2019-01-01 |  2019-01-15  |
2019-01-07 |    1        |    1001   |   2019-01-01 |  2019-01-15  |
2019-01-14 |    1        |    1002   |   2019-01-15 |  2019-01-27  |
2019-01-21 |    1        |    1002   |   2019-01-15 |  2019-01-27  |
2019-01-28 |    1        |    1003   |   2019-01-29 |  2019-02-04  |

Desire Output

week_start | account_id  | order_id  | date_start   | date_ended   | Customer_Status
2018-12-31 |    1        |    1001   |   2019-01-01 |  2019-01-15  |  First Time
2019-01-07 |    1        |    1001   |   2019-01-01 |  2019-01-15  |  First Time
2019-01-14 |    1        |    1002   |   2019-01-15 |  2019-01-27  |  Recurring
2019-01-21 |    1        |    1002   |   2019-01-15 |  2019-01-27  |  Recurring
2019-01-28 |    1        |    1003   |   2019-01-29 |  2019-02-04  |  Churned

How to i do this in Big Query SQL? Thanks.

1 Answers1

0

After analyzing you case, I figured out the rules you could use to classify the customers. Since, you want to classify each customer on BigQuery, all the rules were hard coded.

You mentioned the classification should be based on the next and previous Order_id, but I also used the field account_id. For this reason, I used two methods LAG() and LEAD().

The sample code below should be incorporated to your current query, I used your current output as my source input. I followed the below algorithm:

1) Classify the First time customers. So I created the temporary table new_c;

2) Classify the Recurring customers, then the rec_c temporary table was created;

3) The Churned customers with the last temporary table churn_c;

4) Joining all the tables based on the source input (your current output), in the phase it was handled the duplicated status as when we have two 'First time' customer's status and so on.

Below it is the code, note that I used slightly different column names:

    #First selecting the new costumers and changing the namwes of some columns so we can use left join in the end
WITH
  new_c AS (
  SELECT
    _account_id__ AS acc,
    _order_id__ AS oo,
    week_start_ AS wk_nc,
    CASE
      WHEN ((LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)IS NULL) AND (LAG(_order_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)IS NULL )) THEN 'First time'
      WHEN (LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC) != _account_id__ ) THEN 'First time'
  END
    AS status
  FROM
    `test-proj-261014.sample.customer` ),

  #Now, we classify the customers who are Recurring
  rec_c AS (
  SELECT
    _account_id__ AS acc_rc,
    _order_id__ AS oo_rc,
    week_start_ AS wk_rc,
    CASE
      WHEN ((LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)= _account_id__ ) AND (LAG(_order_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)!= _order_id__) AND (LEAD(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC) = _account_id__ ) ) THEN 'Recurring'
  END
    AS status_rc
  FROM
    `test-proj-261014.sample.customer` ),

  #Last, the Churned customers
  churn_c AS (
  SELECT
    _account_id__ AS acc_c,
    _order_id__ AS oo_c,
    week_start_ AS wk_ch,
    CASE
      WHEN ((LEAD(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)!= _account_id__ ) AND (LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)= _account_id__ )) THEN 'Churned'
      WHEN ((LEAD(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC) IS NULL)
      AND (LAG(_account_id__) OVER (PARTITION BY _account_id__ ORDER BY _order_id__ ASC)= _account_id__ )) THEN 'Churned'
  END
    AS status_churn
  FROM
    `test-proj-261014.sample.customer` )

  #In the end, all the temporary tables are joined together and we handle same status customers.
SELECT
  week_start_,
  _account_id__,
  _order_id__,
  _date_start___,
  _date_ended___,
  CASE
    WHEN status IS NOT NULL THEN status
    WHEN (LAG(status) OVER (PARTITION BY acc ORDER BY oo ASC) ='First time'
    AND LAG(oo) OVER (PARTITION BY acc ORDER BY oo ASC)=oo)THEN 'First time'

    WHEN status_rc IS NOT NULL THEN status_rc
    WHEN (LAG(status_rc) OVER (PARTITION BY acc_rc ORDER BY oo_rc ASC) ='Recurring'
    AND LAG(oo_rc) OVER (PARTITION BY acc ORDER BY oo ASC)=oo_rc) THEN 'Recurring'

    WHEN churn_c IS NOT NULL THEN status_churn
    WHEN (LAG(status_churn) OVER (PARTITION BY acc_c ORDER BY oo_c ASC) ='Churned'
    AND LAG(oo_c) OVER (PARTITION BY acc ORDER BY oo ASC)=oo_c) THEN 'Churned'

END
  AS final_status
FROM
  `test-proj-261014.sample.customer`
INNER JOIN
  new_c
ON
  _account_id__ = ACC
  AND _order_id__ = OO
  AND week_start_ = wk_nc
INNER JOIN
  rec_c
ON
  _account_id__ = acc_rc
  AND _order_id__ = oo_rc
  AND week_start_ = wk_rc
INNER JOIN
  churn_c
ON
  _account_id__ = acc_c
  AND _order_id__ = oo_c
  AND week_start_ = wk_ch ;
halfer
  • 19,824
  • 17
  • 99
  • 186
Alexandre Moraes
  • 3,892
  • 1
  • 6
  • 13