0

I have users in that sometimes are registred twice with the same value for pn "Phonenumber". Knowing that pn PhoneNumber is the same I know that they should be the same rid "RegisterID". Trouble is that pn is not very reliable either and sometimes shows null values. How to do this partition?

I want to do something like this

WITH a AS (
  SELECT 1 AS dt, 1 AS rid, 1 AS pn, 1 AS cnt
  UNION ALL 
  SELECT 2 AS dt, 1 AS rid, 1 AS pn, 1 AS cnt
  UNION ALL
  SELECT 2 AS dt, 2 AS rid, 1 AS pn, 1 AS cnt
  UNION ALL 
  SELECT 3 AS dt, 2 AS rid, 1 AS pn, 1 AS cnt
  UNION ALL 
  SELECT 3 AS dt, 2 AS rid, NULL AS pn, 1 AS cnt
)
SELECT dt, 
       COALESCE(MIN(rid) OVER (PARTITION BY pn), rid) AS rid, 
       pn, 
       SUM(cnt) AS cnt
FROM a 
GROUP BY dt, pn, rid

And get a result that looks like this:

dt rid pn cnt

1 1 1 1

2 1 1 2

3 1 1 2

Herman Wilén
  • 195
  • 12

1 Answers1

0

Fist find to each rig the largest pn. We ignore the connections over several phone numbers. Then we search for the smallest rid for each largest pn. If the largest pn is null then each rig stays for itself. Please see the (PARTITION BY ifnull(''||pn,''||rid)). Last we join this info the the main table.

WITH a AS (
  SELECT 1 AS dt, 1 AS rid, 1 AS pn, 1 AS cnt
  UNION ALL 
  SELECT 2 AS dt, 1 AS rid, 1 AS pn, 1 AS cnt
  UNION ALL
  SELECT 2 AS dt, 2 AS rid, 1 AS pn, 1 AS cnt
  UNION ALL 
  SELECT 3 AS dt, 2 AS rid, 1 AS pn, 1 AS cnt
  UNION ALL 
  SELECT 3 AS dt, 2 AS rid, NULL AS pn, 1 AS cnt
  UNION ALL 
  SELECT 55 AS dt, 5 AS rid, NULL AS pn, 1 AS cnt
    UNION ALL 
  SELECT 66 AS dt, 6 AS rid, NULL AS pn, 1 AS cnt
),
B as (
  Select rid, max(pn) as pn 
  from A 
  group by 1
  ),
  C as (
    Select 
    pn, rid,
    min(rid) OVER (PARTITION BY ifnull(''||pn,''||rid)) as rid_main,
    count(1) as counts
    from B 
    group by 1,2
    order by 1,2
    )

  Select rid_main, A.*
  from A 
  left join C
  on A.rid=C.rid 
  #group by 1
Samuel
  • 2,923
  • 1
  • 4
  • 19