1

Below is the table where I have customer_id and different phones they have.

customer_id     phone_number
101            123456789
102            234567891
103            345678912
102            456789123
101            567891234
104            678912345
105            789123456
106            891234567
106            912345678
106            456457234
101            655435664
107            453426782

Now, I want to find customer_id and distinct phone number count.
So I used this query:

select distinct customer_id ,count(distinct phone_number)
from customer_phone;

customer_id   no of phones
101            3
102            2
103            1
104            1
105            1
106            3
107            1

And, from the above table my final goal is to achieve the below output which takes the counts and puts in different buckets and then count number of consumers that fall in those buckets.

Buckets no of consumers
3         2
2         1
1         4

There are close to 200 million records. Can you please explain an efficient way to work on this?

Matheus Lacerda
  • 5,983
  • 11
  • 29
  • 45
Ishaj
  • 15
  • 3

2 Answers2

1

You can use width_bucket for that:

select bucket, count(*)
from (
  select width_bucket(count(distinct phone_number), 1, 10, 10) as bucket
  from customer_phone
  group by customer_id
) t
group by bucket;

width_bucket(..., 1, 10, 10) creates ten buckets for the values 1 through 10.

Online Example: http://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=1e6d55305570499f363837aba21bdc7e

0

Use two aggregations:

select cnt, count(*), min(customer_id), max(customer_id)
from (select customer_id, count(distinct phone_number) as cnt
      from customer_phone
      group by customer_id
     ) c
group by cnt
order by cnt;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you Gordon, can you please explain why we need to use min(customer_id) and max(customer_id)? – Ishaj Feb 22 '18 at 13:00
  • @Ishaj . . . You don't. I call this a "histogram of histograms" query. I typically include sample values, so I can investigate sample customers with any given count. – Gordon Linoff Feb 22 '18 at 13:19
  • Awesome!This helped me.Thank you! – Ishaj Feb 22 '18 at 16:11