this:
drop table if exists temp_a;
create table temp_a as
select
  case when rand(123) < 0.4 then 1
       when rand(123) >= 0.4 and rand(123) < 0.8 then 2
       else 3 end as label
from source_data;
select label, count(1) as count from temp_a group by label;
But the result is:
label   count(1)
1       111175
2       80509
3       87690
Distribution (count / sum of counts, total 279374):
label   share
1       39.8%
2       28.8%
3       31.4%
I expected the labels to be split 40% / 40% / 20%, but they are not. Why does the distribution differ from what I expected?
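
A possible explanation, offered as a sketch rather than a confirmed answer: if each rand(123) in the CASE is evaluated as a separate draw for the same row (an assumption about how the engine evaluates the expression), then the branches compare different random values. Label 1 would get 0.4, label 2 would get 0.6 * 0.6 * 0.8 = 0.288, and label 3 the remaining 0.312, which is close to the observed counts. Under that assumption, drawing the value once per row in a subquery and reusing it should give roughly 40% / 40% / 20%. The alias r, the alias cnt, and the table name temp_b are hypothetical names used only for illustration:

```sql
-- Sketch: draw one random value per row, then bucket that single value.
-- Assumes the engine evaluates rand(123) once per row inside the subquery.
drop table if exists temp_b;
create table temp_b as
select
  case when r < 0.4 then 1
       when r < 0.8 then 2   -- r >= 0.4 is implied by the first branch failing
       else 3 end as label
from (
  select rand(123) as r from source_data
) t;
select label, count(1) as cnt from temp_b group by label;
```

If the counts from temp_b still do not come out near 40% / 40% / 20%, the assumption about per-call evaluation does not hold and the cause lies elsewhere.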