Hive query: CASE WHEN with rand() does not give the expected distribution

Asked Mar 03 '23 at 06:38

Active Mar 03 '23 at 09:29

Viewed 47 times

I ran this query:

drop table if exists temp_a;
create table temp_a
as
select
  case when rand(123) < 0.4 then 1
       when rand(123) >= 0.4 and rand(123) < 0.8 then 2
       else 3 end as label
from source_data;

select label, count(1) as count from temp_a group by label;

But the result is:

label   count(1)    
1       111175     
2       80509       
3       87690

Distribution:

label   share of total (count / sum)
1       40%
2       28%
3       32%

I expected a distribution of about 40% / 40% / 20%. Why does the result look different?

cai zoro

Are you using the output of a random function and expecting it to be certain? It's `random`, isn't it, so the distribution will be random? – Koushik Roy Mar 03 '23 at 06:49
In Hive, the rand function returns values like -0.03, not values in [0, 1]. – cai zoro Mar 06 '23 at 03:37
0.03 is between 0 and 1. And rand() can return any value in that range, such as 0.16972572083627802: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-MathematicalFunctions I'm still unclear on the requirement. – Koushik Roy Mar 06 '23 at 05:01
-0.03, not +0.03. – cai zoro Mar 06 '23 at 06:02
Hive's rand() function always generates a value between 0 and 1. It can't be negative. If your Hive is giving you a negative number, please tell me your tool name and Hive version. – Koushik Roy Mar 06 '23 at 06:39
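
One reading that fits the counts in the question, sketched in Python rather than Hive: if each rand(123) in the CASE expression behaves as a separate draw, then P(label = 1) = 0.4, P(label = 2) = 0.6 × 0.6 × 0.8 = 0.288, and P(label = 3) = 1 − 0.4 − 0.288 = 0.312, which is close to the observed 40% / 28% / 32%. The function names below are hypothetical, and Python's random module only stands in for Hive's rand. This is a simulation under that assumption, not a statement about Hive internals.

```python
import random


def label_three_draws(rng):
    # Mirrors the CASE in the question, ASSUMING every rand() call is a new draw.
    if rng.random() < 0.4:
        return 1
    if rng.random() >= 0.4 and rng.random() < 0.8:
        return 2
    return 3


def label_one_draw(rng):
    # Draws once per row and reuses that single value in every branch.
    r = rng.random()
    if r < 0.4:
        return 1
    if r < 0.8:
        return 2
    return 3


def shares(label_fn, n=200_000, seed=123):
    # Fraction of n simulated rows that receive each label.
    rng = random.Random(seed)
    counts = {1: 0, 2: 0, 3: 0}
    for _ in range(n):
        counts[label_fn(rng)] += 1
    return {label: count / n for label, count in counts.items()}


three = shares(label_three_draws)  # roughly 0.40 / 0.29 / 0.31
one = shares(label_one_draw)       # roughly 0.40 / 0.40 / 0.20
print(three)
print(one)
```

If that is what happens in the query, one thing to try is computing rand(123) once per row (for example in a subquery) and testing that single value in the CASE.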