create external table if not exists my_table
(customer_id STRING, ip_id STRING)
location 'ip_b_class';
Then I set the number of reducers to 50 and run a distinct count, but Hive still plans a single reducer:
hive> set mapred.reduce.tasks=50;
hive> select count(distinct customer_id) from my_table;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
The table holds about 160 GB, so with a single reducer the query takes a long time:
[ihadanny@lvshdc2en0011 ~]$ hdu
Found 8 items
162808042208 hdfs://horton/ip_b_class
...
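
For reference, one commonly suggested rewrite is to move the DISTINCT into a subquery, so that the de-duplication runs as a group-by that can use many reducers, and only the final count runs on a single reducer. The following is a minimal sketch under that assumption, using the same table and column as above. The alias t is arbitrary, and it has not been timed on this data:

```sql
-- Sketch only: same table and column as in the question; alias t is arbitrary.
set mapred.reduce.tasks=50;

-- Inner query: DISTINCT behaves like a group-by on customer_id,
-- so the de-duplication can be spread across multiple reducers.
-- Outer query: counts the already de-duplicated rows.
-- count(customer_id) skips NULL, matching count(distinct customer_id).
select count(customer_id)
from (select distinct customer_id from my_table) t;
```

Hive typically plans this as two MapReduce jobs instead of one. The second job still ends in a single reducer, but by then it only has to count the de-duplicated rows rather than process the full 160 GB.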