I have a Hive table that looks something like the one below. Every 3 hours, I want to run a query that looks at the unique workerUUIDs and does some manipulation on them. So for the window between now and 3 hours ago, I want to:
- Capture all the unique workerUUIDs
- Select * for each of these workerUUIDs
I am using Hive to run this query, and the table gets a few million new entries every three to six hours. What is the best way to write this query?
--------------------------------------------
| workerUUID | City | Debt | TestN | LName |
|------------|------|------|-------|-------|
| 1234       | SF   | 100k | 23    | Nil   |
| 6789       | NY   | 150k | 34    | Fa    |
| 1234       | SF   | 10k  | 45    | Na    |
| 6789       | NY   | 1k   | 13    | Nil   |
| 6789       | SF   | 150k | 34    | Nil   |
| 8999       | IN   | 10k  | 45    | Na    |
--------------------------------------------
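The two steps described above could be sketched roughly as follows. This is only a sketch under assumptions: the table name `worker_events` and the TIMESTAMP column `event_ts` are hypothetical. The sample table shows no time column, so something like it would be needed to define the 3-hour window.

```sql
-- Assumptions (not in the sample table): the table is called worker_events
-- and has a TIMESTAMP column event_ts recording when each row arrived.

-- Step 1: unique workerUUIDs seen in the last 3 hours.
SELECT DISTINCT workerUUID
FROM worker_events
WHERE unix_timestamp(event_ts) >= unix_timestamp() - 3 * 3600;

-- Step 2: every row (full history) for the workers found in step 1.
-- LEFT SEMI JOIN keeps each row of e at most once if its workerUUID
-- appears in the subquery, so no DISTINCT is needed inside it.
SELECT e.*
FROM worker_events e
LEFT SEMI JOIN (
  SELECT workerUUID
  FROM worker_events
  WHERE unix_timestamp(event_ts) >= unix_timestamp() - 3 * 3600
) recent
ON e.workerUUID = recent.workerUUID;
```

If only the rows from the window itself are needed, the semi join is redundant and a plain WHERE on `event_ts` gives the same result. On newer Hive versions, `current_timestamp` may be preferable to the zero-argument `unix_timestamp()`.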
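Basically, I want to do something like the following, once per workerUUID (worker_events is just a placeholder for the table name):
select City, Debt, TestN from worker_events where workerUUID = '1234'
select City, Debt, TestN from worker_events where workerUUID = '6789'
select City, Debt, TestN from worker_events where workerUUID = '8999'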
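Rather than issuing one query per workerUUID, a single scan that groups each worker's rows together might be cheaper on a table of this size. A minimal sketch, again assuming the hypothetical names `worker_events` and `event_ts`:

```sql
-- worker_events and event_ts are placeholder names, not from the sample table.
-- DISTRIBUTE BY sends all rows for a given workerUUID to the same reducer;
-- SORT BY orders rows within each reducer, so each worker's rows come out
-- contiguously in that reducer's output.
SELECT workerUUID, City, Debt, TestN
FROM worker_events
WHERE unix_timestamp(event_ts) >= unix_timestamp() - 3 * 3600
DISTRIBUTE BY workerUUID
SORT BY workerUUID;
```

This reads the table once regardless of how many distinct workerUUIDs fall in the window, whereas the per-ID queries would each trigger their own scan.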
To clarify further, I want to generate temporary tables like these, one per workerUUID:

| workerUUID | City | Debt | TestN |
|------------|------|------|-------|
| 1234       | SF   | 100k | 23    |
| 1234       | SF   | 10k  | 45    |

| workerUUID | City | Debt | TestN |
|------------|------|------|-------|
| 6789       | NY   | 150k | 34    |
| 6789       | NY   | 1k   | 13    |
| 6789       | SF   | 150k | 34    |

| workerUUID | City | Debt | TestN |
|------------|------|------|-------|
| 8999       | IN   | 10k  | 45    |

and so on, for every unique workerUUID seen in the 3-hour window.
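
One way the per-worker tables might be approximated is a single staging table partitioned by workerUUID, filled with a dynamic-partition insert. This is a sketch under assumptions, not a tested solution: `worker_events`, `event_ts`, and `worker_batch` are all hypothetical names, and the column types are guesses based on the sample data.

```sql
-- Assumptions: worker_events / event_ts are placeholder names for the source
-- table and its time column; worker_batch is a hypothetical staging table
-- holding one partition per workerUUID.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

CREATE TABLE IF NOT EXISTS worker_batch (
  City  STRING,
  Debt  STRING,
  TestN INT
)
PARTITIONED BY (workerUUID STRING);

-- With dynamic partitioning, the partition column must come last in the
-- SELECT list; Hive creates one partition per distinct workerUUID.
INSERT OVERWRITE TABLE worker_batch PARTITION (workerUUID)
SELECT City, Debt, TestN, workerUUID
FROM worker_events
WHERE unix_timestamp(event_ts) >= unix_timestamp() - 3 * 3600;

-- Reading one worker's slice then touches only that partition.
SELECT City, Debt, TestN
FROM worker_batch
WHERE workerUUID = '1234';
```

Two caveats: the insert only overwrites partitions that receive rows in the current run, so partitions from earlier runs stay in place unless dropped, and Hive limits the number of dynamic partitions per statement (`hive.exec.max.dynamic.partitions`), which may make this unsuitable when the window contains very many distinct workerUUIDs.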