I was wondering whether it is possible to have the Hive CLI flush its messages to stderr as they occur. I am currently trying to execute a multi-stage query (just a sample, not the actual one):
    SELECT COUNT(*) FROM (
        SELECT user FROM users
        WHERE datetime = '05-10-2013'
        UNION ALL
        SELECT user FROM users
        WHERE datetime = '05-10-2013'
    ) a
This launches 3 jobs. If job 1 fails because it is killed, I don't want job 2 to run. My code currently looks like the following, but Hive does not write to stderr until all the subqueries have finished, and only then returns the error.
    import subprocess
    import time

    def execute_hive_query(query):
        return_code = None
        cmd = ["hive", "-e", query]
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # Poll every 10 seconds until the hive process exits
        while return_code is None:
            out = proc.stdout.read()
            error = proc.stderr.read()
            handle_hive_exception(out, error)
            time.sleep(10)
            return_code = proc.poll()

    def handle_hive_exception(stdout, stderr):
        # Treat any output on stderr as a failure
        if stderr != '':
            raise Exception(stderr)
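To make the goal concrete, here is a minimal sketch of the behaviour I am after (Python 3.7+). It rests on two assumptions that are not verified here: that Hive writes each message to stderr while the jobs run, and that a failed job can be recognised by some marker in the line. Note that `proc.stdout.read()` with no argument blocks until the pipe is closed, so the sketch reads stderr line by line instead and drains stdout on a background thread so the child cannot stall on a full pipe. The names `run_and_watch` and `is_fatal` are made up for illustration.

```python
import subprocess
import threading


def run_and_watch(cmd, is_fatal):
    """Run cmd and raise as soon as a stderr line satisfies is_fatal(line).

    Returns (return_code, stdout_text) if no fatal line is seen.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,   # decode pipes to str (Python 3.7+)
        bufsize=1,   # line-buffered on our side of the pipes
    )

    stdout_lines = []
    # Drain stdout in the background so a full stdout pipe cannot block the child.
    drainer = threading.Thread(
        target=lambda: stdout_lines.extend(proc.stdout),
        daemon=True,
    )
    drainer.start()

    # Iterating over the pipe yields each line as the child writes it,
    # instead of waiting for end-of-file like read() does.
    for line in proc.stderr:
        if is_fatal(line):
            proc.kill()  # stop the local process; later stages never start
            proc.wait()
            raise RuntimeError(line.strip())

    drainer.join()
    return proc.wait(), "".join(stdout_lines)
```

With Hive this might be called as `run_and_watch(["hive", "-e", query], lambda line: "FAILED" in line)`. The `"FAILED"` marker is a guess and would need to be checked against what Hive actually prints, since the CLI may also send ordinary progress messages to stderr. None of this helps if Hive itself buffers its stderr output.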
Thanks!