
I have a query `CREATE TABLE foobar AS SELECT ...` that runs successfully in Hue (the returned status is "Inserted 986571 row(s)") and takes a couple of seconds to complete. However, in Cloudera Manager its status still says Executing after more than 10 minutes.

Is it a bug in Cloudera Manager or is this query actually still running?

Marek Grzenkowicz

1 Answer


When Hue executes a query, it leaves the query open so that users can page through results at their own pace. (Of course, this behavior isn't very useful for DDL statements.) That means Impala still considers the query to be executing, even if it is not actively using CPU cycles (keep in mind it is still holding memory!). Hue closes the query when explicitly told to, or when the page/session is closed. One way to close queries explicitly is the hue command:

> build/env/bin/hue close_queries --help

Note that Impala has a query option, query_timeout_s, that automatically times out queries after a period of inactivity. Hue sets this to 10 minutes by default, but you can override it in the hue.ini settings.
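As a rough illustration of that override, and assuming the setting lives in the `[impala]` section of hue.ini (verify against the hue.ini reference for your Hue version), the entry might look like this. The value 300 is only an example; 600 corresponds to the 10-minute default mentioned above.

```ini
[impala]
# Seconds a query may sit idle before Impala cancels it.
# Example value only; 600 matches the 10-minute default.
query_timeout_s=300
```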

One thing to note is that when queries time out, they are cancelled but not closed, i.e. the query remains "in flight" with a CANCELLED status. This lets users (or tools) continue to observe the query metadata (e.g. query profile, status, etc.), which would not be available if the query were fully closed and thus deregistered from the impalad. Unfortunately, these cancelled queries may still hold some non-negligible resources, but this will be fixed with IMPALA-1575.
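To make the distinction between "cancelled" and "closed" concrete, here is a minimal toy model in Python. It is not Impala or Hue code: the `Query` class, its `State` values, and the method names are all hypothetical, and the timeout comparison is a simplification. It only illustrates that a timed-out query stays in flight until something closes it.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    EXECUTING = "EXECUTING"  # open, even if all the work has already finished
    CANCELLED = "CANCELLED"  # timed out: stopped, but metadata still registered
    CLOSED = "CLOSED"        # deregistered; profile and status no longer available


@dataclass
class Query:
    """Hypothetical stand-in for a query handle that a client left open."""
    last_activity_s: float
    state: State = State.EXECUTING

    def in_flight(self) -> bool:
        # Anything that has not been closed still counts as in flight.
        return self.state is not State.CLOSED

    def expire_if_idle(self, now_s: float, timeout_s: float) -> None:
        # A timeout cancels an idle query but does not close it.
        idle_s = now_s - self.last_activity_s
        if self.state is State.EXECUTING and timeout_s > 0 and idle_s >= timeout_s:
            self.state = State.CANCELLED

    def close(self) -> None:
        # Only an explicit close (or the session ending) removes the query.
        self.state = State.CLOSED


q = Query(last_activity_s=0.0)
q.expire_if_idle(now_s=5.0, timeout_s=600.0)
print(q.state.name, q.in_flight())   # EXECUTING True
q.expire_if_idle(now_s=600.0, timeout_s=600.0)
print(q.state.name, q.in_flight())   # CANCELLED True
q.close()
print(q.state.name, q.in_flight())   # CLOSED False
```

In this model the question's scenario corresponds to the first print: the statement finished within seconds, yet the handle is still open, so a monitoring tool reports it as executing.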

More information: Hive and Impala queries life cycle

Matt
  • Thanks for posting the Hue link with more info. I updated this response to include more information, especially with some details not mentioned in the Hue page about what happens in Impala, e.g. that cancelled queries may still hold resources (a JIRA currently assigned to me...). – Matt Mar 26 '15 at 15:41
  • How did you set your `HUE_CONF_DIR`? Maybe it's not pointing to the correct directory? The hue docs provide the following magic incantation for setting the envvar on a system that is running hue hosted by CM: `export HUE_CONF_DIR="/var/run/cloudera-scm-agent/process/``ls -alrt /var/run/cloudera-scm-agent/process | grep HUE | tail -1 | awk '{print $9}'``"` (There are extra backticks that should be removed, I'm struggling with escaping the backticks in the markup...) Can you check to make sure that directory exists? – Matt Mar 26 '15 at 15:43
  • Thanks for getting back to me! I will be out of office for a few days now. Once I return, I will give it one more try; if it still fails, I will post a new question with the exact commands and returned errors. – Marek Grzenkowicz Mar 26 '15 at 20:59