
I am using hive-jdbc-0.7.1-cdh3u5.jar. I have some memory-intensive queries running on EMR that occasionally fail. When I look at the job tracker, I see that the query has been killed, with the following error:

java.io.IOException: Task process exit with nonzero status of 137

However, the Hive JDBC driver's execute() call does not detect this; it is simply left hanging, and no exception is ever caught. Any ideas? Thanks. Here is the code:


    import java.sql.SQLException;
    import java.sql.Statement;
    import org.stringtemplate.v4.ST;  // StringTemplate 4 template class

    // conn is an open Hive JDBC connection; log is the class logger
    ST stQuery = MY_QUERY;
    try {
        Statement stmt = conn.createStatement();
        // Hangs here when the job is killed on the cluster;
        // no SQLException is ever raised.
        stmt.execute(stQuery.render());
    }
    catch (SQLException sqle) {
        log.error("Failed to run query", sqle);
        return;
    }


1 Answer


This is perhaps because Hadoop kills a task if it has not reported progress for 10 minutes (600 seconds; the parameter mapred.task.timeout, in milliseconds, defaults to 600000). Setting mapred.task.timeout=0 disables the timeout entirely, so tasks that run for more than 10 minutes without reporting progress are no longer killed.
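For a Hive query submitted over JDBC, one way to apply this is to issue a SET command on the same connection before running the query. A minimal sketch, assuming the conn and stQuery objects from the question:

    // Sketch: disable the task timeout for this Hive session before
    // the memory-intensive query runs.
    Statement stmt = conn.createStatement();
    stmt.execute("SET mapred.task.timeout=0");  // 0 = never time out
    stmt.execute(stQuery.render());

Note that disabling the timeout means a genuinely hung task will never be killed, so raising the value (in milliseconds) may be safer than setting it to 0.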

Alternatively, in these cases you can write your mapper/reducer in such a way that it reports progress on a regular basis (more frequently than every 10 minutes). This may be achieved in a number of ways (a sketch follows the list):

  • Call setStatus() on Reporter to set a human-readable description of the task’s progress
  • Call incrCounter() on Reporter to increment a user counter
  • Call progress() on Reporter to tell Hadoop that your task is still there (and making progress)
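A minimal sketch showing all three calls, using the old org.apache.hadoop.mapred API (the one that exposes Reporter); LongRunningMapper and the MyCounters enum are hypothetical names:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LongRunningMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        // Hypothetical user counter for incrCounter()
        enum MyCounters { RECORDS_SEEN }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // 1. Human-readable status, visible in the job tracker UI
            reporter.setStatus("Processing record at offset " + key.get());

            // 2. Incrementing a user counter also counts as progress
            reporter.incrCounter(MyCounters.RECORDS_SEEN, 1);

            // ... long-running, memory-intensive work for this record ...

            // 3. Plain heartbeat: tells the tasktracker the task is alive
            reporter.progress();

            output.collect(new Text(Long.toString(key.get())), value);
        }
    }

Any one of these calls resets the task's timeout clock, so as long as one of them happens more often than every mapred.task.timeout milliseconds, the task will not be killed.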