EDIT - FYI

This question is also answered on the following pages:

I think Oozie is dumb!

Original Question

I'm using AWS EMR, with

  • emr-5.4.0
  • Hive 2.1.1
  • Tez 0.8.4
  • Oozie 4.3.0

I created the following HiveQL script:

insert.sql

DROP TABLE IF EXISTS simple;

CREATE TABLE simple (
  name STRING
);

INSERT INTO simple(
  name
)
VALUES (
  "Oozie!"
);

SELECT * FROM simple;

Then I ran the following command:

From command line

$ hive -f insert.sql

I received the following output:

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: false
OK
Time taken: 1.6 seconds
OK
Time taken: 0.333 seconds
Query ID = hadoop_20170404023311_0d18d091-8916-4e58-a7e5-dbc081d5f8ab
Total jobs = 1
Launching Job 1 out of 1
Waiting for Tez session and AM to be ready...


Status: Running (Executing on YARN cluster with App id application_1491267059312_0040)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 5.89 s
----------------------------------------------------------------------------------------------
Loading data to table default.simple
OK
Time taken: 16.207 seconds
OK
Oozie!
Time taken: 0.092 seconds, Fetched: 1 row(s)

From the command line, this works. However, the process remains afterwards. What is the cause? Please give me a suggestion.

From Hue with Oozie

I realized that the raw Hive query works (but it is very slow, and the process remains). It seems that queries submitted by Hue + Oozie hang (progress stops at 95%).

$ yarn application -list
17/04/04 03:03:54 INFO impl.TimelineClientImpl: Timeline service address: http://ip-172-38-21-67.ap-northeast-1.compute.internal:8188/ws/v1/timeline/
17/04/04 03:03:54 INFO client.RMProxy: Connecting to ResourceManager at ip-172-38-21-67.ap-northeast-1.compute.internal/172.38.21.67:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):2
                Application-Id      Application-Name        Application-Type          User           Queue                   State             Final-State             Progress                        Tracking-URL
application_1491267059312_0039  HIVE-a3ea64b2-105f-4b24-b89d-f0359eefbd3e                        TEZ           hue         default                ACCEPTED               UNDEFINED                   0%                                 N/A
application_1491267059312_0038  oozie:launcher:T=hive2:W=Create_sdata_item_master:A=hive-3a99:ID=0000016-170404005550013-oozie-oozi-W              MAPREDUCE           hue         default                 RUNNING                 UNDEFINED                  95% http://ip-172-38-21-43.ap-northeast-1.compute.internal:33037
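In the listing above, the Oozie launcher is RUNNING at 95% while the Hive-on-Tez application it started is still ACCEPTED at 0%. A minimal sketch for pulling the IDs of applications in a given state out of that output follows. The function name `apps_in_state` is hypothetical, and it assumes application names contain no whitespace, as is the case here.

```shell
# Print the IDs of applications that are in the given state (e.g. ACCEPTED).
# Reads `yarn application -list` output on stdin.
# Assumption: application names contain no whitespace, so a plain
# whitespace split is enough to find the State column.
apps_in_state() {
  awk -v state="$1" '
    $1 ~ /^application_/ {
      for (i = 2; i <= NF; i++) {
        if ($i == state) { print $1; break }
      }
    }'
}

# Usage on a cluster node (not run here):
#   yarn application -list 2>/dev/null | apps_in_state ACCEPTED
```

An application that stays in ACCEPTED is waiting for YARN to grant it a container. Whether that is the cause of the hang here is an assumption, not something the output proves.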

I also tried yarn logs -applicationId <id>, but there is no directory for the YARN logs.

$ yarn logs -applicationId application_1491267059312_0038

$ sudo ls /var/log/hadoop-yarn/apps/hadoop/
ls: cannot access /var/log/hadoop-yarn/apps/hadoop/: No such file or directory
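One thing worth checking: aggregated YARN logs are normally written to HDFS rather than to the local filesystem, and they are stored under the user who submitted the application (`hue` in the listing above, not `hadoop`). A minimal sketch of how the expected HDFS path is built follows. The helper name `aggregated_log_dir` is hypothetical, and the layout `<remote-app-log-dir>/<user>/logs/<application id>` is an assumption based on the default Hadoop 2.x settings. The root directory used in the usage comment is likewise an assumption and should be confirmed in `yarn-site.xml`.

```shell
# Build the expected HDFS directory of an application's aggregated logs.
# Arguments: <remote-app-log-dir> <user> <application id> [suffix]
# Assumption: default Hadoop 2.x layout, where the suffix is "logs".
aggregated_log_dir() {
  printf '%s/%s/%s/%s\n' "${1%/}" "$2" "${4:-logs}" "$3"
}

# Usage on a cluster node (not run here; the root directory is an assumption):
#   hdfs dfs -ls "$(aggregated_log_dir /var/log/hadoop-yarn/apps hue application_1491267059312_0038)"
#   yarn logs -applicationId application_1491267059312_0038 -appOwner hue
```

Aggregated logs are usually only available once an application has finished, so an empty result for a still-running launcher is not necessarily an error.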
hiropon
  • Tez is a bit flaky and takes some time to start up. Are you able to get into the EMR instance to check the logs? – Jason K. Apr 04 '17 at 02:56
  • @Jason I updated the question with the logs. I think it's caused by multiple components, Tez or Oozie. – hiropon Apr 04 '17 at 03:17
  • I would suggest spinning up an EMR cluster with larger nodes for a short time to see if that has any effect. For numerous situations, I have found that an undersized EMR cluster or undersized nodes in the cluster lead to failures. By running from the command line, you are reducing the resources in use on the box so that could be a likely cause. – Jason K. Apr 04 '17 at 04:55

0 Answers