I have a 5-node Hortonworks (HDP) cluster, version 2.4.2, on which I have installed HAWQ 2.0.0.
The 5 nodes are: edge, master (NameNode), node1 (DataNode 1), node2 (DataNode 2), node3 (DataNode 3).
I followed this guide to install HAWQ on HDP: http://hdb.docs.pivotal.io/hdb/install/install-ambari.html
The HAWQ components are installed on these nodes:
HAWQ master - node1
HAWQ standby master - node2
HAWQ segments - node1, node2, node3
During installation, the HAWQ master, HAWQ standby master, and HAWQ segments were installed successfully, but the basic HAWQ tests that the installer runs in Ambari failed.
Below are the operations performed by the installer:
2016-06-30 00:24:22,513 - --- Check state of HAWQ cluster ---
2016-06-30 00:24:22,513 - Executing hawq status check...
2016-06-30 00:24:22,514 - Command executed: su - gpadmin -c "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null node1.localdomain \"source /usr/local/hawq/greenplum_path.sh && hawq state -d /data/hawq/master \" "
2016-06-30 00:24:23,343 - Output of command:
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:--HAWQ instance status summary
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:------------------------------------------------------
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Master instance = Active
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Master standby = node2.localdomain
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Standby master state = Standby host passive
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total segment instance count from config file = 3
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:------------------------------------------------------
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Segment Status
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:------------------------------------------------------
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total segments count from catalog = 1
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total segment valid (at master) = 0
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total segment failures (at master) = 3
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total number of postmaster.pid files missing = 0
20160630:00:24:23:032731 hawq_state:node1:gpadmin-[INFO]:-- Total number of postmaster.pid files found = 3
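The summary above already shows a mismatch: 3 segments in the config file, but 0 valid and 3 failed at the master. The following is only a sketch for pulling those counts out of the output. It assumes the `hawq state` output format shown above, and the function name `summarize_hawq_state` is hypothetical. It only reads text from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `hawq state` output on stdin and prints the
# configured, valid, and failed segment counts on one line.
# Assumes the "label = value" layout seen in the output above.
summarize_hawq_state() {
  awk -F' = ' '
    /Total segment instance count from config file/ { configured = $2 }
    /Total segment valid \(at master\)/             { valid = $2 }
    /Total segment failures \(at master\)/          { failed = $2 }
    END {
      printf "configured=%s valid=%s failed=%s\n", configured, valid, failed
    }
  '
}
```

It could be fed from the same command the installer runs, for example `hawq state -d /data/hawq/master | summarize_hawq_state` as gpadmin after sourcing greenplum_path.sh.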
2016-06-30 00:24:23,344 - --- Check if HAWQ can write and query from a table ---
2016-06-30 00:24:23,344 - Dropping ambari_hawq_test table if exists
2016-06-30 00:24:23,344 - Command executed: su - gpadmin -c "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null node1.localdomain \"export PGPORT=5432 && source /usr/local/hawq/greenplum_path.sh && psql -d template1 -c \\\"DROP TABLE IF EXISTS ambari_hawq_test;\\\" \" "
2016-06-30 00:24:23,436 - Output:
DROP TABLE
2016-06-30 00:24:23,436 - Creating table ambari_hawq_test
2016-06-30 00:24:23,436 - Command executed: su - gpadmin -c "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null node1.localdomain \"export PGPORT=5432 && source /usr/local/hawq/greenplum_path.sh && psql -d template1 -c \\\"CREATE TABLE ambari_hawq_test (col1 int) DISTRIBUTED RANDOMLY;\\\" \" "
2016-06-30 00:24:23,693 - Output:
CREATE TABLE
2016-06-30 00:24:23,693 - Inserting data to table ambari_hawq_test
2016-06-30 00:24:23,693 - Command executed: su - gpadmin -c "ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null node1.localdomain \"export PGPORT=5432 && source /usr/local/hawq/greenplum_path.sh && psql -d template1 -c \\\"INSERT INTO ambari_hawq_test SELECT * FROM generate_series(1,10);\\\" \" "
As shown above, the DROP TABLE and CREATE TABLE statements were executed, but the INSERT did not succeed.
So I ran an insert manually on the HAWQ master node, i.e. node1.
These are the steps I executed manually:
[root@node1 ~]# su - gpadmin
[gpadmin@node1 ~]$ psql
psql (8.4.20, server 8.2.15)
WARNING: psql version 8.4, server version 8.2.
Some psql features might not work.
Type "help" for help.
gpadmin=#
gpadmin=# \c gpadmin
psql (8.4.20, server 8.2.15)
WARNING: psql version 8.4, server version 8.2.
Some psql features might not work.
You are now connected to database "gpadmin".
gpadmin=# create table test(name varchar);
gpadmin=# insert into test values('vikash');
The above insert threw this error after a long wait:
ERROR: failed to acquire resource from resource manager, resource request is timed out due to no available cluster (pquery.c:804)
Also, the HAWQ segment log on node1 shows:
[root@node1 ambari-agent]# tail -f /data/hawq/segment/pg_log/hawq-2016-06-30_045853.csv
2016-06-30 05:10:24.522688 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 192.168.122.1",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:10:54.603726 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:10:54.603769 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 2.10.1.71",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:10:54.603778 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 192.168.122.1",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:11:24.625919 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:11:24.626088 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 2.10.1.71",,,,,,,0,,"network_utils.c",210,
2016-06-30 05:11:24.626129 EDT,,,p248618,th-1357371264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 192.168.122.1",,,,,,,0,,"network_utils.c",210,
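The segment log keeps cycling through the same three addresses roughly every 30 seconds. A small sketch to list the distinct ones follows. It assumes the log message format shown above, and the function name `discovered_addresses` is hypothetical. It reads log text from stdin and does not touch the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each distinct IPv4 address that the resource
# manager reports discovering, one per line.
# Assumes the message text "discovered local host IPv4 address <addr>".
discovered_addresses() {
  grep -o 'discovered local host IPv4 address [0-9.]*' \
    | awk '{ print $NF }' \
    | LC_ALL=C sort -u
}
```

For example: `discovered_addresses < /data/hawq/segment/pg_log/hawq-2016-06-30_045853.csv`. Comparing the result with the `address` column of gp_segment_configuration shows which of the node's interfaces the master has on record for this segment.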
I also checked gp_segment_configuration:
gpadmin=# select * from gp_segment_configuration
gpadmin-# ;
 registration_order | role | status | port  |     hostname      |  address  |            description
--------------------+------+--------+-------+-------------------+-----------+------------------------------------
                 -1 | s    | u      |  5432 | node2.localdomain | 2.10.1.72 |
                  0 | m    | u      |  5432 | node1             | node1     |
                  1 | p    | d      | 40000 | node1.localdomain | 2.10.1.71 | resource manager process was reset
(3 rows)
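Only one of the three segments is registered in the catalog, and it is marked down (status `d`). The sketch below re-checks just the down rows while troubleshooting. It assumes the column names shown in the output above and a default database of `gpadmin`. The function name `list_down_segments` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists segments the master currently marks as down.
# -A (unaligned) and -t (tuples only) give plain pipe-separated rows.
# Assumes PGPORT and the HAWQ environment are already set for gpadmin.
list_down_segments() {
  local db="${1:-gpadmin}"
  psql -d "$db" -At -c \
    "SELECT hostname, address, description
       FROM gp_segment_configuration
      WHERE status = 'd';"
}
```

Run as gpadmin on the master, e.g. `list_down_segments gpadmin`. An empty result would mean no segment is marked down at that moment.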
NOTE: In hawq-site.xml, the resource management type is set to "STANDALONE" rather than "YARN" in the dropdown.
Does anyone have a clue what the issue is here? Thanks in advance!