
I tried to deploy HAWQ 2.0 but could not get the HAWQ Master to run. Below is the error log:

[gpadmin@hdps31hwxworker2 hawqAdminLogs]$ cat  ~/hawqAdminLogs/hawq_init_20160805.log
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Prepare to do 'hawq init'
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-You can find log in:
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_init_20160805.log
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-GPHOME is set to:
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-/usr/local/hawq/.
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[DEBUG]:-Current user is 'gpadmin'
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[DEBUG]:-Parsing config file:
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[DEBUG]:-/usr/local/hawq/./etc/hawq-site.xml
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Init hawq with args: ['init', 'master']
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Check: hawq_master_address_host is set
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Check: hawq_master_address_port is set
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Check: hawq_master_directory is set
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Check: hawq_segment_directory is set
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Check: hawq_segment_address_port is set
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Check: hawq_dfs_url is set
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Check: hawq_master_temp_directory is set
20160805:23:00:10:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Check: hawq_segment_temp_directory is set
20160805:23:00:11:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Check if hdfs path is available
20160805:23:00:11:050348 hawq_init:hdps31hwxworker2:gpadmin-[DEBUG]:-Check hdfs: /usr/local/hawq/./bin/gpcheckhdfs hdfs hdpsm2demo4.demo.local:8020/hawq_default off
20160805:23:00:11:050348 hawq_init:hdps31hwxworker2:gpadmin-[WARNING]:-2016-08-05 23:00:11.338621, p50546, th139769637427168, WARNING the number of nodes in pipeline is 1 [172.17.15.31(172.17.15.31)], is less than the expected number of replica 3 for block [block pool ID: isi_hdfs_pool block ID 4341187780_1000] file /hawq_default/testFile
20160805:23:00:11:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-1 segment hosts defined
20160805:23:00:11:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Set default_hash_table_bucket_number as: 6
20160805:23:00:17:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Start to init master
The files belonging to this database system will be owned by user "gpadmin".
This user must also own the server process.

The database cluster will be initialized with locale en_US.utf8.

fixing permissions on existing directory /data/hawq/master ... ok
creating subdirectories ... ok
selecting default max_connections ... 1280
selecting default shared_buffers/max_fsm_pages ... 125MB/200000
creating configuration files ... ok
creating template1 database in /data/hawq/master/base/1 ... 2016-08-05 22:00:18.554441 GMT,,,p50803,th-1212598144,,,,0,,,seg-10000,,,,,"WARNING","01000","""fsync"": can not be set by the user and will be ignored.",,,,,,,,"set_config_option","guc.c",10023,
ok
loading file-system persistent tables for template1 ...
2016-08-05 22:00:20.023594 GMT,,,p50835,th38852736,,,,0,,,seg-10000,,,,,"WARNING","01000","""fsync"": can not be set by the user and will be ignored.",,,,,,,,"set_config_option","guc.c",10023,
2016-08-05 23:00:20.126221 BST,,,p50835,th38852736,,,,0,,,seg-10000,,,,,"FATAL","XX000","could not create shared memory segment: Invalid argument (pg_shmem.c:183)","Failed system call was shmget(key=1, size=506213024, 03600).","This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.  You can either reduce the request size or reconfigure the kernel with larger SHMMAX.  To reduce the request size (currently 506213024 bytes), reduce PostgreSQL's shared_buffers parameter (currently 4000) and/or its max_connections parameter (currently 3000).
If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for.
The PostgreSQL documentation contains more information about shared memory configuration.",,,,,,"InternalIpcMemoryCreate","pg_shmem.c",183,1    0x87463a postgres errstart + 0x22a
2    0x74c5e6 postgres <symbol not found> + 0x74c5e6
3    0x74c7cd postgres PGSharedMemoryCreate + 0x3d
4    0x7976b6 postgres CreateSharedMemoryAndSemaphores + 0x336
5    0x880489 postgres BaseInit + 0x19
6    0x7b03bc postgres PostgresMain + 0xdbc
7    0x6c07d5 postgres main + 0x535
8    0x3c0861ed1d libc.so.6 __libc_start_main + 0xfd
9    0x4a14e9 postgres <symbol not found> + 0x4a14e9

child process exited with exit code 1
initdb: removing contents of data directory "/data/hawq/master"
Master postgres initdb failed
20160805:23:00:20:050348 hawq_init:hdps31hwxworker2:gpadmin-[INFO]:-Master postgres initdb failed
20160805:23:00:20:050348 hawq_init:hdps31hwxworker2:gpadmin-[ERROR]:-Master init failed, exit

This is my Advanced gpcheck configuration:

[global]
configfile_version = 4

[linux.mount]
mount.points = /

[linux.sysctl]
sysctl.kernel.shmmax = 500000000
sysctl.kernel.shmmni = 4096
sysctl.kernel.shmall = 400000000
sysctl.kernel.sem = 250 512000 100 2048
sysctl.kernel.sysrq = 1
sysctl.kernel.core_uses_pid = 1
sysctl.kernel.msgmnb = 65536
sysctl.kernel.msgmax = 65536
sysctl.kernel.msgmni = 2048
sysctl.net.ipv4.tcp_syncookies = 0
sysctl.net.ipv4.ip_forward = 0
sysctl.net.ipv4.conf.default.accept_source_route = 0
sysctl.net.ipv4.tcp_tw_recycle = 1
sysctl.net.ipv4.tcp_max_syn_backlog = 200000
sysctl.net.ipv4.conf.all.arp_filter = 1
sysctl.net.ipv4.ip_local_port_range = 1281 65535
sysctl.net.core.netdev_max_backlog = 200000
sysctl.vm.overcommit_memory = 2
sysctl.fs.nr_open = 2000000
sysctl.kernel.threads-max = 798720
sysctl.kernel.pid_max = 798720
# increase network
sysctl.net.core.rmem_max = 2097152
sysctl.net.core.wmem_max = 2097152

[linux.limits]
soft.nofile = 2900000
hard.nofile = 2900000
soft.nproc  = 131072
hard.nproc  = 131072

[linux.diskusage]
diskusage.monitor.mounts = /
diskusage.monitor.usagemax = 90%

[hdfs]
dfs.mem.namenode.heap = 40960
dfs.mem.datanode.heap = 6144
# in hdfs-site.xml
dfs.support.append = true
dfs.client.enable.read.from.local = true
dfs.block.local-path-access.user = gpadmin
dfs.datanode.max.transfer.threads = 40960
dfs.client.socket-timeout = 300000000
dfs.datanode.socket.write.timeout = 7200000
dfs.namenode.handler.count = 60
ipc.server.handler.queue.size = 3300
dfs.datanode.handler.count = 60
ipc.client.connection.maxidletime = 3600000
dfs.namenode.accesstime.precision = -1

It looks like it is complaining about memory, but I can't seem to find the parameters to change. Where are shared_buffers and max_connections?
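
For reference, the kernel limits currently in effect can be inspected with standard Linux commands (nothing HAWQ-specific; just a sanity check against the gpcheck values above):

# Print the shared memory limits the kernel enforces right now:
sysctl kernel.shmmax kernel.shmall
# The same limits via ipcs (segment sizes reported in kbytes/pages):
ipcs -l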

How do I fix this error in general? Thanks.

HP.

2 Answers


Your kernel shared memory settings are too low to initialize the database. Don't bother with shared_buffers or max_connections.

You have:

kernel.shmmax = 500000000
kernel.shmall = 400000000

and it should be:

kernel.shmmax = 1000000000
kernel.shmall = 4000000000
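
To apply them, a minimal sketch (run as root on every node; exact file locations can vary by distribution):

# Persist the new limits in /etc/sysctl.conf, e.g.:
#   kernel.shmmax = 1000000000
#   kernel.shmall = 4000000000
# then load them without a reboot and verify:
sysctl -p
sysctl kernel.shmmax kernel.shmall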

Reference: http://hdb.docs.pivotal.io/hdb/install/install-cli.html

I would also make sure you have enough swap configured on your nodes based on the amount of RAM you have.

Reference: http://hdb.docs.pivotal.io/20/requirements/system-requirements.html
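
To check how much RAM and swap a node actually has (sizing guidance is in the requirements link above):

# Total/used memory and swap in human-readable units:
free -h
# Configured swap devices and their sizes:
swapon -s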

Jon Roberts
  • I have vm.swappiness=0. I guess I misunderstood and thought kernel.shmmax must be greater than kernel.shmall – HP. Aug 09 '16 at 07:09

shared_buffers sets the amount of memory a HAWQ segment instance uses for shared memory buffers. This setting must be at least 128KB, and at least 16KB times max_connections.

When setting shared_buffers, the values for the operating system parameters SHMMAX or SHMALL might also need to be adjusted.

The value of SHMMAX must be greater than shared_buffers + other_seg_shmem.
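
Applying that rule to the log above (my arithmetic, using the numbers from the failed shmget call): the request was 506213024 bytes while kernel.shmmax was 500000000, so the check fails:

# bc prints 1 because the comparison is true:
echo '506213024 > 500000000' | bc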

You can view and set the parameter values using the hawq config utility; please let me know how that goes!

hawq config -s shared_buffers (shows the current value)

hawq config -c shared_buffers -v value (sets a new value)
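
Putting it together, a hedged sketch (the target values are simply the defaults initdb printed in the log above, not tuned recommendations; per the comments below, hawq config -s only works while the database is up):

# Shrink the shared memory request so it fits under SHMMAX
# (1280 and 125MB are the defaults initdb selected in the log):
hawq config -c max_connections -v 1280
hawq config -c shared_buffers -v 125MB
# Restart so the new values take effect (assuming the standard
# hawq management syntax):
hawq restart cluster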

  • Reducing max_connections and/or the shared_buffers value will fix the issue. Use hawq config -c max_connections -v value – pratheesh_nair Aug 06 '16 at 00:11
  • I tried to use `hawq config -s shared_buffers` and it gave me the error `Failed to retrieve GUC information, the database is not accesible` – HP. Aug 06 '16 at 04:35
  • Since the database is down, you will not be able to change or see the parameter values. You can set these values in the postgresql.conf file and then start the database. The postgresql.conf file is in the MASTER_DATA_DIRECTORY of HAWQ – pratheesh_nair Aug 06 '16 at 23:22
  • I could not find that file: `find / -name "Postgresql.conf"` returned nothing, and `echo $MASTER_DATA_DIRECTORY` returned nothing as well. It's not in greenplum_path.sh either. Is it on a different host? – HP. Aug 07 '16 at 00:00
  • hawq config didn't help reset the values when attempted (`hawq config -c shared_buffers -v 128kB` reports: GUC shared_buffers does not exist in hawq-site.xml. Try to add it with value: 128kB. GUC: shared_buffers, Value: 128kB). It seems to set the value, but rechecking shows it did not actually take (`hawq config -s shared_buffers` reports: GUC: shared_buffers, Value: 128MB) – sandeepkunkunuru Sep 02 '16 at 01:34
  • hawq config did, however, add the shared_buffers parameter to hawq-site.xml, which seems to be ineffective and is also wiped out each time the hawq service is restarted – sandeepkunkunuru Sep 02 '16 at 01:38