I'm using a CloudFormation template for a 3-node MarkLogic cluster and testing DR for an AZ failure in AWS.
What I did:
- successfully created a 3-node cluster (Master + 2 slaves)
- terminated the ASG containing the Master, along with the EBS volume the Master was using (the instances behind the ELB go to OutOfService status, which is fine)
- created a new ASG for the Master, plus a new EBS volume, in another AZ, roughly as in the template sketch after the log excerpt below (result: the Master comes back online, the ELB is back, the website is back BUT... the slaves never leave OutOfService status)
- deleted the ASG for slave2 and its EBS volume, then recreated both, but the slave instance complains with these messages:
    Mar 3 16:31:30 ip-10-58-3-36 MarkLogic: Original host name ip-10-58-3-68.us-west-1.compute.internal does not match our new host name ip-10-58-3-36.us-west-1.compute.internal
    Mar 3 16:31:30 ip-10-58-3-36 MarkLogic: [/opt/MarkLogic/mlcmd/scripts/initialize-node.xsh line: 52]
    Mar 3 16:31:31 ip-10-58-3-36 MarkLogic: No cluster hosts online - trying other hosts
    Mar 3 16:31:31 ip-10-58-3-36 MarkLogic: [/opt/MarkLogic/mlcmd/scripts/initialize-node.xsh line: 56]
    Mar 3 16:31:31 ip-10-58-3-36 MarkLogic: No known hosts online
    Mar 3 16:31:31 ip-10-58-3-36 MarkLogic: [/opt/MarkLogic/mlcmd/scripts/initialize-node.xsh line: 59]
    Mar 3 16:31:31 ip-10-58-3-36 MarkLogic: Failed to update hostname - waiting to retry
    Mar 3 16:31:31 ip-10-58-3-36 MarkLogic: [/opt/MarkLogic/mlcmd/scripts/initialize-node.xsh line: 153]
    Mar 3 16:31:31 ip-10-58-3-36 MarkLogic: Sleeping for 14 seconds
    Mar 3 16:31:31 ip-10-58-3-36 MarkLogic: [/opt/MarkLogic/mlcmd/scripts/initialize-node.xsh line: 154]
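For context, this is roughly what "recreate the ASG & EBS in the other AZ" looks like in my template (a heavily simplified sketch: resource names, sizes and references are placeholders, not the actual MarkLogic CloudFormation template):

    NodeVolume:
      Type: AWS::EC2::Volume
      Properties:
        AvailabilityZone: us-west-1b     # the replacement AZ
        Size: 100                        # placeholder size
        VolumeType: gp2

    NodeGroup:
      Type: AWS::AutoScaling::AutoScalingGroup
      Properties:
        AvailabilityZones:
          - us-west-1b                   # pin this node's ASG to the new AZ
        LaunchConfigurationName: !Ref NodeLaunchConfig
        MinSize: '1'
        MaxSize: '1'
        LoadBalancerNames:
          - !Ref ElasticLoadBalancer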
I scrolled through /var/log/messages and I see:
    Mar 3 15:05:05 ip-10-58-3-36 MarkLogic: MARKLOGIC_INSTANCE: INSTANCE:i- MDM_NODE_NAME: MARKLOGIC_NODE_NAME: NodeB# MARKLOGIC_ZONE:us-west-1a ZONE: us-west-1b
The MARKLOGIC_ZONE parameter is pointing to the wrong zone: it still says us-west-1a, while the ZONE value (and the instance itself) is us-west-1b.
My question is: why is this parameter wrong here? Is that a bug in the software?
I tried updating this parameter through user_data, but still no dice.
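This is roughly what I put in the node's LaunchConfiguration user_data (only the relevant lines are shown; the variable names are taken from the log line above, and I honestly don't know whether the AMI's init scripts even read MARKLOGIC_ZONE from user_data):

    NodeLaunchConfig:
      Type: AWS::AutoScaling::LaunchConfiguration
      Properties:
        ImageId: ami-xxxxxxxx            # MarkLogic 8.0-6 AMI (placeholder)
        InstanceType: m3.large           # placeholder
        UserData:
          Fn::Base64: |
            MARKLOGIC_NODE_NAME=NodeB#
            MARKLOGIC_ZONE=us-west-1b
            # ...the remaining MARKLOGIC_* variables exactly as in the original template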
I'm curious how others have resolved this issue, because it doesn't look like this cluster is self-healing.
Another thing: if I kill the Master (stop the service / stop the EC2 instance), the whole cluster becomes unavailable. Is it possible to configure the slaves so they can promote themselves to become the Master?
I'm using MarkLogic 8.0-6.