
I have a cluster that was created in AWS and set up manually with one host. We are trying to add more hosts to the same cluster. I chose the REST Admin API (POST /admin/v1/cluster-config, https://docs.marklogic.com/REST/POST/admin/v1/cluster-config) to add the host. I configured the steps accordingly and ran the script without any errors (I verified this from the terminal). The host was added to the cluster, but when I checked its status on the Admin page it showed:

host status  --  A detailed view of this host's status. 
This host is down. The following error occured while trying to contact 
it: 
XDMP-HOSTOFFLINE: Host is offline or not responding

Host    marklogic-node2-abcd.org
Online  Disconnected

In addition, the node was not active and was completely disconnected (from the UI we could not open the default.xqy page on admin port 8001). So we restarted the node and removed the configuration (the data volume).

After rebooting node2, I can see it in the cluster, but when I try to access it by host name it redirects to http://marklogic-node2-abcd.org:8001/initialize-admin.xqy, which says:

This server must now self-install the initial databases and 
application servers. Click OK to continue.
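For reference, the scripted join in MarkLogic's cluster-scripting documentation is, as far as I understand it, a three-call sequence run after the joining host has been initialized (for example with POST /admin/v1/init). First, fetch the joining host's server config. Second, POST it to a host already in the cluster, which returns a cluster-config zip. Third, POST that zip back to the joining host. The initialize-admin.xqy page suggests node2 is uninitialized again after its data volume was removed. The sketch below only *builds* the requests with Python's standard library so the order is explicit. The host names are placeholders, and authentication (the Admin API typically uses digest auth) is deliberately left out.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host names (hypothetical); substitute your own.
BOOTSTRAP_HOST = "marklogic-node1-abcd.org"  # already a member of the cluster
JOINING_HOST = "marklogic-node2-abcd.org"    # initialized, not yet joined
ADMIN_PORT = 8001


def admin_url(host: str, path: str) -> str:
    return f"http://{host}:{ADMIN_PORT}{path}"


def build_join_requests(joiner_config_xml: str, cluster_zip: bytes,
                        group: str = "Default") -> list[Request]:
    """Build (but do not send) the three Admin API calls used to join a host.

    In a real script, step 2 needs the XML returned by step 1, and step 3
    needs the zip returned by step 2; they are passed in here for clarity.
    """
    # 1. Ask the joining host for its server configuration (XML).
    get_server_config = Request(
        admin_url(JOINING_HOST, "/admin/v1/server-config"),
        headers={"Content-type": "application/xml"}, method="GET")
    # 2. Send that XML to a host already in the cluster; the response is a zip.
    form = urlencode({"group": group, "server-config": joiner_config_xml}).encode()
    post_to_bootstrap = Request(
        admin_url(BOOTSTRAP_HOST, "/admin/v1/cluster-config"), data=form,
        headers={"Content-type": "application/x-www-form-urlencoded"}, method="POST")
    # 3. Hand the zip back to the *joining* host, which then restarts.
    post_to_joiner = Request(
        admin_url(JOINING_HOST, "/admin/v1/cluster-config"), data=cluster_zip,
        headers={"Content-type": "application/zip"}, method="POST")
    return [get_server_config, post_to_bootstrap, post_to_joiner]
```

Note that step 3 targets the joining host, not node1; a script that stops after step 2 leaves the new host without the cluster configuration.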

A few questions:

How do I debug the script, and where can I find the failure details?

Second, if my default databases or application servers were not configured, do I need to delete the host from the cluster and reconfigure it?

How can I get more logging to find the errors and make my life easier?
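One way to make such a script debuggable is to log every call, surface the HTTP error body instead of discarding it, and wait explicitly for a host to come back after a restart. The sketch below assumes the Admin API's GET /admin/v1/timestamp endpoint on port 8001 as the "is the server back?" probe. The logger name and helper names are hypothetical.

```python
import logging
import time
import urllib.error
import urllib.request
from typing import Callable

log = logging.getLogger("ml-join")  # hypothetical logger name


def run_step(name: str, request: urllib.request.Request, timeout: float = 60.0) -> bytes:
    """Send one Admin API call, logging the request and any error detail."""
    log.info("%s: %s %s", name, request.get_method(), request.full_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            body = resp.read()
            log.info("%s: HTTP %s, %d bytes", name, resp.status, len(body))
            return body
    except urllib.error.HTTPError as exc:
        # The response body often explains *why* the call was rejected.
        log.error("%s: HTTP %s: %s", name, exc.code, exc.read().decode(errors="replace"))
        raise
    except urllib.error.URLError as exc:
        # DNS failure, refused connection or timeout: a networking problem.
        log.error("%s: no HTTP response: %s", name, exc.reason)
        raise


def timestamp_probe(host: str, port: int = 8001) -> Callable[[], bool]:
    """Return a probe that is True once the host answers /admin/v1/timestamp."""
    url = f"http://{host}:{port}/admin/v1/timestamp"

    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            return True  # any HTTP status (even 401) means the server is listening
        except OSError as exc:  # URLError, refused connection, socket timeout
            log.debug("%s not answering yet: %s", url, exc)
            return False

    return probe


def wait_for_host(probe: Callable[[], bool], timeout: float = 300.0, interval: float = 5.0,
                  sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll `probe` until it succeeds or `timeout` seconds have passed."""
    deadline = clock() + timeout
    attempt = 0
    while True:
        attempt += 1
        if probe():
            log.info("host answered after %d attempt(s)", attempt)
            return True
        if clock() >= deadline:
            log.error("host still not answering after %.0f s", timeout)
            return False
        sleep(interval)
```

Calling `logging.basicConfig(level=logging.DEBUG)` at the top of the script prints every step, so the first failing call and its error body are visible in the terminal.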

Aswanikumar

1 Answer

This can be very tricky to debug without deep knowledge of AWS, Linux, networking protocols, and MarkLogic. I highly recommend starting over using the managed cluster feature, preferably starting with the supplied CloudFormation template sample. You should have that up in 10 minutes; copy your data over to the new cluster and you're good to go.

If you need to debug what you have, start by reading the MarkLogic on AWS/EC2 documentation completely, and supplement it with the relevant AWS docs, particularly on networking, routing, subnets, VPCs, and DNS. In the end you will most likely still need to rebuild your cluster. The docs explain where to look for logs and which pitfalls to avoid; in particular, they strongly recommend not attempting this without serious consideration of the consequences, the first being that it is quite difficult to debug.
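Once you know where the logs are, a small filter helps pick the warnings and errors out of a noisy log. This is a sketch under two assumptions: that lines look like `2018-10-30 20:08:01.456 Error: XDMP-HOSTOFFLINE: ...`, and that the main log on Linux is `/var/opt/MarkLogic/Logs/ErrorLog.txt`. Check both against your own installation and version.

```python
import re
from typing import Iterable, Iterator

# MarkLogic log levels, lowest to highest severity.
LEVELS = ["Finest", "Finer", "Fine", "Debug", "Config", "Info",
          "Notice", "Warning", "Error", "Critical", "Alert", "Emergency"]
RANK = {name: i for i, name in enumerate(LEVELS)}

# Assumed line shape: "<date> <time>.<ms> <Level>: <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (\w+): (.*)$")


def problems(lines: Iterable[str], min_level: str = "Warning") -> Iterator[tuple[str, str, str]]:
    """Yield (timestamp, level, message) for lines at or above `min_level`."""
    threshold = RANK[min_level]
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        # Lines that don't match (e.g. stack-trace continuations) are skipped.
        if m and RANK.get(m.group(2), -1) >= threshold:
            yield m.group(1), m.group(2), m.group(3)


if __name__ == "__main__":
    # Assumed default location on Linux; adjust for your install.
    with open("/var/opt/MarkLogic/Logs/ErrorLog.txt", encoding="utf-8", errors="replace") as f:
        for ts, level, msg in problems(f):
            print(ts, level, msg)
```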

If you do want to continue down the 'triple black diamond slope', a starting point is verifying that DNS and TCP/IP work perfectly from each node to every other node, and that the host name MarkLogic assigns resolves to the same IP as DNS does, on each node, before installing MarkLogic for the first time. Your example showed a custom DNS name; this is unlikely to be the actual host name MarkLogic discovers at startup (see the docs above). Read the docs in their entirety, then reread them, sleep on it, and read them again. Then practice on safe dev machines a few dozen (or 100) times to learn the signs of a working configuration.
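A minimal sketch of the "DNS and TCP/IP from each node to every other node" check, to be run on every node against every peer. It assumes the ports that matter are 7999 (MarkLogic's default host-to-host port) and 8001 (the Admin interface). The peer names and expected IPs are values you supply; `check_peer` is a hypothetical helper, not a MarkLogic tool.

```python
import socket

# Assumed ports: 7999 = default inter-host (cluster) port, 8001 = Admin UI/API.
CLUSTER_PORTS = (7999, 8001)


def resolve_ipv4(hostname: str) -> set[str]:
    """All IPv4 addresses this machine's resolver returns for `hostname`."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_peer(hostname: str, expected_ip: str, ports=CLUSTER_PORTS) -> list[str]:
    """Return a list of problems found for one peer (empty list = looks fine)."""
    try:
        ips = resolve_ipv4(hostname)
    except socket.gaierror as exc:
        return [f"{hostname}: does not resolve ({exc})"]
    problems = []
    if expected_ip not in ips:
        problems.append(f"{hostname}: resolves to {sorted(ips)}, expected {expected_ip}")
    for port in ports:
        if not port_open(hostname, port):
            problems.append(f"{hostname}:{port}: TCP connect failed")
    return problems
```

Running this on node1 and on node2, and comparing the output with `socket.getfqdn()` on each machine, gives a quick picture of whether the names and routes agree before MarkLogic is involved at all.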

Bootstrapping a cluster join is more subtle than it may appear, and much, much harder to fix if it has gone wrong. If you want to do this yourself (as opposed to using the managed cluster feature, which does it for you), definitely start with non-production 'blank' servers and practice and refine until it runs perfectly many times in a row.

DALDEI
  • I also debugged this: the nodes are not communicating with each other, so I removed the config and started fresh from scratch. After running the curl command above, ping does not respond and all the nodes time out. We are using CloudFormation template scripts to launch the scripts. – Aswanikumar Oct 30 '18 at 20:08
  • Thank you for the advice. I started configuring non-production servers, and currently when I run the command below the server stops responding: it does not write any error logs and does not respond when I ping it by host name. curl --anyauth --user user:passwd -X POST -d "group=Default" --data-urlencode "server-config@./joiner-config.xml" -H "Content-type: application/x-www-form-urlencoded" -o cluster-config.zip marklogic-node1-abcd.org:8001/admin/v1/cluster-config. The server stops responding once I execute this command. – Aswanikumar Oct 30 '18 at 20:18