
I have set up a cluster with disk-based tables: 2 data nodes, one management node, and 2 SQL nodes.

The cluster is working fine. To test its HA, I manually killed one data node with the kill command at a Linux prompt, then connected to the SQL nodes and inserted some records into a table.

I then tried to start the data node I had killed, without using --initial:

bin/ndbd

When I look at the logs on the management node, they show the error below, and the data node does not start:

/Node 3: Forced node shutdown completed. Occurred during startphase 5. Caused by error 2355: 'Failure to restore schema(Resource configuration error). Permanent error, external action needed'.

When I use --initial it does start, but that means a clean start from scratch, which will be time-consuming because all the data files have to be copied. What if we have 100 GB?

I want the data node to resume copying records from the point at which it stopped (was killed).

How do I do this?

KCD
sai

1 Answer


You have hit an unrecoverable fault (for that node, not the whole cluster), so you have no choice but to rebuild it. To avoid this scenario, stop the node safely rather than killing the process. For example, to stop node 3, run:

ndb_mgm -e '3 stop'

However, rebuilding the node will not lose data as long as the other nodes in that node group (you only have one group) are still up to recover data from.

First, start the other nodes in the node group:

/bin/ndbd

Check that they are "started" (or you will lose data):

ndb_mgm -e show

On the corrupt node, re-initialise it so that it copies its data from the other node:

node 3> /bin/ndbd --initial
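
The check-then-initialise steps above could be wrapped in a small guard so that --initial is never run while the peer is down. This is only a sketch: the function names are made up for illustration, it assumes ndb_mgm and ndbd are on the PATH, and it assumes the status line printed by ndb_mgm has the form "Node <id>: started ...".

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the status line format
# ("Node <id>: started ...") is an assumption about ndb_mgm output.

# Reads the output of `ndb_mgm -e "<id> status"` on stdin.
# Succeeds only if a line reports the node as started
# ("starting" or "not connected" do not match).
node_is_started() {
    grep -Eq '^Node [0-9]+: started'
}

# Re-initialise the local data node only if the peer node is started.
# $1 is the node ID of the surviving data node in the same node group.
safe_initial_restart() {
    peer_id="$1"
    if ndb_mgm -e "$peer_id status" | node_is_started; then
        ndbd --initial
    else
        echo "peer node $peer_id is not started; not running ndbd --initial" >&2
        return 1
    fi
}
```

On the failed node you would then call, for example, safe_initial_restart 2, where 2 stands for whatever ID the surviving data node has in your configuration.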
KCD