
I'm trying to upgrade from Wheezy to Jessie (late, I know).

I already found out that although the heartbeat, pacemaker and corosync version numbers barely changed, there is a big change in how the stack is supposed to work. I'm following https://wiki.debian.org/Debian-HA/ClustersFromScratch to install it.

However, I'm unable to start the cluster with the original configuration. It reports: ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected

I noticed this in the log:

May 25 01:07:59 [4989] domainname.com        cib:   notice: main:      Using legacy config location: /var/lib/heartbeat/crm
May 25 01:07:59 [4989] domainname.com        cib:     info: get_cluster_type:  Verifying cluster type: 'corosync'
May 25 01:07:59 [4989] domainname.com        cib:     info: get_cluster_type:  Assuming an active 'corosync' cluster
May 25 01:07:59 [4989] domainname.com        cib:     info: retrieveCib:       Reading cluster configuration file /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm
May 25 01:07:59 [4992] domainname.com      attrd:     info: corosync_node_name:        Unable to get node name for nodeid 2130...
May 25 01:07:59 [4992] domainname.com      attrd:   notice: get_node_name:     Defaulting to uname -n for the local corosync node name
May 25 01:07:59 [4992] domainname.com      attrd:     info: crm_get_peer:      Node 2130... is now known as domainname.com
May 25 01:07:59 [4990] domainname.com stonith-ng:     info: corosync_node_name:        Unable to get node name for nodeid 2130...
May 25 01:07:59 [4990] domainname.com stonith-ng:   notice: get_node_name:     Defaulting to uname -n for the local corosync node name
May 25 01:07:59 [4990] domainname.com stonith-ng:     info: crm_get_peer:      Node 2130... is now known as domainname.com
May 25 01:07:59 [4992] domainname.com      attrd:     info: main:      Cluster connection active
May 25 01:07:59 [4992] domainname.com      attrd:     info: qb_ipcs_us_publish:        server name: attrd
May 25 01:07:59 [4992] domainname.com      attrd:     info: main:      Accepting attribute updates
May 25 01:07:59 [4989] domainname.com        cib:     info: validate_with_relaxng:     Creating RNG parser context
May 25 01:07:59 [4987] domainname.com pacemakerd:    error: pcmk_child_exit:   The cib process (4989) exited: Key has expired (127)

When I remove /var/lib/heartbeat/crm, it at least starts, so I can run crm status.
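Before removing the directory it is probably worth keeping a copy of it, so the old cib.xml can still be inspected later. A minimal Python sketch of that, where `backup_directory` is a hypothetical helper and the backup location is an assumption to adjust:

```python
import shutil
import time
from pathlib import Path


def backup_directory(src, backup_root):
    """Copy src into a timestamped subdirectory of backup_root.

    Returns the path of the copy. The original directory is left untouched.
    """
    src = Path(src)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree creates dest (and missing parents) and fails if dest already exists
    shutil.copytree(src, dest)
    return dest
```

It could be called as `backup_directory("/var/lib/heartbeat/crm", "/root/cib-backup")` with the cluster stack stopped; `/root/cib-backup` is only an example path.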

Now the question: is the old configuration supposed to work, so that I should look for the cause elsewhere (the log is HUGE), or would it be simpler to remove the directory and just define my four resources again?

For reference, the version numbers.

Wheezy:

pacemaker/wheezy uptodate 1.1.7-1
libcorosync4/wheezy uptodate 1.4.2-3
heartbeat/wheezy uptodate 1:3.0.5-3
libheartbeat2/wheezy uptodate 1:3.0.5-3

Jessie:

pacemaker:amd64/jessie-backports 1.1.16-1~bpo8+1 uptodate
corosync:amd64/jessie-backports 2.4.2-3+deb9u1~bpo8+1 uptodate
libcorosync-common4:amd64/jessie-backports 2.4.2-3+deb9u1~bpo8+1 uptodate
libcorosync4:all/jessie 1.4.6-1.1 uptodate
heartbeat:amd64/jessie 1:3.0.5+hg12629-1.2 uptodate
libheartbeat2:amd64/jessie 1:3.0.5+hg12629-1.2 uptodate
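To see at a glance which suite each package comes from, a listing like the Jessie one above can be grouped by suite. This is only a minimal Python sketch: `group_by_suite` is a hypothetical helper, and it assumes every line has the `name:arch/suite version status` shape shown above. It says nothing about whether a given mix of suites is actually a problem.

```python
from collections import defaultdict

LISTING = """\
pacemaker:amd64/jessie-backports 1.1.16-1~bpo8+1 uptodate
corosync:amd64/jessie-backports 2.4.2-3+deb9u1~bpo8+1 uptodate
libcorosync-common4:amd64/jessie-backports 2.4.2-3+deb9u1~bpo8+1 uptodate
libcorosync4:all/jessie 1.4.6-1.1 uptodate
heartbeat:amd64/jessie 1:3.0.5+hg12629-1.2 uptodate
libheartbeat2:amd64/jessie 1:3.0.5+hg12629-1.2 uptodate
"""


def group_by_suite(listing):
    """Map each suite to a list of 'name version' strings."""
    suites = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Skip anything that is not 'name:arch/suite version status'
        if len(fields) != 3 or "/" not in fields[0]:
            continue
        package, suite = fields[0].split("/", 1)
        name = package.split(":", 1)[0]  # drop the architecture qualifier
        suites[suite].append(f"{name} {fields[1]}")
    return dict(suites)


for suite, packages in sorted(group_by_suite(LISTING).items()):
    print(suite)
    for entry in packages:
        print("   " + entry)
```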
Honza
  • Note that I got an additional error when configuring the resources: crm_resource: symbol lookup error: /usr/lib/libpe_rules.so.2: undefined symbol: crm_strdup_fn. Maybe my problems include some libraries not being upgraded properly. – Honza May 28 '19 at 08:28

1 Answer


According to http://www.linux-ha.org/doc/users-guide/_upgrading_from_crm_enabled_heartbeat_2_1_clusters.html#_backing_up_the_cib, the correct course of action seems to be to remove everything in /var/lib/heartbeat/crm EXCEPT /var/lib/heartbeat/crm/cib.xml.

I'm not sure why heartbeat/wheezy 3.0.5-3 would behave like 2.1, but it does make sense.

Hmm, no, that doesn't work.

Honza