My node log is flooded with WARN messages of "dropping unicast message to wrong destination" when one of the nodes in the cluster is restarted. We are using JGroups (TCP), version jgroups-3.4.1.Final. My server does not come up, and these warning messages are thrown continuously.

Below are the warning messages:

WARN [TransferQueueBundler,h-broadcast,h-13] [TCP] JGRP000032: h-13: no physical address for 8281f201-7fb1-f6ac-faf3-d6837bc39087, dropping message

WARN [INT-1,h-broadcast,h-13] [TCP] JGRP000031: h-13: dropping unicast message to wrong destination d205fcba-151c-ad58-8323-fe4f49117f88

Please let me know how to resolve this issue.

Thanks, Nivedita

<TCP loopback="true" 
    recv_buf_size="${tcp.recv_buf_size:20M}" 
    send_buf_size="${tcp.send_buf_size:640K}"
    discard_incompatible_packets="true" 
    max_bundle_size="64K" 
    max_bundle_timeout="5" 
    enable_bundling="true" 
    use_send_queues="true"
    sock_conn_timeout="300" 
    timer_type="new" 
    timer.min_threads="4" 
    timer.max_threads="10" 
    timer.keep_alive_time="3000"
    timer.queue_max_size="500" 
    thread_pool.enabled="true" 
    thread_pool.min_threads="4" 
    thread_pool.max_threads="10"
    thread_pool.keep_alive_time="5000" 
    thread_pool.queue_enabled="true" 
    thread_pool.queue_max_size="100000"
    thread_pool.rejection_policy="discard" 
    oob_thread_pool.enabled="true" 
    oob_thread_pool.min_threads="1"
    oob_thread_pool.max_threads="8" 
    oob_thread_pool.keep_alive_time="5000" 
    oob_thread_pool.queue_enabled="false"
    oob_thread_pool.queue_max_size="100" 
    oob_thread_pool.rejection_policy="discard" 
    bind_addr="${hybris.jgroups.bind_addr}" 
    bind_port="${hybris.jgroups.bind_port}" />
<TCPPING timeout="3000" 
    initial_hosts="xxx.xx.xx.4[7800],xxx.xx.xx.5[7800],xxx.xx.xx.6[7800],xxx.xx.xx.7[7800],xxx.xx.xx.8[7800],xxx.xx.xx.9[7800],xxx.xx.xx.10[7800],xxx.xx.xx.11[7800],xxx.xx.xx.12[7800],xxx.xx.xx.13[7800],xxx.xx.xx.68[7800],xxx.xx.xx.69[7800],xxx.xx.xx.70[7800],xxx.xx.xx.4[7800],xxx.xx.xx.5[7800],xxx.xx.xx.6[7800]"
    num_initial_members="16"/>

<MERGE2 min_interval="10000" max_interval="30000" />
<FD_SOCK />
<FD timeout="3000" max_tries="3" />
<VERIFY_SUSPECT timeout="1500" />
<BARRIER />
<pbcast.NAKACK use_mcast_xmit="false" exponential_backoff="500" discard_delivered_msgs="true" />
<UNICAST2 />
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000" max_bytes="4M" />
<pbcast.GMS print_local_addr="true" join_timeout="3000" view_bundling="true" />
<UFC max_credits="20M" min_threshold="0.4" />
<MFC max_credits="20M" min_threshold="0.4" />
<FRAG2 frag_size="60K" />
<pbcast.STATE_TRANSFER />
Nivedita Dixit

3 Answers


Thanks a lot for the suggestions. The cluster nodes self-healed when the problematic node was taken down (we were unable to telnet to it, whereas the other nodes were reachable via telnet).

Nivedita Dixit
  • I can't understand this answer clearly. I am getting the same issue! How should I configure the JGroups cluster? – Nandhakumar Kittusamy Aug 17 '17 at 14:33
  • Among the nodes in the cluster, one node had a network issue; we were unable to telnet to it on port 7800. When that faulty node was removed from the cluster, the remaining nodes self-healed and rejoined the cluster. – Nivedita Dixit Aug 19 '17 at 06:40
  • Anyway, thanks for your solution! In my case I can connect to the node using telnet, but it can't join the cluster. I don't know where the problem is. – Nandhakumar Kittusamy Aug 19 '17 at 06:50

I assume you're using TCP with TCPPING? Do you list all members in TCPPING.initial_hosts? This is the most likely cause of the warnings above.

Every member maintains a cache that maps UUIDs (JGroups' internal representation of cluster members) to physical addresses.

You can inspect its contents either via JMX or via probe.sh uuids. There should be a mapping in h-13 for 8281f201-7fb1-f6ac-faf3-d6837bc39087, but it's missing; again, most likely because h-13 isn't listed in TCPPING's initial_hosts.
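
For example, a quick sketch (probe.sh ships with the JGroups distribution; by default it contacts members over an IP multicast diagnostics address, so if multicast is blocked you will need to point it at a specific member instead, as discussed in the comments below):

probe.sh uuids

Each responding member dumps its UUID-to-physical-address cache, so you can check whether h-13 has an entry for the UUID shown in the warning.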

You could try an alternative discovery protocol: e.g. MPING if IP multicasting is supported, FILE_PING, which requires a shared file system, TCPGOSSIP with an external lookup service, etc. Check the manual for details; a FILE_PING sketch is shown below.
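
For instance, a minimal FILE_PING sketch, assuming a directory that all cluster nodes can read and write (the path below is hypothetical, e.g. an NFS mount):

<!-- hypothetical shared directory; this element replaces TCPPING in the stack above -->
<FILE_PING location="/mnt/shared/jgroups" />

During discovery each member writes its own address under that directory and reads the entries written by the others, so no multicast is needed.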

Bela Ban
  • Yes, we have listed all the hosts in the initial_hosts attribute in jgroups-tcp.xml. We are running on the Azure cloud, which does not support multicast, hence we cannot use multicast. Please find the jgroups-tcp.xml configuration below. – Nivedita Dixit Apr 29 '16 at 07:05
  • Pasted the jgroups-tcp.xml configuration into the question. – Nivedita Dixit Apr 29 '16 at 07:16
  • I tried the probe command, but nothing is returned. If probe uses multicast, I think it will not work for me. Can you suggest an alternate mechanism? – Nivedita Dixit Apr 29 '16 at 07:38
  • `probe.sh -add x.x.x.4` connects to one of the nodes, grabs the addresses of the others and then connects to all members. – Bela Ban May 02 '16 at 05:53

I encountered this issue this week and spent days tweaking the firewall to get JGroups working. Today I switched the JGroups stack from UDP to TCP, and suddenly all the issues were gone.
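
For anyone making the same switch: the change is at the top of the stack file, where the UDP transport and its multicast-based PING discovery are replaced by TCP and TCPPING, roughly as follows (host names and port are placeholders; the asker's full jgroups-tcp.xml above is a complete example):

<!-- placeholder hosts/port; replaces <UDP .../> and <PING .../> at the top of the stack -->
<TCP bind_port="7800" />
<TCPPING initial_hosts="hostA[7800],hostB[7800]" num_initial_members="2" />

The protocols further down the stack (failure detection, NAKACK, GMS, etc.) can stay largely the same.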

Wey Gu