
I've got a standalone Keycloak HA cluster running on a Docker host. The cluster uses JDBC_PING against a PostgreSQL database for discovery (this will eventually run on ECS, so no multicast).

Cluster discovery works well and each node adds itself to the database on startup. However, nodes aren't removing themselves when stopped with "docker stop". This is fine as long as at least one other node is up, since the remaining nodes detect the downed member and rebalance, but if the last node goes down, its row remains. When a new node then starts, it attempts to connect to the stale node and fails.

The JGroups TCP stack looks as follows:

<stack name="tcp">
    <transport type="TCP" socket-binding="jgroups-tcp">
        <property name="external_addr">
            ${env.EXTERNAL_ADDR}
        </property>
    </transport>
    <protocol type="org.jgroups.protocols.JDBC_PING">
        <property name="connection_driver">
            org.postgresql.Driver
        </property>
        <property name="connection_url">
            jdbc:postgresql://${env.DB_ADDR:postgres}:${env.DB_PORT:5432}/${env.DB_DATABASE:keycloak}
        </property>
        <property name="connection_username">
            ${env.DB_USER:keycloak}
        </property>
        <property name="connection_password">
            ${env.DB_PASSWORD:password}
        </property>
        <property name="initialize_sql">
            CREATE TABLE IF NOT EXISTS JGROUPSPING ( own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, ping_data bytea DEFAULT NULL, added timestamp DEFAULT NOW(), PRIMARY KEY (own_addr, cluster_name))
        </property>
    </protocol>
    <protocol type="MERGE3"/>
    <protocol type="FD_SOCK"/>
    <protocol type="FD_ALL"/>
    <protocol type="VERIFY_SUSPECT"/>
    <protocol type="pbcast.NAKACK2"/>
    <protocol type="UNICAST3"/>
    <protocol type="pbcast.STABLE"/>
    <protocol type="pbcast.GMS"/>
    <protocol type="MFC"/>
    <protocol type="FRAG2"/>
</stack>
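As an aside, recent JGroups versions can clean up stale rows on their own: JDBC_PING inherits a `remove_all_data_on_view_change` property from FILE_PING that makes the coordinator wipe and rewrite the table on every view change, so a leftover row from a crashed last node gets replaced once a new view forms. A sketch, assuming the property exists in the JGroups version bundled with your Keycloak image (worth verifying against that version's docs):

```xml
<protocol type="org.jgroups.protocols.JDBC_PING">
    <!-- existing connection_* and initialize_sql properties as above -->

    <!-- coordinator clears the table and re-inserts all current members
         on each view change; availability depends on the JGroups version -->
    <property name="remove_all_data_on_view_change">
        true
    </property>
</protocol>
```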

The Dockerfile is:

FROM jboss/keycloak:latest

# elevate to install iproute
USER root
RUN yum install -y iproute

USER jboss

ADD cli/* /opt/jboss/keycloak/cli/
RUN cd /opt/jboss/keycloak \
  && bin/jboss-cli.sh --file=cli/setup.cli \
  && rm -rf /opt/jboss/keycloak/standalone/configuration/standalone_xml_history

RUN sed -i -e "/.*<\/dependencies>$/i \ \ \ \ \ \ \ \ <module name=\"org.postgresql.jdbc\"\/>" \
  /opt/jboss/keycloak/modules/system/layers/base/org/jgroups/main/module.xml

ADD start.sh /opt/jboss/

ENTRYPOINT [ "/opt/jboss/start.sh" ]
CMD ["-b", "0.0.0.0", "--server-config", "standalone-ha.xml"]

EXPOSE 7600
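The sed line in the Dockerfile is dense; its purpose is to add a module dependency to the jgroups module descriptor so JDBC_PING can load the PostgreSQL JDBC driver. The resulting module.xml ends up roughly like this (illustrative sketch; the actual resources and dependency list vary by image version):

```xml
<!-- /opt/jboss/keycloak/modules/system/layers/base/org/jgroups/main/module.xml -->
<module name="org.jgroups">
    <resources>
        <!-- jgroups jar shipped with the image -->
    </resources>
    <dependencies>
        <!-- existing dependencies of the jgroups module -->
        <!-- line inserted by the sed command above: -->
        <module name="org.postgresql.jdbc"/>
    </dependencies>
</module>
```

This assumes the `org.postgresql.jdbc` module itself has already been registered (e.g. by the setup.cli script), since the sed only references it.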

And start.sh contains:

#!/bin/sh

DEFAULT_NIC=`ip route | grep default | awk '{print $NF}'`
export EXTERNAL_ADDR=`ip -f inet -o addr show $DEFAULT_NIC | cut -d" " -f 7 | cut -d/ -f 1`

if [ "$EXTERNAL_ADDR" = "" ]; then
    EXTERNAL_ADDR=127.0.0.1
fi


sh /opt/jboss/docker-entrypoint.sh $@ -Djgroups.bind_addr=$EXTERNAL_ADDR -Djboss.bind.address.private=$EXTERNAL_ADDR -Djboss.bind.address.management=$EXTERNAL_ADDR -Djgroups.bind.address=$EXTERNAL_ADDR -Djava.net.preferIPv4Stack=true -Dignore.bind.address=true

I can't really see a reason why the row wouldn't be removed. Are there any obvious configuration errors I'm making here?

Bonnotbh
  • Fail in what sense? I understand that when it tries to connect to existing node and this does not respond it could cause a delay in startup, but it shouldn't fail ultimately... – Radim Vansa Jun 15 '18 at 13:17
  • Have you tried enabling TRACE for org.jgroups package to see what's up? If it's easier for you, you can try debugging it by running it outside a container. – Galder Zamarreño Jun 15 '18 at 14:22
  • so it looks like server shutdown isn't getting called on "docker stop" - if an error happens during startup the row is removed successfully, and it works fine outside of a container. will have to investigate further. – Bonnotbh Jun 18 '18 at 13:49

1 Answer


The issue here was launching docker-entrypoint.sh with sh instead of exec: the wrapper shell stayed in place as the container's main process, so the server never received the stop signal needed for a graceful shutdown. Changing the line to

exec ./docker-entrypoint.sh $@ -Djgroups.bind_addr=$EXTERNAL_ADDR

solved the problem.
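The reason this matters: `docker stop` sends SIGTERM only to PID 1. When start.sh launches the entrypoint via `sh`, the wrapper shell keeps PID 1, the signal is never forwarded to the server, and after the grace period the container is SIGKILLed before the shutdown hook that deletes the JDBC_PING row can run. `exec` replaces the wrapper in-process instead of forking a child. A minimal illustration (hypothetical throwaway commands, not from the image):

```shell
#!/bin/sh
# Without exec: the inner shell is a forked child, so it gets a NEW pid.
# The two printed pids differ.
sh -c 'echo "wrapper: $$"; sh -c "echo inner:   \$\$"'

# With exec: the inner shell REPLACES the wrapper, keeping the SAME pid,
# so a signal aimed at that pid reaches the real process.
# The two printed pids match.
sh -c 'echo "wrapper: $$"; exec sh -c "echo inner:   \$\$"'
```

The same logic is why `exec ./docker-entrypoint.sh ...` lets Keycloak see the SIGTERM from `docker stop` and shut down cleanly.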

Bonnotbh