Hi, I have two file servers, let's call them g1 @ 10.13.13.201 and g2 @ 10.13.13.202. I've successfully merged both into a glusterfs volume, mounted at 10.13.13.201:/mnt/glusterfs. So in effect, one box is solely a glusterd server, and the other is both a server and a client. I chose to do it this way because each file server has 24 drives: one drive holds the OS, and the rest are LSOD in a raidz2 zfs array. So I figured, why bother with a separate controller when one of the machines has enough beef to assume those responsibilities itself?
Up until this point the setup has worked fine; however, I'm running into issues getting SSL/TLS to work with this configuration. Starting from scratch, I generate the zfs pools and install the glusterfs server software. Before configuring any gluster peers, I run the following script on both boxes to generate the certs and the CA:
#!/bin/bash
#temp user directory for generation of keys
mkdir ~/temp_ssl
cd ~/temp_ssl
#generating self-signed keys
openssl genrsa -out "$HOSTNAME".key 2048
openssl req -new -x509 -key "$HOSTNAME".key -subj "/CN=$HOSTNAME" -out "$HOSTNAME".pem
#grab both keys
sshpass -p 1 scp user@10.13.13.201:~/temp_ssl/g1.key .
sshpass -p 1 scp user@10.13.13.202:~/temp_ssl/g2.key .
#concatenate both keys to generate CA
for f in *key; do
cat $f >> gluster.ca;
done;
#copy the key, cert, and CA to /etc/ssl; restrict ownership and permissions to root read/write only
sudo cp $HOSTNAME* gluster.ca /etc/ssl
sudo chown root:root /etc/ssl/$HOSTNAME* /etc/ssl/gluster.ca
sudo chmod 0600 /etc/ssl/$HOSTNAME* /etc/ssl/gluster.ca
#remove the unsecured keys
cd
sudo rm -rf temp_ssl
#create the flag file that tells glusterd to use ssl/tls on its management connections
sudo touch /var/lib/glusterd/secure-access
#restart glusterd
sudo systemctl restart glusterfs-server.service
exit 0
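For what it's worth, this is roughly how I sanity-check that the key, cert, and CA land in /etc/ssl with the right ownership (just a quick check, file names as in the script above):
#list the key/cert/CA and confirm the cert subject matches the hostname
ls -l /etc/ssl/"$HOSTNAME".key /etc/ssl/"$HOSTNAME".pem /etc/ssl/gluster.ca
sudo openssl x509 -in /etc/ssl/"$HOSTNAME".pem -noout -subject -dates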
However, creating the secure-access file in the glusterd working directory breaks the server:
$ sudo systemctl restart glusterfs-server.service
Job for glusterfs-server.service failed because the control process exited with error code. See "systemctl status glusterfs-server.service" and "journalctl -xe" for details.
$ sudo systemctl status glusterfs-server.service
glusterfs-server.service - LSB: GlusterFS server
Loaded: loaded (/etc/init.d/glusterfs-server; bad; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2017-03-15 18:50:17 CDT; 1min 0s ago
Docs: man:systemd-sysv-generator(8)
Process: 6482 ExecStop=/etc/init.d/glusterfs-server stop (code=exited, status=0/SUCCESS)
Process: 6526 ExecStart=/etc/init.d/glusterfs-server start (code=exited, status=1/FAILURE)
Mar 15 18:50:17 g1 systemd[1]: Starting LSB: GlusterFS server...
Mar 15 18:50:17 g1 glusterfs-server[6526]: * Starting glusterd service glusterd
Mar 15 18:50:17 g1 glusterfs-server[6526]: ...fail!
Mar 15 18:50:17 g1 systemd[1]: glusterfs-server.service: Control process exited, code=exited status=1
Mar 15 18:50:17 g1 systemd[1]: Failed to start LSB: GlusterFS server.
Mar 15 18:50:17 g1 systemd[1]: glusterfs-server.service: Unit entered failed state.
Mar 15 18:50:17 g1 systemd[1]: glusterfs-server.service: Failed with result 'exit-code'.
When I remove it, everything starts up fine:
$ sudo rm -rf secure-access
$ sudo systemctl restart glusterfs-server.service
$ sudo systemctl status glusterfs-server.service
● glusterfs-server.service - LSB: GlusterFS server
Loaded: loaded (/etc/init.d/glusterfs-server; bad; vendor preset: enabled)
Active: active (running) since Wed 2017-03-15 18:53:15 CDT; 2s ago
Docs: man:systemd-sysv-generator(8)
Process: 6482 ExecStop=/etc/init.d/glusterfs-server stop (code=exited, status=0/SUCCESS)
Process: 6552 ExecStart=/etc/init.d/glusterfs-server start (code=exited, status=0/SUCCESS)
Tasks: 7
Memory: 12.8M
CPU: 2.306s
CGroup: /system.slice/glusterfs-server.service
└─6560 /usr/sbin/glusterd -p /var/run/glusterd.pid
Mar 15 18:53:13 g1 systemd[1]: Starting LSB: GlusterFS server...
Mar 15 18:53:13 g1 glusterfs-server[6552]: * Starting glusterd service glusterd
Mar 15 18:53:15 g1 glusterfs-server[6552]: ...done.
Mar 15 18:53:15 g1 systemd[1]: Started LSB: GlusterFS server.
I have a feeling the issue stems from the fact that the CAs are identical on both the server and the client. From what I've read in the documentation, the certs from the servers and the client are concatenated and distributed to the servers, whereas the client should only receive the concatenated certs from the servers. Currently, the client is using a CA that contains both its own certificate and that of the second server, so maybe that's the issue. But I'm somewhat doubtful, because restarting the glusterd service fails the same way on the servers too, and in those instances the CAs should be fine.
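For reference, my reading of the docs is that the layout should look roughly like this (a sketch only; the glusterfs.pem / glusterfs.key / glusterfs.ca names are the ones the Gluster TLS docs use, while g1.pem / g2.pem are just my shorthand for each box's certificate):
#each box keeps its own cert and key:
#  /etc/ssl/glusterfs.pem -> that box's certificate
#  /etc/ssl/glusterfs.key -> that box's private key
#servers: CA = the certificates (.pem) of every server plus the client
cat g1.pem g2.pem > /etc/ssl/glusterfs.ca
#a pure client would only need the server certs in its CA, but here the
#client is g1 itself, so both CAs end up identical anyway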
Also, would it be feasible to work around this by pushing all traffic on the gluster ports through an SSH tunnel? At the moment I have four gluster ports open on the boxes, plus ssh/22 on the client:
sudo iptables -A INPUT -m state --state NEW -m tcp -p tcp -s 10.13.13.201 --dport 24007:24008 -j ACCEPT
sudo iptables -A INPUT -m state --state NEW -m tcp -p tcp -s 10.13.13.202 --dport 24007:24008 -j ACCEPT
sudo iptables -A INPUT -m state --state NEW -m tcp -p tcp -s 10.13.13.201 --dport 49152:49153 -j ACCEPT
sudo iptables -A INPUT -m state --state NEW -m tcp -p tcp -s 10.13.13.202 --dport 49152:49153 -j ACCEPT
How would I go about wrapping all of this cross-talk on ports 24007-24008 and 49152-49153 in an SSH tunnel?
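Something like this is what I had in mind, though I haven't actually tried it (a sketch run from g1; the local bind ports are arbitrary, and deliberately not 24007/24008 since glusterd on g1 already listens there):
#forward g2's gluster management and brick ports over ssh (-f -N = background, no remote command)
ssh -f -N \
  -L 34007:127.0.0.1:24007 \
  -L 34008:127.0.0.1:24008 \
  -L 39152:127.0.0.1:49152 \
  -L 39153:127.0.0.1:49153 \
  user@10.13.13.202
But I don't see how I'd point gluster at those forwarded ports instead of having it talk to 10.13.13.202 directly, which is really the part I'm asking about.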
Thoughts on what's going on here?

Marty