
Hi, I have two file servers; call them g1 at 10.13.13.201 and g2 at 10.13.13.202. I've successfully merged both into a GlusterFS volume mounted at 10.13.13.201:/mnt/glusterfs. So technically I have one box that is solely a glusterd server, and another that is both a server and a client. I chose this layout because each file server has 24 drives: one holds the OS, and the rest are LSOD in a raidz2 ZFS array. So I figured, why add a separate controller when one of the machines has enough beef to assume those responsibilities itself?

Up until this point the setup has worked fine; however, I'm running into issues getting SSL/TLS to work with this configuration. So, starting from scratch, I create the ZFS pools and install the GlusterFS server software. Before configuring any gluster peers, I run the following script on both boxes to generate the certs and the CA:

#!/bin/bash

#temp user directory for generation of keys
mkdir ~/temp_ssl
cd ~/temp_ssl

#generating self-signed keys
openssl genrsa -out $HOSTNAME.key 2048
openssl req -new -x509 -key "$HOSTNAME".key -subj "/CN=$HOSTNAME" -out "$HOSTNAME".pem

#grab both keys
sshpass -p 1 scp user@10.13.13.201:~/temp_ssl/g1.key . 
sshpass -p 1 scp user@10.13.13.202:~/temp_ssl/g2.key . 
#concatenate both keys to generate CA
for f in *key; do
 cat $f >> gluster.ca;
done;
#copy key, cert, and CA to /etc/ssl; restrict ownership and permissions to root read/write only
sudo cp "$HOSTNAME".key "$HOSTNAME".pem gluster.ca /etc/ssl
sudo chown root:root /etc/ssl/"$HOSTNAME".key /etc/ssl/"$HOSTNAME".pem /etc/ssl/gluster.ca
sudo chmod 0600 /etc/ssl/"$HOSTNAME".key /etc/ssl/"$HOSTNAME".pem /etc/ssl/gluster.ca

#remove the unsecured keys
cd 
sudo rm -rf temp_ssl

#generate file flag for ssl secured maintenance between glusters
sudo touch /var/lib/glusterd/secure-access

#restart glusterd
sudo systemctl restart glusterfs-server.service

exit 0

However, touching the secure-access file into the glusterd maintenance path (/var/lib/glusterd) breaks the server:

$ sudo systemctl restart glusterfs-server.service
Job for glusterfs-server.service failed because the control process exited with error code. See "systemctl status glusterfs-server.service" and "journalctl -xe" for details.
$ sudo systemctl status glusterfs-server.service
glusterfs-server.service - LSB: GlusterFS server
   Loaded: loaded (/etc/init.d/glusterfs-server; bad; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2017-03-15 18:50:17 CDT; 1min 0s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 6482 ExecStop=/etc/init.d/glusterfs-server stop (code=exited, status=0/SUCCESS)
  Process: 6526 ExecStart=/etc/init.d/glusterfs-server start (code=exited, status=1/FAILURE)

Mar 15 18:50:17 g1 systemd[1]: Starting LSB: GlusterFS server...
Mar 15 18:50:17 g1 glusterfs-server[6526]:  * Starting glusterd service glusterd
Mar 15 18:50:17 g1 glusterfs-server[6526]:    ...fail!
Mar 15 18:50:17 g1 systemd[1]: glusterfs-server.service: Control process exited, code=exited status=1
Mar 15 18:50:17 g1 systemd[1]: Failed to start LSB: GlusterFS server.
Mar 15 18:50:17 g1 systemd[1]: glusterfs-server.service: Unit entered failed state.
Mar 15 18:50:17 g1 systemd[1]: glusterfs-server.service: Failed with result 'exit-code'.

When I remove it everything starts ok:

$ sudo rm -rf secure-access 
$ sudo systemctl restart glusterfs-server.service
$ sudo systemctl status glusterfs-server.service
● glusterfs-server.service - LSB: GlusterFS server
   Loaded: loaded (/etc/init.d/glusterfs-server; bad; vendor preset: enabled)
   Active: active (running) since Wed 2017-03-15 18:53:15 CDT; 2s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 6482 ExecStop=/etc/init.d/glusterfs-server stop (code=exited, status=0/SUCCESS)
  Process: 6552 ExecStart=/etc/init.d/glusterfs-server start (code=exited, status=0/SUCCESS)
    Tasks: 7
   Memory: 12.8M
      CPU: 2.306s
   CGroup: /system.slice/glusterfs-server.service
           └─6560 /usr/sbin/glusterd -p /var/run/glusterd.pid

Mar 15 18:53:13 g1 systemd[1]: Starting LSB: GlusterFS server...
Mar 15 18:53:13 g1 glusterfs-server[6552]:  * Starting glusterd service glusterd
Mar 15 18:53:15 g1 glusterfs-server[6552]:    ...done.
Mar 15 18:53:15 g1 systemd[1]: Started LSB: GlusterFS server.

I have a feeling the issue stems from the fact that the CAs are identical on both the server and the client. As I've read in the documentation, the certs from the servers and the client are concatenated and distributed to the servers, whereas the client only receives the concatenated certs from the servers. Currently, the client is using a CA containing both its own certificate and that of the second server, so maybe that is the issue. But I'm somewhat doubtful, because even restarting the glusterd service on the servers fails for the same reason, and in those instances the CAs should be fine.
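(For what it's worth, my understanding is that the secure-access flag only encrypts the management path; encrypting the I/O path is a separate, per-volume setting, something like the following, with gv0 as a placeholder volume name:)

```shell
# Per-volume TLS for the I/O path; gv0 is a placeholder volume name.
# auth.ssl-allow lists the certificate CNs permitted to connect.
gluster volume set gv0 client.ssl on
gluster volume set gv0 server.ssl on
gluster volume set gv0 auth.ssl-allow 'g1,g2'
```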

Also, would it be feasible to work around this by routing all traffic on the glusterd ports through an SSH tunnel? In this instance, I have four gluster ports open on the boxes, plus ssh/22 on the client:

sudo iptables -A INPUT -m state --state NEW -m tcp -p tcp -s 10.13.13.201 --dport 24007:24008 -j ACCEPT
sudo iptables -A INPUT -m state --state NEW -m tcp -p tcp -s 10.13.13.202 --dport 24007:24008 -j ACCEPT
sudo iptables -A INPUT -m state --state NEW -m tcp -p tcp -s 10.13.13.201 --dport 49152:49153 -j ACCEPT
sudo iptables -A INPUT -m state --state NEW -m tcp -p tcp -s 10.13.13.202 --dport 49152:49153 -j ACCEPT

How would I go about wrapping all this cross-talk on ports 24007-24008 and 49152-49153 in an SSH tunnel?
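As a sketch of what I have in mind, assuming plain SSH local forwards from g1 toward g2 (the user name and the offset local ports are illustrative only):

```shell
# Forward g2's gluster management (24007-24008) and brick (49152-49153)
# ports over one SSH connection. Local ports are offset (3xxxx) because
# glusterd on g1 already listens on 24007-24008 itself.
# -f: background after auth; -N: forward only, run no remote command.
ssh -f -N user@10.13.13.202 \
    -L 34007:127.0.0.1:24007 \
    -L 34008:127.0.0.1:24008 \
    -L 39152:127.0.0.1:49152 \
    -L 39153:127.0.0.1:49153
```

Though the catch, as I understand it, is that gluster peers and clients address each other by the hostnames/IPs recorded in the volume files, not by arbitrary forwarded ports, so I'm not sure this can be made transparent.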

Thoughts on what's going on here?

Marty

1 Answer


Found the bug!

for f in *key; do
 cat $f >> gluster.ca;
done;

I should have been concatenating the certificates (the .pem files) instead, not the private keys.
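For reference, a minimal self-contained sketch of the corrected concatenation, using stand-in g1.pem/g2.pem files (on the real boxes these are the certificates fetched from each host):

```shell
#!/bin/sh
set -e
# work in a scratch directory with stand-in certificate and key files
mkdir -p /tmp/gluster_ca_demo
cd /tmp/gluster_ca_demo
printf 'CERT-g1\n' > g1.pem
printf 'CERT-g2\n' > g2.pem
printf 'KEY-g1\n'  > g1.key   # the private key must NOT end up in the CA

# the fix: concatenate the certificates (*.pem), not the keys (*key)
rm -f gluster.ca
for f in *.pem; do
    cat "$f" >> gluster.ca
done

cat gluster.ca   # both certificates, no private keys
```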

This is troublesome for me, specifically with regard to this community. Clearly, this was a rubber-duck situation, where I had spent too much time with the code and couldn't see the error clearly. I ended up spending hours on a simple mistake, and it would have been very helpful to have at least one additional set of eyes to catch the problem. That's partially the point of this forum, correct? Honestly, if no one on this forum could figure this out and/or just refused to help, then what's the point of even coming here?

Maybe I just needed to vent. Someone took my stress ball the other day.