Physical Machine: 192.168.10.1 ( Mesos, Zookeeper, Marathon )
Virtual Machine: 192.168.122.10 ( Mesos, Zookeeper )
Virtual Machine: 192.168.122.46 ( Mesos, Zookeeper )
The OS on all three machines is Fedora 23 Server.
The two networks are already inter-routed by default as the virtual machines all reside on the physical machine.
No firewall is set up.
Mesos Election LOG:
Master bound to loopback interface! Cannot communicate with remote schedulers or slaves. You might want to set '--ip' flag to a routable IP address.
I can set this manually, but I cannot set it dynamically: the --ip_discovery_command flag is not recognized.
What I wanted to do was point that flag at the script below:
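#!/bin/bash
# Print the IPv4 address of enp8s0 if that interface exists, otherwise of eth0.
if [[ $(ip addr) == *enp8s0* ]]; then
    # Match "inet " with a trailing space so that inet6 lines are skipped.
    ip addr show enp8s0 | awk -F'/| ' '/inet / { print $6; exit }'
else
    ip addr show eth0 | awk -F'/| ' '/inet / { print $6; exit }'
fi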
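If --ip_discovery_command is not available in the installed Mesos version, one possible workaround is to compute the address in a wrapper and hand it to the --ip flag that the log message mentions. The following is only a minimal sketch under that assumption. It reuses the interface names from the script above, and the function names first_ipv4 and pick_ip are made up for illustration.

```shell
# first_ipv4 IFACE
# Reads `ip -o -4 addr show` style lines on stdin and prints the first
# IPv4 address (without the /prefix) that belongs to IFACE.
first_ipv4() {
    awk -v ifc="$1" '$2 == ifc && $3 == "inet" { sub(/\/.*/, "", $4); print $4; exit }'
}

# pick_ip IFACE...
# Reads the same listing on stdin and tries each interface name in order.
# Prints the first address found; returns 1 if none of them has one.
pick_ip() {
    addrs=$(cat)
    for ifc in "$@"; do
        addr=$(printf '%s\n' "$addrs" | first_ipv4 "$ifc")
        if [ -n "$addr" ]; then
            printf '%s\n' "$addr"
            return 0
        fi
    done
    return 1
}
```

With these functions defined, a wrapper script could end with something like `exec mesos-master --ip="$(ip -o -4 addr show | pick_ip enp8s0 eth0)" "$@"`. A systemd unit could then call that wrapper from ExecStart rather than trying to capture command output itself.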
When I do set the IP manually (not what I want to do), the Mesos page at IP:5050 comes up, but the mesos-master then fails after 1 minute with this:
F0427 17:03:27.975260 6914 master.cpp:1253] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
*** Check failure stack trace: ***
@ 0x7f8360fa9edd (unknown)
@ 0x7f8360fabc50 (unknown)
@ 0x7f8360fa9ad3 (unknown)
@ 0x7f8360fac61e (unknown)
@ 0x7f83619a85dd (unknown)
@ 0x7f83619e7c30 (unknown)
@ 0x55a885ee3b2e (unknown)
@ 0x7f8361a11c0e (unknown)
@ 0x7f8361a5d75e (unknown)
@ 0x7f8361a7077a (unknown)
@ 0x7f83618f4aae (unknown)
@ 0x7f8361a70768 (unknown)
@ 0x7f8361a548d0 (unknown)
@ 0x7f8361fc832c (unknown)
@ 0x7f8361fd42a5 (unknown)
@ 0x7f8361fd472f (unknown)
@ 0x7f8360a5e60a start_thread
@ 0x7f835fefda4d __clone
Aborted (core dumped)
ZooKeeper is set up like this:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/log
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1:192.168.10.1:2888:3888
server.2:192.168.122.46:2888:3888
server.3:192.168.122.10:2888:3888
I have no idea how to verify that ZooKeeper is working properly.
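One way to check the ensemble could be sketched as follows. The ZooKeeper administrator guide writes ensemble entries as server.N=host:port:port, with an equals sign after the id, so it may be worth comparing the server lines above against that form. The guide also describes four-letter commands: a running server answers "ruok" with "imok", and "srvr" reports whether the node is a leader or a follower. The function names check_server_lines and zk_ruok are hypothetical. The sketch assumes nc is installed and that four-letter commands are enabled, which I believe is the default in the 3.4 series.

```shell
# check_server_lines FILE
# Prints every "server." line in FILE that does not start with the
# documented server.N=host:port:port form. Returns non-zero if any is found.
check_server_lines() {
    awk '/^server\./ && !/^server\.[0-9]+=[^:=]+:[0-9]+:[0-9]+/ { print; bad = 1 }
         END { exit bad }' "$1"
}

# zk_ruok HOST [PORT]
# Sends the "ruok" four-letter command; a running server replies "imok".
# Assumes nc is available; PORT defaults to the clientPort 2181.
zk_ruok() {
    printf 'ruok' | nc -w 2 "$1" "${2:-2181}"
}
```

For example, `check_server_lines` can be run against the zoo.cfg file, wherever it lives on the system, followed by `zk_ruok 192.168.10.1` and the same call for each VM. An "imok" reply only shows that the process is up. Sending `srvr` the same way should show a "Mode:" line once the node has joined a quorum.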
I'm honestly at the end of my rope. I have been pulling my hair out over this for the past week because of poor documentation and a lack of proper architecture explanations (primarily Marathon), horribly organized logs (Mesos), systemd being unable to run a bash script and use its output as a variable, and a lack of instructions all around.
Am I doing something wrong? I appreciate any assistance I can get. Let me know if you need anything I have not yet provided and I will post it right away.
EDIT:
I fixed the issue with Marathon by adding two additional Marathon servers on the VMs so that they could form a quorum.
EDIT2:
I am now having an issue where the Mesos server keeps rapidly re-electing a leader, but depending on the outcome I will look into this later.