I have one Mesos master and two Mesos slaves.

The two slaves are located:
- one in my LAN (mesos-slave-on-LAN)
- one in a public cloud (mesos-slave-on-WAN)

The master can be reached:
- on the LAN at 10.1.10.175
- on the WAN at 94.141.153.57

I can see both Mesos slaves registered in Mesos.
I am trying to run two kinds of dummy tasks.

A simple Python command:

```json
{
  "id": "my-first-app",
  "cmd": "python -m SimpleHTTPServer 8009",
  "cpus": 0.01,
  "mem": 20.0,
  "instances": 1,
  "acceptedResourceRoles": ["slave_public", "*"],
  "ports": [8009],
  "requirePorts": true
}
```
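With `"requirePorts": true`, Marathon can only place this app on an offer whose `ports` ranges contain the exact host port listed (8009 here). A Mesos agent started without an explicit ports setting in `--resources` offers the range 31000-32000 by default, which does not include 8009. The sketch below is a simplified illustration of that containment check under those assumptions; it is not Marathon's actual matcher, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Inclusive (begin, end) pairs, mirroring how Mesos expresses a "ports" RANGES resource.
PortRange = Tuple[int, int]

# Ports offered by a Mesos agent when --resources does not override them.
DEFAULT_AGENT_PORTS: List[PortRange] = [(31000, 32000)]


def port_in_ranges(port: int, ranges: Iterable[PortRange]) -> bool:
    """Return True if `port` falls inside any inclusive (begin, end) range."""
    return any(begin <= port <= end for begin, end in ranges)


def fixed_ports_satisfied(requested: Iterable[int], ranges: Iterable[PortRange]) -> bool:
    """With requirePorts=true, every requested host port must be offered as-is."""
    ranges = list(ranges)
    return all(port_in_ranges(port, ranges) for port in requested)


if __name__ == "__main__":
    print(fixed_ports_satisfied([8009], DEFAULT_AGENT_PORTS))  # False: 8009 is outside 31000-32000
    print(fixed_ports_satisfied([8009], [(8000, 9000)]))       # True
```

If this is the cause, the usual options are either letting Marathon pick a host port (dropping `requirePorts`) or making the agent advertise a range that includes the fixed port.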
A Docker container deployment (with Nginx inside):

```json
{
  "id": "nginx-test",
  "container": {
    "docker": {
      "image": "nginx",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 80, "hostPort": 0, "servicePort": 80, "protocol": "tcp" }
      ]
    },
    "type": "DOCKER",
    "volumes": []
  },
  "healthChecks": [
    { "protocol": "HTTP", "portIndex": 0, "path": "/", "gracePeriodSeconds": 5, "intervalSeconds": 20, "maxConsecutiveFailures": 3 }
  ],
  "cpus": 0.2,
  "mem": 32.0,
  "instances": 1
}
```
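In BRIDGE mode, each `portMappings` entry with `"hostPort": 0` asks Marathon to assign a host port taken from the resource offer, so this app still needs at least one port in the offer's `ports` resource even though it lists no top-level `ports`. A minimal sketch (the helper `host_port_needs` is hypothetical) that counts what the mappings require:

```python
import json
from typing import List, Tuple

# The Nginx app definition from above, reduced to the fields this sketch reads.
APP_JSON = """
{
  "id": "nginx-test",
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "nginx",
      "network": "BRIDGE",
      "portMappings": [
        {"containerPort": 80, "hostPort": 0, "servicePort": 80, "protocol": "tcp"}
      ]
    }
  }
}
"""


def host_port_needs(app: dict) -> Tuple[int, List[int]]:
    """Return (number of dynamically assigned host ports, list of fixed host ports)."""
    mappings = app.get("container", {}).get("docker", {}).get("portMappings", [])
    # hostPort 0 (or omitted) means "pick any port from the offer".
    dynamic = sum(1 for m in mappings if m.get("hostPort", 0) == 0)
    fixed = [m["hostPort"] for m in mappings if m.get("hostPort", 0) != 0]
    return dynamic, fixed


if __name__ == "__main__":
    print(host_port_needs(json.loads(APP_JSON)))  # (1, [])
```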
Case 1: Mesos master advertising the LAN IP

Mesos master:

```
/usr/sbin/mesos-master --work_dir=/var/lib/mesos --zk=zk://mesos_machine:2181/mesos --quorum=1 --log_dir=/var/log/mesos --external_log_file=/dev/stdout --advertise_ip=10.1.10.175
```

Mesos slave on LAN:

```
/usr/sbin/mesos-slave --master=10.1.10.175:5050 --work_dir=/var/lib/mesos/agent --containerizers=docker,mesos --executor_registration_timeout=3mins --log_dir=/var/log/mesos
```

Mesos slave on WAN:

```
/usr/sbin/mesos-slave --master=94.141.153.57:5050 --work_dir=/var/lib/mesos/agent --containerizers=docker,mesos --executor_registration_timeout=3mins --log_dir=/var/log/mesos
```
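Mesos masters and agents exchange messages in both directions, so the master has to be able to open connections to the address each agent advertises (port 5051 by default), and each agent has to reach the master on port 5050. A minimal, generic TCP reachability check using only the standard library (the helper name `is_reachable` is hypothetical) can help confirm this from each machine:

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Try to open a TCP connection to host:port; True if the handshake succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For example, run `is_reachable("10.1.10.20", 5051)` from the master host to test the LAN agent, and `is_reachable("94.141.153.57", 5050)` from the cloud agent to test the master's WAN address.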
Running the configuration above and stopping one slave at a time, I get this matrix:

| | LAN Slave | Cloud Slave |
|-------- |----------- |------------- |
| Python | 'Waiting' | 'Waiting' |
| Docker | 'RUNNING' | 'Waiting' |

Here, Marathon deployed the Nginx app on mesos-slave-on-LAN.
Case 2: Mesos master advertising the WAN IP

Mesos master:

```
/usr/sbin/mesos-master --work_dir=/var/lib/mesos --zk=zk://mesos_machine:2181/mesos --quorum=1 --log_dir=/var/log/mesos --external_log_file=/dev/stdout --advertise_ip=94.141.153.57
```

Mesos slave on LAN (pointing at the master's LAN IP; otherwise it is not added to Mesos):

```
/usr/sbin/mesos-slave --master=10.1.10.175:5050 --work_dir=/var/lib/mesos/agent --containerizers=docker,mesos --executor_registration_timeout=3mins --log_dir=/var/log/mesos --advertise_ip=10.1.10.20
```

Mesos slave on WAN:

```
/usr/sbin/mesos-slave --master=94.141.153.57:5050 --work_dir=/var/lib/mesos/agent --containerizers=docker,mesos --executor_registration_timeout=3mins --log_dir=/var/log/mesos
```
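The differences between the two cases come down to a few flags (`--master` and `--advertise_ip`). A small sketch that extracts `--name=value` flags from the Case 2 command lines, so the addresses each process connects to and advertises can be compared side by side (the helper `parse_flags` is hypothetical; only the standard library is used):

```python
import shlex
from typing import Dict

# The three Case 2 command lines, copied verbatim.
CASE_2 = {
    "master": "/usr/sbin/mesos-master --work_dir=/var/lib/mesos --zk=zk://mesos_machine:2181/mesos --quorum=1 --log_dir=/var/log/mesos --external_log_file=/dev/stdout --advertise_ip=94.141.153.57",
    "slave on LAN": "/usr/sbin/mesos-slave --master=10.1.10.175:5050 --work_dir=/var/lib/mesos/agent --containerizers=docker,mesos --executor_registration_timeout=3mins --log_dir=/var/log/mesos --advertise_ip=10.1.10.20",
    "slave on WAN": "/usr/sbin/mesos-slave --master=94.141.153.57:5050 --work_dir=/var/lib/mesos/agent --containerizers=docker,mesos --executor_registration_timeout=3mins --log_dir=/var/log/mesos",
}


def parse_flags(command: str) -> Dict[str, str]:
    """Collect --name=value flags (everything after the binary path) into a dict."""
    flags: Dict[str, str] = {}
    for token in shlex.split(command)[1:]:
        if token.startswith("--") and "=" in token:
            # Split on the first "=" only, so values like zk://host:port/path stay intact.
            name, value = token[2:].split("=", 1)
            flags[name] = value
    return flags


if __name__ == "__main__":
    for role, cmd in CASE_2.items():
        flags = parse_flags(cmd)
        print(f"{role:13} master={flags.get('master', '-'):20} "
              f"advertise_ip={flags.get('advertise_ip', '(not set)')}")
```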
Running the configuration above and stopping one slave at a time, I get this matrix:

| | LAN Slave | Cloud Slave |
|-------- |----------- |------------- |
| Python | 'Waiting' | 'Waiting' |
| Docker | 'Waiting' | 'Waiting' |

This time, Marathon did not deploy the Nginx app on mesos-slave-on-LAN, even though both slaves are visible as resources in the Mesos web UI.

How can I deploy both the 'Python' and the 'Docker' tasks on a LAN slave as well as on a WAN slave?
The Marathon logs are:

```
25596:[2016-12-13 15:26:52,393] INFO [/nginx-test-n2]: new app detected (mesosphere.marathon.upgrade.GroupVersioningUtil$:marathon-akka.actor.default-dispatcher-1064)
25600: * Start(App(/nginx-test-n2, image="nginx")), instances=0)
25602: * Scale(App(/nginx-test-n2, image="nginx")), instances=1)
25607: * Start(App(/nginx-test-n2, image="nginx")), instances=0)
25609: * Scale(App(/nginx-test-n2, image="nginx")), instances=1)
25611:[2016-12-13 15:26:52,400] INFO [/nginx-test-n2] storing new app version 2016-12-13T14:26:52.388Z (mesosphere.marathon.core.group.impl.GroupManagerActor:marathon-akka.actor.default-dispatcher-1028)
25612:[2016-12-13 15:26:52,417] INFO Adding health check for app [/nginx-test-n2] and version [2016-12-13T14:26:52.388Z]: [HealthCheck(Some(/),HTTP,Some(0),None,5 seconds,20 seconds,20 seconds,3,false,None)] (mesosphere.marathon.core.health.impl.MarathonHealthCheckManager:marathon-akka.actor.default-dispatcher-1073)
25613:[2016-12-13 15:26:52,417] INFO Starting app /nginx-test-n2 (mesosphere.marathon.SchedulerActions:marathon-akka.actor.default-dispatcher-1073)
25614:[2016-12-13 15:26:52,417] INFO Starting health check actor for app [/nginx-test-n2] version [2016-12-13T14:26:52.388Z] and healthCheck [HealthCheck(Some(/),HTTP,Some(0),None,5 seconds,20 seconds,20 seconds,3,false,None)] (mesosphere.marathon.core.health.impl.HealthCheckActor:marathon-akka.actor.default-dispatcher-1075)
25615:[2016-12-13 15:26:52,417] INFO Already running 0 instances of /nginx-test-n2. Not scaling. (mesosphere.marathon.SchedulerActions:marathon-akka.actor.default-dispatcher-1073)
25616:[2016-12-13 15:26:52,418] INFO Successfully started 0 instances of /nginx-test-n2 (mesosphere.marathon.upgrade.AppStartActor:marathon-akka.actor.default-dispatcher-1073)
25617:[2016-12-13 15:26:52,418] INFO Started taskLaunchActor for /nginx-test-n2 version 2016-12-13T14:26:52.388Z with initial count 1 (mesosphere.marathon.core.launchqueue.impl.TaskLauncherActor:marathon-akka.actor.default-dispatcher-1028)
25618:[2016-12-13 15:26:52,419] INFO activating matcher ActorOfferMatcher(Actor[akka://marathon/user/launchQueue/5/6-nginx-test-n2#-69306758]). (mesosphere.marathon.core.matcher.manager.impl.OfferMatcherManagerActor:marathon-akka.actor.default-dispatcher-1071)
25627:[2016-12-13 15:26:52,425] INFO Offer [bd40f00f-ce24-4014-b1b1-82db64e68c10-O91]. Considering resources with roles {*} without resident reservation labels. Insufficient ports in offer for run spec [/nginx-test-n2] (mesosphere.marathon.tasks.PortsMatcher:marathon-akka.actor.default-dispatcher-1073)
25628:[2016-12-13 15:26:52,425] INFO Offer [bd40f00f-ce24-4014-b1b1-82db64e68c10-O91]. Insufficient resources for [/nginx-test-n2] (need cpus=0.2, mem=32.0, disk=0.0, gpus=0, ports=(), available in offer: [id { value: "bd40f00f-ce24-4014-b1b1-82db64e68c10-O91" } framework_id { value: "40aadcc7-8e0f-4634-af46-29d9c33bc03e-0000" } slave_id { value: "bd40f00f-ce24-4014-b1b1-82db64e68c10-S1" } hostname: "myproj-slave-vm-1" resources { name: "disk" type: SCALAR scalar { value: 3985.0 } role: "*" } resources { name: "cpus" type: SCALAR scalar { value: 0.6 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 6478.0 } role: "*" } url { scheme: "http" address { hostname: "myproj-slave-vm-1" ip: "10.1.10.20" port: 5051 } path: "/slave(1)" }] (mesosphere.mesos.TaskBuilder$:marathon-akka.actor.default-dispatcher-1073)
25634: * Start(App(/nginx-test-n2, image="nginx")), instances=0)
25636: * Scale(App(/nginx-test-n2, image="nginx")), instances=1)
25646:[2016-12-13 15:26:57,440] INFO Offer [bd40f00f-ce24-4014-b1b1-82db64e68c10-O92]. Considering resources with roles {*} without resident reservation labels. Insufficient ports in offer for run spec [/nginx-test-n2] (mesosphere.marathon.tasks.PortsMatcher:marathon-akka.actor.default-dispatcher-1073)
25647:[2016-12-13 15:26:57,440] INFO Offer [bd40f00f-ce24-4014-b1b1-82db64e68c10-O92]. Insufficient resources for [/nginx-test-n2] (need cpus=0.2, mem=32.0, disk=0.0, gpus=0, ports=(), available in offer: [id { value: "bd40f00f-ce24-4014-b1b1-82db64e68c10-O92" } framework_id { value: "40aadcc7-8e0f-4634-af46-29d9c33bc03e-0000" } slave_id { value: "bd40f00f-ce24-4014-b1b1-82db64e68c10-S1" } hostname: "myproj-slave-vm-1" resources { name: "disk" type: SCALAR scalar { value: 3985.0 } role: "*" } resources { name: "cpus" type: SCALAR scalar { value: 0.6 } role: "*" } resources { name: "mem" type: SCALAR scalar { value: 6478.0 } role: "*" } url { scheme: "http" address { hostname: "myproj-slave-vm-1" ip: "10.1.10.20" port: 5051 } path: "/slave(1)" }] (mesosphere.mesos.TaskBuilder$:marathon-akka.actor.default-dispatcher-1073)
25660:[2016-12-13 15:27:02,457] INFO Offer [bd40f00f-ce24-4014-b1b1-82db64e68c10-O93]. Considering resources with roles {*} without resident reservation labels. Insufficient ports in offer for run spec [/nginx-test-n2] (mesosphere.marathon.tasks.PortsMatcher:marathon-akka.actor.default-dispatcher-1028)
25661:[2016-12-13 15:27:02,457] INFO Offer [bd40f00f-ce2
```
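Reading offer `bd40f00f-ce24-4014-b1b1-82db64e68c10-O91` from the LAN slave (10.1.10.20) in these logs: it carries `disk`, `cpus` and `mem`, but no `ports` resource at all, while the Nginx app needs one host port for its `"hostPort": 0` mapping. That is consistent with the `Insufficient ports in offer` message. The sketch below reproduces the comparison with the numbers from the log; the function is a hypothetical simplification, not Marathon's code.

```python
from typing import Dict, List, Tuple

# Scalar resources from offer ...-O91 in the Marathon log above.
OFFER_SCALARS: Dict[str, float] = {"disk": 3985.0, "cpus": 0.6, "mem": 6478.0}
# Port ranges in that offer: the log shows no "ports" resource, so the list is empty.
OFFER_PORT_RANGES: List[Tuple[int, int]] = []

# What /nginx-test-n2 asks for: cpus/mem from the app plus one dynamic host port.
NEED_SCALARS: Dict[str, float] = {"cpus": 0.2, "mem": 32.0}
NEED_DYNAMIC_PORTS = 1


def missing_resources(scalars: Dict[str, float],
                      port_ranges: List[Tuple[int, int]],
                      need_scalars: Dict[str, float],
                      need_ports: int) -> List[str]:
    """Return the names of resources the offer cannot cover."""
    missing = [name for name, amount in need_scalars.items()
               if scalars.get(name, 0.0) < amount]
    # Count how many individual ports the inclusive ranges provide.
    available_ports = sum(end - begin + 1 for begin, end in port_ranges)
    if available_ports < need_ports:
        missing.append("ports")
    return missing


if __name__ == "__main__":
    print(missing_resources(OFFER_SCALARS, OFFER_PORT_RANGES,
                            NEED_SCALARS, NEED_DYNAMIC_PORTS))  # ['ports']
```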
I also get this kind of log, with no deployment and no error:

```
31333:[2016-12-13 16:00:39,975] INFO [/nginx-test10]: new app detected (mesosphere.marathon.upgrade.GroupVersioningUtil$:marathon-akka.actor.default-dispatcher-1139)
31337: * Start(App(/nginx-test10, image="nginx")), instances=0)
31339: * Scale(App(/nginx-test10, image="nginx")), instances=1)
31344: * Start(App(/nginx-test10, image="nginx")), instances=0)
31346: * Scale(App(/nginx-test10, image="nginx")), instances=1)
31348:[2016-12-13 16:00:39,979] INFO [/nginx-test10] storing new app version 2016-12-13T15:00:39.974Z (mesosphere.marathon.core.group.impl.GroupManagerActor:marathon-akka.actor.default-dispatcher-1101)
31349:[2016-12-13 16:00:39,981] INFO Adding health check for app [/nginx-test10] and version [2016-12-13T15:00:39.974Z]: [HealthCheck(Some(/),HTTP,Some(0),None,5 seconds,20 seconds,20 seconds,3,false,None)] (mesosphere.marathon.core.health.impl.MarathonHealthCheckManager:marathon-akka.actor.default-dispatcher-1141)
31350:[2016-12-13 16:00:39,982] INFO Starting app /nginx-test10 (mesosphere.marathon.SchedulerActions:marathon-akka.actor.default-dispatcher-1141)
31351:[2016-12-13 16:00:39,982] INFO Starting health check actor for app [/nginx-test10] version [2016-12-13T15:00:39.974Z] and healthCheck [HealthCheck(Some(/),HTTP,Some(0),None,5 seconds,20 seconds,20 seconds,3,false,None)] (mesosphere.marathon.core.health.impl.HealthCheckActor:marathon-akka.actor.default-dispatcher-1139)
31352:[2016-12-13 16:00:39,982] INFO Already running 0 instances of /nginx-test10. Not scaling. (mesosphere.marathon.SchedulerActions:marathon-akka.actor.default-dispatcher-1141)
31353:[2016-12-13 16:00:39,982] INFO Successfully started 0 instances of /nginx-test10 (mesosphere.marathon.upgrade.AppStartActor:marathon-akka.actor.default-dispatcher-1141)
31354:[2016-12-13 16:00:39,983] INFO Started taskLaunchActor for /nginx-test10 version 2016-12-13T15:00:39.974Z with initial count 1 (mesosphere.marathon.core.launchqueue.impl.TaskLauncherActor:marathon-akka.actor.default-dispatcher-1110)
31355:[2016-12-13 16:00:39,983] INFO activating matcher ActorOfferMatcher(Actor[akka://marathon/user/launchQueue/6/2-nginx-test10#1135134700]). (mesosphere.marathon.core.matcher.manager.impl.OfferMatcherManagerActor:marathon-akka.actor.default-dispatcher-1140)
31364: * Start(App(/nginx-test10, image="nginx")), instances=0)
31366: * Scale(App(/nginx-test10, image="nginx")), instances=1)
```
The resources in Mesos look like this (only the WAN slave appears, not the LAN slave, and I don't understand why):

| | CPUs | GPUs | Mem | Disk |
|---------|------|------|--------|---------|
| Total | 1 | 0 | 244 MB | 43.6 GB |
| Used | 0 | 0 | 0 B | 0 B |
| Offered | 0 | 0 | 0 B | 0 B |
| Idle | 1 | 0 | 244 MB | 43.6 GB |