
I have a 1/1 master/slave setup; the slave has 8 GB of RAM and 8 CPUs. I am trying to use Marathon to deploy a Docker container with 1 GB of memory and 1 CPU, but it just hangs on "Waiting".

I believe this is usually caused by Marathon not getting the resources it wants for the task. When I look at my logs I see:

    Sending 1 offers to framework 8bb1a298-cc23-426e-ad43-d440a2a560c4-0000 (marathon) at scheduler-d4a993b4-69ea-4ac3-9e98-b54afe1e790b@127.0.0.1:52016
    I0127 23:07:37.396546 2471 master.cpp:3297] Processing DECLINE call for offers: [ 5271fcb3-4d77-4b12-af85-d94fd9172514-O127 ] for framework 8bb1a298-cc23-426e-ad43-d440a2a560c4-0000 (marathon) at scheduler-d4a993b4-69ea-4ac3-9e98-b54afe1e790b@127.0.0.1:52016
    I0127 23:07:37.396917 2466 hierarchical.cpp:744] Recovered cpus(*):6; mem(*):5968; disk(*):156020; ports(*):[31000-31056, 31058-32000] (total: cpus(*):8; mem(*):6992; disk(*):156020; ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; ports(*):[31057-31057]) on slave 8bb1a298-cc23-426e-ad43-d440a2a560c4-S0 from framework 8bb1a298-cc23-426e-ad43-d440a2a560c4-0000

So it looks like Marathon is declining the offer it gets? The next line in the logs says that Mesos is reclaiming the offered resources, and what it is reclaiming looks like plenty for my task.

Any ideas on how to troubleshoot this further?
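The numbers in the "Recovered" log line can be sanity-checked against the task's request. The following is only a minimal sketch: the helper names `parse_scalars` and `fits` are made up for illustration, and it only looks at scalar resources (CPUs, memory, disk), not port ranges.

```python
import re

# Scalar resources copied from the "Recovered" part of the log line.
RECOVERED = "cpus(*):6; mem(*):5968; disk(*):156020"

def parse_scalars(resources: str) -> dict:
    """Parse scalar resources such as 'cpus(*):6; mem(*):5968' into a dict."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+)\(\*\):([\d.]+)", resources)}

def fits(offer: dict, request: dict) -> bool:
    """Return True if every requested resource is covered by the offer."""
    return all(offer.get(name, 0.0) >= amount
               for name, amount in request.items())

offer = parse_scalars(RECOVERED)
task = {"cpus": 1.0, "mem": 1024.0}  # the 1 CPU / 1 GB task from the question
print(fits(offer, task))
```

With 6 CPUs and 5968 MB on offer, a 1 CPU / 1024 MB task fits, which suggests the decline is not about CPU or memory alone.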

Edit: I dug into this a bit further and found the Marathon logs.

Basically, the deployment works if we do not enter any port mapping information in the Docker section of Marathon. The Docker container deploys successfully and I can ping it from its host, but I cannot access it from elsewhere.

If we set the container port to 8081 (which is what the Docker container exposes, as its application listens on it), we get further in the deployment process, but the app within the container fails to build with this error:

    Error: listen EADDRINUSE :::8081
        at Object.exports._errnoException (util.js:856:11)
        at exports._exceptionWithHostPort (util.js:879:20)
        at Server._listen2 (net.js:1234:14)
        at listen (net.js:1270:10)
        at Server.listen (net.js:1366:5)
        at EventEmitter.listen (/usr/src/app/node_modules/express/lib/application.js:617:24)
        at Object.<anonymous> (/usr/src/app/index.js:16:18)
        at Module._compile (module.js:425:26)
        at Object.Module._extensions..js (module.js:432:10)
        at Module.load (module.js:356:32)
        at Function.Module._load (module.js:313:12)
        at Function.Module.runMain (module.js:457:10)
        at startup (node.js:138:18)
        at node.js:974:3

So I think we are further along than we were, but we are still having some port issues. I don't know why the container would build successfully on its own, and with Marathon with no port settings, but not with Marathon with port settings.
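For context, `EADDRINUSE` is the error the operating system returns when a socket tries to bind a port that something else already holds in the same network namespace. A minimal Python sketch (not part of the original setup; the function name is illustrative) reproduces that kind of error by binding the same port twice:

```python
import errno
import socket

def second_bind_errno() -> int:
    """Bind one listener, then try to bind a second socket to the same port.

    Returns the errno raised by the second bind, or 0 if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))  # same address and port again
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()

print(second_bind_errno() == errno.EADDRINUSE)
```

The same applies to the Node.js app here: if another process (or another copy of the app) is already listening on 8081 where the app starts, `listen` fails in this way.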

Mark
  • Can you post Marathon logs? – janisz Jan 28 '16 at 10:36
  • Have you enabled the Docker containerizer in your slave config? – Tobi Jan 28 '16 at 15:08
  • Thanks for the replies, I will post the Marathon logs later today. Yes, we initially couldn't launch any Docker containers; then we enabled the correct containerizer and the busybox container launched, but not our custom one, which needs a little more resources but still far less than should be available. – Mark Jan 28 '16 at 16:25
  • Can't find the Marathon logs unfortunately; trying to hunt down where I should be looking in the docs. – Mark Jan 28 '16 at 17:31
  • Take a look at `/var/log/syslog` or `/var/log/messages`. You can find more details [here](https://open.mesosphere.com/advanced-course/troubleshooting/). – janisz Jan 29 '16 at 12:03
  • Added some more info, thanks all. – Mark Feb 02 '16 at 18:19

1 Answer

There are a few things to check:

  1. On your slave, the output of `ps aux | grep sbin/mesos-slave` should contain something like:

    --containerizers=docker,mesos --executor_registration_timeout=5mins

  2. Again on the slave, check that there's a Docker daemon running:

    ps aux | grep "docker daemon"

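  3. Make sure you've configured the Docker network (in Marathon) as BRIDGE. In HOST mode the container shares the host's network stack, so the app can collide with ports already in use on the host. BRIDGE mode allows a mapping such as slave:32001 -> docker:8080. Setting `hostPort` to 0 lets Marathon pick a free port from the offer; the task sees it as `$PORT0`.

    ...
    "network": "BRIDGE",
    "portMappings": [
      {
        "containerPort": 8080,
        "hostPort": 0,
        "protocol": "tcp"
      }
    ],
    ...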
    
  4. When the task starts in Marathon, you'll see a task ID like `myapp.a72db5b0-ca16-11e5-ba5f-fea9945fabaf`. Use the Mesos CLI (`pip install mesos.cli mesos.interface`) to fetch the logs. It has a command similar to Unix's `tail` for fetching stdout logs (`-f` follows the logs):

    mesos tail -f -i myapp.a72db5b0-ca16-11e5-ba5f-fea9945fabaf
    

    and stderr:

    mesos tail -f -i myapp.a72db5b0-ca16-11e5-ba5f-fea9945fabaf stderr
    

    `-i` lets you get logs from inactive tasks (in case the task is crashing quickly). If you don't catch the ID in Marathon, use `mesos ps -i`.

  5. If the task is not starting, either there are not enough resources or there is some problem with Marathon. Navigate your browser to `http://{marathon URI}:8080/logging` and increase the verbosity for task allocation. Then check the Marathon logs.
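To tie step 3 together, here is a minimal sketch of a full app definition with BRIDGE networking, built in Python so it can be dumped as JSON. It assumes the Marathon app format of that era (`container.docker.network` and `portMappings`); the function name `bridge_app`, the app ID `/myapp` and the image `example/myapp:latest` are placeholders, and 8081 is the port the app in the question listens on.

```python
import json

def bridge_app(app_id: str, image: str, container_port: int,
               cpus: float = 1.0, mem: float = 1024.0) -> dict:
    """Build a Marathon app definition that uses Docker BRIDGE networking."""
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem,
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                "portMappings": [
                    {
                        "containerPort": container_port,
                        "hostPort": 0,  # 0: let Marathon pick a port from the offer
                        "protocol": "tcp",
                    }
                ],
            },
        },
    }

# Placeholder ID and image name; adjust to your own app.
app = bridge_app("/myapp", "example/myapp:latest", 8081)
print(json.dumps(app, indent=2))
```

The printed JSON can then be submitted to Marathon's `/v2/apps` endpoint or pasted into the UI's JSON mode, if your Marathon version offers one.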

Tombart