
I'm trying to use the persistent volumes support for Mesos, and am having a tremendously difficult time getting it to work.

I've configured each of my slaves, as follows, and have confirmed that they've successfully rebooted using this new config:

/etc/mesos-slave/resources

[
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk1" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk2" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk3" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk4" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "PATH",
        "path" : { "root" : "/mnt/disk5" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "MOUNT",
        "mount" : { "root" : "/mnt/disk6" }
      }
    }
  },
  {
    "name" : "disk",
    "type" : "SCALAR",
    "scalar" : { "value" : 4194304 },
    "disk" : {
      "source" : {
        "type" : "MOUNT",
        "mount" : { "root" : "/mnt/disk7" }
      }
    }
  }
]

The cluster state shows that the disk resources are registered and unreserved (full response here):

{
  ...
  "slaves": [{
    "id": "c5e59876-5157-463f-b31e-16b34d6ffc72-S8",
    "pid": "slave(1)@172.30.31.55:5051",
    "hostname": "redacted47.redacted.com",
    "registered_time": 1458810586.61153,
    "resources": {
      "cpus": 32,
      "disk": 29360128,
      "mem": 256651,
      "ports": "[31000-32000]"
    },
    "used_resources": {
      "cpus": 1,
      "disk": 0,
      "mem": 128,
      "ports": "[31282-31282]"
    },
    "offered_resources": {
      "cpus": 0,
      "disk": 0,
      "mem": 0
    },
    "reserved_resources": {},
    "unreserved_resources": {
      "cpus": 32,
      "disk": 29360128,
      "mem": 256651,
      "ports": "[31000-32000]"
    },

Whenever I try to submit a job that requests a persistent volume, the offers from all of the slaves are rejected, with a message claiming that there are no disk resources available:

Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Offer [2220b6bf-aac2-402b-82e6-8d625284d1a4-O9375]. Considering unreserved resources with roles {*}. Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), mem SATISFIED (128.0 <= 128.0), disk including volumes NOT SATISFIED (1024.0 > 0.0) (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-38)
Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Offer [2220b6bf-aac2-402b-82e6-8d625284d1a4-O9376]. Considering unreserved resources with roles {*}. Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), mem SATISFIED (128.0 <= 128.0), disk including volumes NOT SATISFIED (1024.0 > 0.0) (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-38)
Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Finished processing 2220b6bf-aac2-402b-82e6-8d625284d1a4-O9375. Matched 0 ops after 1 passes. disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; disk(*) 4194304.0; cpus(*) 28.0; mem(*) 226955.0; ports(*) 31000->31085,31087->31364,31366->31940,31942->32000 left. (mesosphere.marathon.core.matcher.manager.impl.OfferMatcherManagerActor:marathon-akka.actor.default-dispatcher-11)
Mar 26 17:59:43 redacted47.redacted.com start[30457]: [2016-03-26 17:59:43,606] INFO Offer [2220b6bf-aac2-402b-82e6-8d625284d1a4-O9379]. Considering unreserved resources with roles {*}. Not all basic resources satisfied: cpus SATISFIED (1.0 <= 1.0), mem SATISFIED (128.0 <= 128.0), disk including volumes NOT SATISFIED (1024.0 > 0.0) (mesosphere.mesos.ResourceMatcher$:marathon-akka.actor.default-dispatcher-38)

If I try to post a request to create a volume directly against the Mesos master, it rejects the request with "Insufficient disk resources", as follows:

# curl -v -i \
    -u "marathon:$(cat /etc/marathon/.secret)" \
    -d slaveId=c5e59876-5157-463f-b31e-16b34d6ffc72-S8 \
    -d volumes='[
      {
        "name": "disk",
        "type": "SCALAR",
        "scalar": { "value": 512 },
        "role": "foo",
        "reservation": {
          "principal": "marathon"
        },
        "disk": {
          "persistence": {
            "id" : "very-persist"
          },
          "volume": {
            "mode": "RW",
            "container_path": "such-path"
          }
        }
      }
    ]' \
    -X POST http://localhost:5050/master/create-volumes; echo
* About to connect() to localhost port 5050 (#0)
*   Trying ::1...
* Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 5050 (#0)
* Server auth using Basic with user 'marathon'
> POST /master/create-volumes HTTP/1.1
> Authorization: Basic redacted
> User-Agent: curl/7.29.0
> Host: localhost:5050
> Accept: */*
> Content-Length: 481
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 481 out of 481 bytes
< HTTP/1.1 409 Conflict
HTTP/1.1 409 Conflict
< Date: Thu, 24 Mar 2016 09:50:36 GMT
Date: Thu, 24 Mar 2016 09:50:36 GMT
< Content-Length: 53
Content-Length: 53
<
* Connection #0 to host localhost left intact
Invalid CREATE Operation: Insufficient disk resources

I'm at my wits' end. I don't know what I'm doing wrong, and I'm trying my best to follow the documentation. Any hint as to what I might be doing wrong would be greatly, tremendously appreciated.

I'm running:

  • Mesos 0.28.0
  • Marathon 1.0.0RC1

I'm following the instructions from the following resources, as best as I can:

Thank you for reading!

Tim Harper
2 Answers


First, thank you for providing such a nicely documented issue!

Your problem here seems to be the following:

a) There is no root disk resource available. Once you specify a disk resource manually, as you did, Mesos stops detecting the root disk automatically. Adding a root disk resource, as described here, should solve your problem.

b) Your "Create Volume" HTTP request above only considers root disk resources (which you don't have, for the reason explained above). If you want to use a non-root disk, you should look at the source field, as very briefly mentioned here.
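To make points a) and b) a bit more concrete, here is a rough Python sketch. It is not taken from the Mesos documentation, and all helper names (`root_disk_entry`, `with_root_disk`, `volume_on_disk`, `create_volumes_form`) are made up for illustration. It assumes that a root disk resource is simply a `disk` scalar without a `source`, and that a volume on a non-root disk carries the same `source` object that the agent's resources file uses. The 10240 MB size is a placeholder.

```python
import json
from urllib.parse import urlencode


def root_disk_entry(size_mb):
    """A root-disk resource: a plain 'disk' scalar with no 'disk.source'."""
    return {"name": "disk", "type": "SCALAR", "scalar": {"value": size_mb}}


def with_root_disk(resources, size_mb):
    """Return a copy of the resource list with a root-disk entry in front,
    unless one (a disk entry without a source) is already present."""
    has_root = any(
        r.get("name") == "disk" and "source" not in r.get("disk", {})
        for r in resources
    )
    if has_root:
        return list(resources)
    return [root_disk_entry(size_mb)] + list(resources)


def volume_on_disk(volume_id, size_mb, container_path, role, principal, source):
    """A persistent-volume resource that names the non-root disk it lives on."""
    return {
        "name": "disk",
        "type": "SCALAR",
        "scalar": {"value": size_mb},
        "role": role,
        "reservation": {"principal": principal},
        "disk": {
            "persistence": {"id": volume_id},
            "volume": {"mode": "RW", "container_path": container_path},
            "source": source,  # assumed: same shape as in the resources file
        },
    }


def create_volumes_form(slave_id, volumes):
    """Form-encoded body for POST /master/create-volumes."""
    return urlencode({"slaveId": slave_id, "volumes": json.dumps(volumes)})


if __name__ == "__main__":
    path_disk = {"type": "PATH", "path": {"root": "/mnt/disk1"}}
    agent_resources = [
        {"name": "disk", "type": "SCALAR", "scalar": {"value": 4194304},
         "disk": {"source": path_disk}},
    ]
    # a) resources file content with a root disk added (placeholder size in MB)
    print(json.dumps(with_root_disk(agent_resources, 10240), indent=2))

    # b) request body for a volume on the PATH disk, using the question's values
    volume = volume_on_disk("very-persist", 512, "such-path", "foo",
                            "marathon", path_disk)
    print(create_volumes_form("c5e59876-5157-463f-b31e-16b34d6ffc72-S8", [volume]))
```

The first printed document is what would go into /etc/mesos-slave/resources, and the second string is the form body that the curl call in the question sends with its `-d` options. As far as I understand, the disk also has to be reserved for the given role before the create request can succeed, so treat this as a starting point rather than a verified recipe.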

BTW any feedback on how the documentation can be improved is welcome (I will add a short note about this issue, but any feedback from users is very helpful)! Feel free to contribute here!

Hope this was helpful!

Till
js84
  • Wonderful! Thank you! I got the call to succeed. And now I understand better what it means that persistent volumes are created with reserved resources, either dynamically allocated or pre-reserved via /etc/mesos-slave/resources. It appears that Marathon offers no support for persistent volumes on non-root disks. The data structure describing a persistent volume simply has no room to specify the disk type. https://github.com/mesosphere/marathon/blob/v1.0.0-RC1/src/main/scala/mesosphere/marathon/state/Volume.scala#L107 It seems it'd be trivial to add? – Tim Harper Mar 29 '16 at 19:47

Sorry I can't add a comment.

I found the documentation a bit daunting. There is a lot of it and it is detailed, but I'm trying to learn Mesos, Marathon, etc. in my own time, and the lack of examples makes that really difficult for me. What I would prefer is one page showing a small cluster, with IP addresses, disks, CPUs, and the configuration files required to set up the masters, agents, and a ZooKeeper ensemble, plus some example JSON files showing how to use Marathon for particular use cases.

I'm aiming to write up some notes in my public GitHub account, showing my test cluster and explaining how everything is configured, once I've got persistent volumes, Jenkins, and a private Docker registry all working inside Mesos, but I'm far away from that.

  • I've added my ansible install scripts to github https://github.com/ajazam/ansible-mesos . I'm assuming everything is there for creating a working cluster of ubuntu bare-metal hosts running mesos, zookeeper, marathon and docker – Abdul Jabbar Azam Apr 16 '16 at 21:48