
I am building a metrics collector to gather the running status of all the Spark jobs on our Mesos cluster. The Mesos API http://masterip/frameworks returns a lot of detail about all the frameworks; I then call http://slaveip/slave(1)/monitor/statistics on each slave to get per-framework resource statistics, and correlate the two.
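The correlation step can be sketched roughly as follows. This is a minimal sketch: the endpoint shapes are assumed from the JSON excerpt below, and the sample data is illustrative, not real cluster output.

```python
def correlate(frameworks, slave_stats):
    # Index the master's /frameworks entries by framework id,
    # then join each slave /monitor/statistics entry on framework_id.
    by_id = {fw["id"]: fw for fw in frameworks}
    joined = []
    for entry in slave_stats:
        fw = by_id.get(entry["framework_id"], {})
        joined.append({
            "framework_name": fw.get("name"),
            "executor_id": entry["executor_id"],
            "statistics": entry["statistics"],
        })
    return joined

# Illustrative sample data shaped like the two endpoints' JSON
frameworks = [{"id": "06ba8de8-7fc3-422d-9ee3-17dd9ddcb2ca-3157",
               "name": "hmrMonitor"}]
slave_stats = [{"framework_id": "06ba8de8-7fc3-422d-9ee3-17dd9ddcb2ca-3157",
                "executor_id": "0",
                "statistics": {"mem_rss_bytes": 2243149824}}]
print(correlate(frameworks, slave_stats)[0]["framework_name"])  # hmrMonitor
```

Note this join only works per framework, which is exactly the problem described below: when several Spark jobs register under the same framework name, the name alone is not enough to tell the instances apart.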

This works fine for most of the jobs, but I have some jobs that behave differently depending on the parameters used at submission time. They show up under the same framework name in the Mesos GUI, and I cannot tell them apart.

Is there a way to get the full command that launched each job? Or any other idea on how to distinguish them?

You can see there are multiple instances with the same framework name, as they are different Spark job instances.

When I connect to a Mesos slave, monitor/statistics doesn't show the full command with all of its parameters, so I cannot tell which framework corresponds to which Spark job instance.

  {
    "executor_id": "0",
    "executor_name": "Command Executor (Task: 0) (Command: sh -c ' 
\"/usr/local...')",
    "framework_id": "06ba8de8-7fc3-422d-9ee3-17dd9ddcb2ca-3157",
    "source": "0",
    "statistics": {
      "cpus_limit": 2.1,
      "cpus_system_time_secs": 848.689999999,
      "cpus_user_time_secs": 5128.78,
      "mem_limit_bytes": 4757389312,
      "mem_rss_bytes": 2243149824,
      "timestamp": 1522858776.20098
    }
  },
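For the collector itself, a statistics block like the one above can be turned into usage numbers: mem_rss_bytes over mem_limit_bytes gives memory utilization, while CPU usage needs the delta of the cumulative CPU-time counters between two samples. A sketch, using the field names from the excerpt; the "earlier" sample below is made up for illustration:

```python
def mem_utilization(stats):
    # Fraction of the memory limit currently resident
    return stats["mem_rss_bytes"] / stats["mem_limit_bytes"]

def cpus_used(prev, curr):
    # Cores in use, from the change in cumulative CPU seconds
    # (user + system) divided by the wall-clock interval.
    cpu_delta = ((curr["cpus_user_time_secs"] + curr["cpus_system_time_secs"])
                 - (prev["cpus_user_time_secs"] + prev["cpus_system_time_secs"]))
    return cpu_delta / (curr["timestamp"] - prev["timestamp"])

sample = {"cpus_system_time_secs": 848.69, "cpus_user_time_secs": 5128.78,
          "mem_limit_bytes": 4757389312, "mem_rss_bytes": 2243149824,
          "timestamp": 1522858776.2}
# Hypothetical earlier sample, 10s before, with 2 fewer CPU-seconds consumed
earlier = dict(sample, cpus_user_time_secs=5126.78, timestamp=1522858766.2)
print(mem_utilization(sample))
print(cpus_used(earlier, sample))
```

Because cpus_user_time_secs and cpus_system_time_secs are cumulative counters, a single sample is not enough; the collector has to keep the previous scrape per executor.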

Thanks

Stephen Docy
Martin Peng
  • Martin, you may try asking on the Apache Mesos Slack channel. DC/OS provides a simple way to install and manage Mesos and Spark, along with a few hundred other packages. It also ships as open source and includes a built-in metrics API. You can check it out at dcos.io. – Corbin Apr 03 '18 at 15:29
  • Can you post a sample output/screenshot you have and what you want to achieve? – janisz Apr 04 '18 at 09:26
  • Thanks @janisz! I've uploaded a snapshot showing 3 different Spark job instances, all named "hmrMonitor"; from the GUI it is hard to tell them apart, as they are launched by Spark with different parameters. When I visit http://mesos-slave:5051/slave(1)/monitor/statistics, it doesn't show the full command, so I cannot distinguish them. – Martin Peng Apr 04 '18 at 16:21
  • @Corbin, thanks for your suggestion; however, deploying DC/OS would be a big change for us. We will do that in a future release, but not in the current phase. – Martin Peng Apr 04 '18 at 16:25
  • So you need to match them by taskId and executorId with data from spark – janisz Apr 05 '18 at 05:26
  • Thanks @janisz! The CPU data I collected looks quite promising, but I have some doubt about the Spark driver memory. I configured the driver memory as 4G in the Spark conf, but Mesos and the Spark UI show only around 1G. Is this info correct? (BTW: we are using Marathon as the scheduler, and it is configured with only 0.1 CPU and 64M memory as the initial values.) – Martin Peng Apr 17 '18 at 02:15
  • Following the same logic as the Diamond collector, I just take mem_limit_bytes as the allocated memory and mem_rss_bytes as the used memory. Will that be fine? Thanks! – Martin Peng Apr 17 '18 at 02:17

0 Answers