
I'm testing the Apache Spark framework, and I need to monitor some aspects of my cluster, such as network and resource usage.

Ganglia looks like a good option for what I need, and I found out that Spark has support for Ganglia.

The Spark monitoring webpage has this information: "To install the GangliaSink you’ll need to perform a custom build of Spark."

In my Spark distribution I found the directory "/extras/spark-ganglia-lgpl", but I don't know how to install it.

How can I install Ganglia to monitor a Spark cluster? How do I do this custom build?

Thanks!

Danilo Pinto

2 Answers


Spark's Ganglia support is one of the Maven profiles of the Spark project, named "spark-ganglia-lgpl". To activate the profile, add the "-Pspark-ganglia-lgpl" option to the mvn command when you build the project. For example, building Spark against Apache Hadoop 2.4.x with Ganglia support is done with:

mvn -Pspark-ganglia-lgpl -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

For building the Spark project, please refer to the Building Spark with Maven documentation.
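Building the assembly only compiles the GangliaSink class in; you still have to enable it in conf/metrics.properties on each node. A minimal sketch, assuming your gmond daemon listens on a host named ganglia-host in unicast mode (ganglia-host is a placeholder for your setup; 8649 is Ganglia's default port):

*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
# hypothetical host name; point this at your gmond receiver
*.sink.ganglia.host=ganglia-host
*.sink.ganglia.port=8649
# report metrics every 10 seconds
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
*.sink.ganglia.mode=unicast

After restarting the master, workers, and your application with this file in place, their metrics should show up in the Ganglia web UI.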

suztomo
Thank you! Because I'm using Spark 1.0.2 built with SBT, I used the following command instead, and it worked: SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_GANGLIA_LGPL=true sbt/sbt assembly – Danilo Pinto Dec 04 '14 at 19:21

If you're running the HDP stack, I would recommend updating to the latest version. It includes the Spark job tracker as well as the Spark client libraries to be deployed on machines. It also integrates with Ambari Metrics, which is set to replace Ganglia and Nagios.

ben jarman