1

The question is specific to databricks. Is there any API to get the ganglia chart showing cluster usage? Need to get all the Ganglia charts that are available in the Databricks cluster metrics section for all the clusters via REST API calls. We are planning to write a script to pull all the charts of all the clusters in our databricks.

Scorpio
  • 511
  • 4
  • 14

1 Answers1

0

We recently had a similar challenge: gather metrics (of ganglia charts) centrally and also use REST API calls to fetch such data.

Because of the use of short-living job clusters that often stopped/replaced within 1-2h, we build kind of "push" logic that gather data from all such monitored clusters centrally into a dbfs location with further processing, sending data to the observability engine (ADX) and then to expose in Grafana.

So this was achieved by a the custom init script created via notebook:

%python
init_script_path = "/init-scripts/configure_ganglia_metrics.sh"
init_script = r"""#!/bin/bash
if [[ $GANGLIA_METRICS_ENABLED != "true" ]]; then
    echo "Ganglia metrics on $GANGLIA_METRICS_CLUSTER_NAME are not enabled"
    exit 0
fi
echo "Enabling collection of metrics on $GANGLIA_METRICS_CLUSTER_NAME"
cat <<'EOF' >> /tmp/gather_ganglia_metrics.sh
ROOT_PATH="/dbfs/logs/ganglia_metrics/in"
LOGS_DIR="$ROOT_PATH/$GANGLIA_METRICS_CLUSTER_NAME"
# Assign poll interval
re='^[0-9]+$'
if [[ $GANGLIA_METRICS_POLL_INTERVAL =~ $re ]] ; then
    POLL_INTEVAL=$GANGLIA_METRICS_POLL_INTERVAL
else
    POLL_INTEVAL="60"
fi
echo "Poll interval is $POLL_INTEVAL seconds"
if [[ ! -d $LOGS_DIR ]] ; then
sudo mkdir -p $LOGS_DIR
fi
while true; do
LOG_TIMESTAMP=$(date '+%Y%m%d%H%M%S')
LOG_PATH="$LOGS_DIR/$LOG_TIMESTAMP.xml"
curl <http://localhost:8652/cluster/*/{mem_total,mem_free,cpu_idle,cpu_num}> >> $LOG_PATH
sleep $POLL_INTEVAL
done
EOF
chmod a+x /tmp/gather_ganglia_metrics.sh
/tmp/gather_ganglia_metrics.sh & disown
"""

dbutils.fs.put(init_script_path, init_script, True)

With further attachment of it to the cluster and providing a few environment variables:

GANGLIA_METRICS_ENABLED=true
GANGLIA_METRICS_CLUSTER_NAME=youClusterName

The script writes xml metrics into dbfs:/logs/ganglia_metrics/in as soon as cluster starts so the next step is to process them.

More details:

Alexander Volok
  • 5,630
  • 3
  • 17
  • 33