
I have the following requirement: I need to provision both Cloudera Manager and a Spark cluster via Puppet, in such a way that minimal (ideally no) configuration through the Cloudera Manager UI is needed afterwards. The ideal scenario I'm looking for is the following:

Topology: 3 nodes (where node1 is spark-master and node2 and node3 are spark-workers)

  1. Provision the Spark cluster (this works as expected: I have a working CDH5.5 Spark cluster, verified by running the Spark Pi example)
  2. Install the CM server on the spark-master node
  3. Install the CM agent on all nodes
  4. Start the CM server and agents

I'm using the razorsedge/cloudera Puppet module to provision Cloudera Manager (https://forge.puppetlabs.com/razorsedge/cloudera), and I have a custom-made Spark Puppet module which supports CDH5.5 Spark installation.

When I open the Cloudera Manager UI, I can see all three nodes, but I don't see any Spark-related stats on the CM dashboard.

When investigating the CM agent and server logs, I found the following:

  1. CM agent log on spark-master (the agent did not connect to the CM server and the node cannot be seen on the CM UI dashboard):

[12/Jan/2016 23:13:11 +0000] 4678 MainThread agent ERROR Heartbeating to EC2_PUBLIC_DNS:7182 failed

  2. CM agent log on spark-workers: the agents connected to the CM server successfully and can be seen on the CM UI dashboard

  3. CM server log on spark-master:

org.apache.avro.AvroRuntimeException: Unknown datum type: java.lang.IllegalArgumentException: Hostname invalid EC2_LOCAL_IPV4
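To rule out a plain network problem behind the heartbeat failure, a check along these lines could be run from spark-master. This is only a sketch: the helper name `can_connect` is my own and not part of Cloudera Manager, and the only value taken from the logs is port 7182, which the agent reports it is heartbeating to.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; not part of Cloudera Manager.
    """
    try:
        # create_connection resolves the name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `can_connect(<CM server host>, 7182)` on spark-master, with the same host value the agent is configured to use, would show whether the port is reachable at all. A `False` result would point at name resolution, a firewall, or a security group rather than at the Spark setup.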

Any idea what might be the issue here?

I'm also looking for answers to the following questions:

  1. Is it possible at all to provision a CDH service (in my case Spark) without using the Cloudera Manager UI and then have it connected to CM?
  2. If yes, which CM configuration settings need to be changed to point to the existing Spark cluster?

Any help or guidance would be greatly appreciated.

Bakir Jusufbegovic
