I have the following requirement: I need to provision both Cloudera Manager and a Spark cluster via Puppet, in a way that requires minimal (or no) configuration through the Cloudera Manager UI afterwards. The ideal scenario I'm looking for is the following:
Topology: 3 nodes (where node1 is spark-master and node2 and node3 are spark-workers)
- Provision the Spark cluster (this works as expected: I have a working CDH5.5 Spark cluster, verified by running the Spark Pi example)
- Install CM server on spark-master node
- Install CM agent on all nodes
- Start CM server and agents
I'm using the razorsedge/cloudera Puppet module to provision Cloudera Manager (https://forge.puppetlabs.com/razorsedge/cloudera), and I have a custom-made Spark Puppet module that supports CDH5.5 Spark installation.
When I open the Cloudera Manager UI, I can see all three nodes, but I don't see any Spark-related stats on the CM UI dashboard.
When investigating the CM agent and server logs, these are the findings:
- CM agent log on spark-master (not connected to the CM server and cannot be seen on the CM UI dashboard)
[12/Jan/2016 23:13:11 +0000] 4678 MainThread agent ERROR Heartbeating to EC2_PUBLIC_DNS:7182 failed
- CM agent log on spark-workers (connected to the CM server successfully and can be seen on the CM UI dashboard)
- CM server log on spark-master:
org.apache.avro.AvroRuntimeException: Unknown datum type: java.lang.IllegalArgumentException: Hostname invalid EC2_LOCAL_IPV4
Any idea what might be the issue here?
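My own guess, which I have not confirmed, is that the "Hostname invalid" error means an agent is identifying itself with an IP address where the CM server expects a hostname. To check that on each node, I put together the small Python sketch below. It only uses the standard library; the function names are my own, and the sample values in the comments are made up, not taken from my cluster.

```python
import ipaddress
import socket


def is_ip_literal(name: str) -> bool:
    """Return True if `name` parses as an IPv4/IPv6 address, not a hostname."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def describe_local_identity() -> dict:
    """Collect the names this node reports for itself (hypothetical helper)."""
    hostname = socket.gethostname()
    fqdn = socket.getfqdn()
    return {
        "hostname": hostname,
        "fqdn": fqdn,
        # If either of these is True, the node is naming itself by IP,
        # which is what I suspect the server-side error is complaining about.
        "hostname_is_ip": is_ip_literal(hostname),
        "fqdn_is_ip": is_ip_literal(fqdn),
    }


if __name__ == "__main__":
    # Run on every node and compare against what the CM UI shows.
    for key, value in describe_local_identity().items():
        print(f"{key}: {value}")
```

If this reports an IP literal on spark-master but a proper name on the workers, that would at least be consistent with only the master failing to heartbeat.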
I'd also like answers to the following questions:
- Is it possible at all to provision a CDH service (in my case Spark) without using the Cloudera Manager UI and then have it connected to CM?
- If yes, which CM configuration settings need to be changed to point to the existing Spark cluster?
Any help or guidance would be greatly appreciated.