I am working to set up DSE 5 on AWS. I have a separate OpsCenter 6.0.0 node. I have three cluster nodes, a, b and c.

Each server, OpsCenter plus 3 nodes, is an Ubuntu 14.04 server with Oracle JDK1.8.0_92 installed.

On the OpsCenter node, I installed and started OpsCenter without problems.

I created a cluster OK, but when I went back to OpsCenter to manage that cluster, the agents were not communicating. I tried to install the agents automatically, and that failed.

I went into /usr/share/opscenter, ran bin/setup.py, and copied the /usr/share/opscenter/ssl folder and its files to /var/lib/datastax-agent/ssl/ on the nodes.

On my cluster nodes, in the agent.log, I get the following:

ERROR [StompConnection receiver] 2016-07-18 18:04:24,747 Jul 18, 2016 6:04:24 PM org.jgroups.client.StompConnection connect
INFO: Connected to 52.0.16.77:61620

ERROR [StompConnection receiver] 2016-07-18 18:04:24,747 Jul 18, 2016 6:04:24 PM org.jgroups.client.StompConnection run
SEVERE: JGRP000112: Connection closed unexpectedly:
javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1541)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:95)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:71)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.jgroups.util.Util.readLine(Util.java:2825)
    at org.jgroups.protocols.STOMP.readFrame(STOMP.java:240)
    at org.jgroups.client.StompConnection.run(StompConnection.java:274)
    at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Clearly, the /var/lib/datastax-agent/ssl/agentKeyStore did not get built correctly by setup.py.

I retraced all of those steps and got no errors.
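
Retracing the steps without errors does not by itself show that the files under /var/lib/datastax-agent/ssl/ on each node are the same ones that setup.py last generated on the OpsCenter machine. One way to check is to compare file digests on both sides. The following is a minimal sketch, assuming Python is available on every machine; the helper names `file_sha256` and `ssl_dir_digests` are hypothetical and not part of OpsCenter.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ssl_dir_digests(ssl_dir):
    """Map each file under ssl_dir (relative path) to its SHA-256 digest."""
    root = Path(ssl_dir)
    return {
        str(p.relative_to(root)): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Calling `ssl_dir_digests("/usr/share/opscenter/ssl")` on the OpsCenter machine and `ssl_dir_digests("/var/lib/datastax-agent/ssl")` on a node, then comparing the two dictionaries, would show whether any copied file differs or is missing.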

Any ideas?

fcnorman
  • Was there any sign of connection in the OpsCenter log? – mando222 Jul 19 '16 at 01:34
  • On the opscenter node, the opscenterd.log has the following at the end: – fcnorman Jul 19 '16 at 13:27
  • 2016-07-19 13:18:03,013 [clusterxx] INFO: OpsCenter starting up. (MainThread) 2016-07-19 13:18:03,024 [opscenterd] INFO: Cluster clusterxx started (MainThread) 2016-07-19 13:19:01,226 [cluster11] WARN: These nodes reported this message, Nodes: ['xx.xx.xx.xx', xx.xx.xx.xx', 'xx.xx.xx.xx'] Message: HTTP request https://xx.xx.xx.xx:61621/connection-status? failed: An error occurred while connecting: [Failure instance: Traceback (failure with no frames): : [Errno 1] certificate verify failed (javax.net.ssl.SSLHandshakeException: General SSLEngine problem) ]. – fcnorman Jul 19 '16 at 13:28
  • The full node a agent.log error is: – fcnorman Jul 19 '16 at 13:31
  • Did this work without SSL? – mando222 Jul 19 '16 at 15:12
  • Connection closed unexpectedly: SSLException: Connection has been shutdown: SSLHandshakeException: ValidatorException: PKIX path validation failed: CertPathValidatorException: signature check failed Caused by: SSLHandshakeException: ValidatorException: PKIX path CertPathValidatorException: signature check failed Caused by: ValidatorException: PKIX path validation failed: CertPathValidatorException: signature check failed Caused by: CertPathValidatorException: signature check failed Caused by: SignatureException: Signature length not correct: got 128 but was expecting 256 – fcnorman Jul 19 '16 at 17:04
  • Did this work without SSL? Yes, when I revert to no ssl, the opscenter - datastax-agent communication works. – fcnorman Jul 19 '16 at 17:10
  • This seems to me to be an issue with the SSL cert. It looks like setup.py made a 128 key but should have made a 256. Searching the documentation, this seems to be a known bug; see the fourth bullet point under Core in https://docs.datastax.com/en/opscenter/6.0/opsc/release_notes/opscReleaseNotes600.html?scroll=opscReleaseNotes600__core – mando222 Jul 19 '16 at 18:03
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/117741/discussion-between-mando222-and-fcnorman). – mando222 Jul 19 '16 at 18:28
  • Ok, so I'll run without ssl for now and wait for the next release. For what it's worth, all the nodes in question have the JCE policy files correctly installed. – fcnorman Jul 19 '16 at 22:13
  • Can you share your `opscenterd.conf` and `cluster.conf` from your OpsCenter machine, and `address.yaml` from one of the nodes running the agent, please? – markc Oct 07 '16 at 09:34
  • I have the same problem with OpsCenter 6.0.8, but the error is intermittent. Any solution or workaround? – Arun Oct 23 '17 at 21:29
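
The "got 128 but was expecting 256" figures in the agent.log comment above are signature lengths in bytes. Assuming RSA keys, where a signature is as long as the key modulus, they correspond to 1024-bit and 2048-bit keys. That would be consistent with a certificate signed by a different key than the one the verifying side holds, though nothing in the thread confirms this. A small sketch of the arithmetic, with a hypothetical helper name:

```python
import re

# Matches the message text quoted in the agent.log comment above.
PATTERN = re.compile(
    r"Signature length not correct: got (\d+) but was expecting (\d+)"
)


def signature_lengths_in_bits(message):
    """Return (got_bits, expected_bits) parsed from the error, or None.

    Assumes RSA, where signature length in bytes * 8 equals the key size.
    """
    match = PATTERN.search(message)
    if match is None:
        return None
    got_bytes, expected_bytes = (int(g) for g in match.groups())
    return got_bytes * 8, expected_bytes * 8


print(signature_lengths_in_bits(
    "SignatureException: Signature length not correct: "
    "got 128 but was expecting 256"
))
```

For the message in the log this prints `(1024, 2048)`.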
