
I'm trying to configure an AWS Greengrass group through the AWS JavaScript SDK, and I have everything up and running to the point where I have a deployment. The issue is that the deployment seems to be stuck on "in progress", and there are no CloudWatch logs to help me.

I looked at the core device, and this is what I saw in the /greengrass/ggc/var/logs/system/runtime.log file:

[2019-01-18T03:17:22.64Z][INFO]-Greengrass Root: /greengrass
[2019-01-18T03:17:22.64Z][INFO]-Greengrass Write Directory: /greengrass/ggc
[2019-01-18T03:17:22.64Z][INFO]-Group File Directory: /greengrass/ggc/deployment/group
[2019-01-18T03:17:22.64Z][INFO]-Default Lambda UID: 498 GID: 496
[2019-01-18T03:17:22.64Z][INFO]-===========================================
[2019-01-18T03:17:22.64Z][INFO]-The current core is using the AWS IoT certificates with fingerprint: 7591dcd10e96f86dd2d323d468b84b419b26280bbcfd3c0eee45c5a12c6d2dd7
[2019-01-18T03:17:22.641Z][WARN]-worker process info: /greengrass/ggc/packages/1.7.0/var/worker/processes
[2019-01-18T03:17:22.641Z][WARN]-worker process info: /greengrass/ggc/packages/1.7.0/var/worker/processes
[2019-01-18T03:17:22.641Z][INFO]-Reloading registry
[2019-01-18T03:17:22.642Z][INFO]-The current core is using the AWS IoT certificates with fingerprint: 7591dcd10e96f86dd2d323d468b84b419b26280bbcfd3c0eee45c5a12c6d2dd7

I've checked, and I can successfully reach the ATS endpoint using OpenSSL and the certificates that I have. I'm using the root CA certificate recommended in the Greengrass tutorial: Amazon Root CA 1 (RSA 2048-bit key).

What diagnostic steps should I take, or where should I look next?

Peter Mortensen

3 Answers


I've had this issue before. I believe it's just a bug with the internals getting mangled from a bad deployment.

The way I brute-force a hanging deployment is to create a new core, add known-working Lambdas from a working group to the new core, kill and restart the daemon on the core device, and then redeploy.

Steve B
  • Thanks for the suggestion! Seems that the issue was the policy I had attached was rather bare and needed to be revised! – Sahm Samarghandi Jan 18 '19 at 18:30
  • Thanks for posting the solution Sahm. – Steve B Jan 19 '19 at 20:56
  • @SahmSamarghandi can you share your policy? I'm facing the same issue, although I have attached FullAccess and ReadAccess policies to my role. I've been stuck at "in progress" for almost 6 hours with no logs at all. It's getting frustrating. – Muhammad Bilal Hasan Jul 23 '19 at 15:10

For me, two things were misconfigured, and both prevented a successful deployment:

  1. The deployment was stuck "in progress" because the policy and role I had attached lacked the Lambda permissions needed to deploy. Once I added them, the deployment went from "in progress" to "failed deployment", which led me to the second mistake.

  2. The EC2 instance hosting the core software somehow hadn't run the setup shell script correctly (it probably wasn't run with sudo), so the memory cgroup was not fully set up. I'm not sure exactly what that does, but Greengrass needs it.
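
For the second point, a quick way to see whether the kernel reports the memory cgroup controller as enabled is to read /proc/cgroups on the core device. The following is a minimal Node.js sketch, assuming a Linux host where that file exists; the function names parseProcCgroups and memoryCgroupEnabled are made up for this example and are not part of Greengrass or the AWS SDK. It only checks the kernel's "enabled" flag and does not verify that the controller is actually mounted.

```javascript
const fs = require("fs");

// Parse the text of /proc/cgroups into { controllerName: enabledBoolean }.
// Each data line has four whitespace-separated columns:
//   subsys_name  hierarchy  num_cgroups  enabled
function parseProcCgroups(text) {
  const enabled = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    // Skip blank lines and the "#subsys_name ..." header line.
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const [name, , , flag] = trimmed.split(/\s+/);
    enabled[name] = flag === "1";
  }
  return enabled;
}

// Hypothetical helper: true only if the "memory" controller is listed
// and flagged as enabled. The default path is an assumption (Linux only).
function memoryCgroupEnabled(path = "/proc/cgroups") {
  return parseProcCgroups(fs.readFileSync(path, "utf8")).memory === true;
}
```

Calling console.log(memoryCgroupEnabled()) on the core device prints true or false. If it prints false, that would be consistent with the "failed deployment" described above, though other causes are possible.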

Thank you Steve B for the help!


I've encountered this issue many times. In my case, the problem was always related to internet connectivity.

To check your system, before starting the deployment, subscribe to # (the wildcard) to listen to all topics and watch for messages related to deployments. If you don't see any incoming messages, it means network connectivity is the problem.

Then run ping greengrass-ats.iot.region.amazonaws.com on the machine where the Greengrass core is installed to investigate the issue. If everything looks OK, you can start the deployment again. But if you see lost packets, investigate the underlying cause of the network problem.
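
As a complement to ping, which only exercises ICMP, a TCP connection attempt can show whether a firewall is blocking the port the core needs. Below is a minimal Node.js sketch under stated assumptions: greengrassHost and checkTcp are hypothetical helper names, the region check is a loose format test rather than a list of real regions, and port 8443 in the usage note is an assumption about your setup that you should adjust to whatever port your core is configured to use.

```javascript
const net = require("net");

// Build the hostname from the pattern quoted above, substituting a region.
// The regex is only a loose sanity check on the region's shape.
function greengrassHost(region) {
  if (!/^[a-z]{2}(-[a-z]+)+-\d+$/.test(region)) {
    throw new Error(`unexpected region format: ${region}`);
  }
  return `greengrass-ats.iot.${region}.amazonaws.com`;
}

// Attempt a plain TCP connection. Always resolves (never rejects) with
// { ok, detail } so the caller can simply log the outcome.
function checkTcp(host, port, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (ok, detail) => {
      socket.destroy();
      resolve({ ok, detail });
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true, "connected"));
    socket.once("timeout", () => finish(false, "timed out"));
    socket.once("error", (err) => finish(false, err.code || err.message));
  });
}
```

On the core device, something like checkTcp(greengrassHost("us-east-1"), 8443).then(console.log) would print the result. A successful TCP connection does not prove that the TLS handshake or certificates are fine; it only rules out basic routing and firewall problems.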

serkan kucukbay