
Problem:

I have an application with many lambda functions, but most of them never log anything. That makes it hard to find out what happened when there's a problem.

  • We use CloudWatch and CloudTrail, but the CloudWatch logs are often empty (only the start and end of each invocation are shown).

  • When we do find an event, it's hard to get a full invocation trail: each lambda has its own log group, so we often have to look through multiple log files. In principle, CloudTrail could help us with that.

  • However, CloudTrail isn't of much use either, because there are more than 1000 invocations each minute. While all events are unique, most of them look identical inside CloudWatch, which makes them hard to filter. (For example, there's no URL to filter on: most of our events are first queued in SQS and only later handled by a lambda, so there isn't any URL to search on in CloudTrail.)

On the positive side, for events that come from SQS, we have a DLQ configured, which we can poll to see what the failed events look like. Even then, it's hard to find the matching CloudTrail record.


Question:

To get more transparency, is there a convenient way to log the input body of all lambda invocations to CloudWatch? That would solve half of the problem.

And while doing so, is it possible to make recurring fields of the input searchable in CloudTrail?

Adding more metadata to a CloudTrail record would help us:

  • It would make it possible to filter without hitting the 1000-result limit.
  • It would be easier to find the full CloudTrail history for a given CloudWatch event or DLQ message.

Ideally, can any of this be done without changing the code of the existing lambda functions? (Simply because there are so many of them.)

bvdb

2 Answers


Have you considered emitting JSON logs from your Lambdas and using CloudWatch Logs Insights to search them? If you need additional custom metrics, I’d look at the Embedded Metric Format: https://aws.amazon.com/blogs/mt/enhancing-workload-observability-using-amazon-cloudwatch-embedded-metric-format/
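To make the Embedded Metric Format suggestion concrete, here is a minimal stdlib sketch that builds one EMF record and prints it as a single JSON line. The namespace, metric name, dimension, and the sqs_message_id property are hypothetical; the layout follows the EMF structure as I understand it (an _aws block with a millisecond Timestamp and CloudWatchMetrics directives, with dimension and metric values at the top level). Powertools also ships a Metrics utility that produces this format for you.

```python
import json
import time


def emf_record(namespace, metric_name, value, dimensions, unit="Count", **properties):
    """Build one CloudWatch Embedded Metric Format (EMF) record.

    dimensions: mapping of dimension name -> value (at least one entry).
    properties: extra top-level fields; searchable in Logs Insights, but not metrics.
    """
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since the epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [list(dimensions)],  # a single dimension set
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
    }
    record.update(dimensions)    # dimension values live at the top level
    record.update(properties)    # e.g. an SQS message ID for correlation
    record[metric_name] = value  # the metric value itself, also top level
    return record


def emit(record):
    # In Lambda, a line printed to stdout lands in the function's log group,
    # where CloudWatch extracts metrics from EMF-formatted lines.
    print(json.dumps(record))


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    emit(emf_record("MyApp", "MessagesProcessed", 1,
                    dimensions={"FunctionName": "order-handler"},
                    sqs_message_id="example-id"))
```

Because the same line is both a metric and an ordinary JSON log event, extra properties such as the message ID remain queryable alongside the metric.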

I’d also recommend taking a look at some of the capabilities provided by Lambda Powertools: https://awslabs.github.io/aws-lambda-powertools-python/2.5.0/

jaredcnance

There are a few things in here so I'll attempt to break them down one by one:

Searching across multiple log groups

As @jaredcnance recommended, CloudWatch Logs Insights lets you search across multiple log groups quickly and easily. You can likely get started with a simple query such as: filter @message like /my pattern/

I suggest testing with 1-2 log groups and a small-ish time window so that you can get your queries correct. Once you're happy, query all of your log groups and save the queries so that you can quickly and easily run them in the future.
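If you want to run the same kind of query from a script, here is a rough boto3 sketch, assuming the caller has permission for logs:StartQuery and logs:GetQueryResults. The log group names are hypothetical. start_query takes the time window in epoch seconds and returns a queryId, which is then polled with get_query_results until the query reaches a terminal status.

```python
import time

TERMINAL_STATES = {"Complete", "Failed", "Cancelled", "Timeout", "Unknown"}


def run_insights_query(logs, log_group_names, query, minutes=60, poll_seconds=1.0):
    """Run one Logs Insights query over several log groups; return rows as dicts.

    `logs` is a CloudWatch Logs client, e.g. boto3.client("logs").
    """
    end = int(time.time())
    start = end - minutes * 60
    query_id = logs.start_query(
        logGroupNames=log_group_names,  # one query, many log groups
        startTime=start,                # epoch seconds
        endTime=end,
        queryString=query,
    )["queryId"]

    # Poll until the query finishes (successfully or not).
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if response["status"] != "Complete":
        raise RuntimeError(f"query {query_id} ended with status {response['status']}")

    # Each result row is a list of {"field": ..., "value": ...} pairs.
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]


if __name__ == "__main__":
    import boto3

    rows = run_insights_query(
        boto3.client("logs"),
        ["/aws/lambda/order-handler", "/aws/lambda/payment-handler"],  # hypothetical
        "fields @timestamp, @log, @message"
        " | filter @message like /my pattern/"
        " | sort @timestamp desc | limit 50",
    )
    for row in rows:
        print(row)
```

Starting with a short `minutes` window keeps the scanned volume small while you refine the query, in line with the advice above.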

Logging Lambda event payloads

Yes, you can easily do this with Lambda Powertools. If you're not using Python, check the landing page to see whether your runtime is supported. If your Lambda runtime isn't supported by Powertools, you can log JSON output yourself.
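With Powertools itself, logging the incoming event is a matter of decorating the handler with @logger.inject_lambda_context(log_event=True). For runtimes or projects where you'd rather log JSON yourself, here is a minimal stdlib sketch of the same idea. The names log_event, event_log_record, and the sqs_message_ids field are hypothetical; pulling out each SQS messageId is one assumed way to make an invocation findable from a DLQ message, provided the DLQ copy keeps the same message ID, which is worth verifying in your setup.

```python
import functools
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # make sure INFO-level records are emitted


def event_log_record(event, context):
    """Build the JSON-serialisable record that is logged for one invocation."""
    record = {
        "message": "lambda_event",
        # Same request ID as the START/END/REPORT lines of this invocation.
        "request_id": getattr(context, "aws_request_id", None),
        "event": event,
    }
    # SQS batches: surface each message ID as a top-level, searchable field.
    records = event.get("Records", []) if isinstance(event, dict) else []
    message_ids = [r["messageId"] for r in records if isinstance(r, dict) and "messageId" in r]
    if message_ids:
        record["sqs_message_ids"] = message_ids
    return record


def log_event(handler):
    """Decorator: log the incoming event as one JSON line, then run the handler unchanged."""

    @functools.wraps(handler)
    def wrapper(event, context):
        logger.info(json.dumps(event_log_record(event, context), default=str))
        return handler(event, context)

    return wrapper


@log_event
def handler(event, context):
    # Existing business logic stays as it was.
    return {"ok": True}
```

It is still a one-line change per handler, so it doesn't avoid touching each function, but it keeps the change mechanical.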

When you log with JSON it's trivial to query with CW Logs Insights. For example, a Python statement like this:

```python
from aws_lambda_powertools import Logger

logger = Logger()

# `game` stands for whatever object your handler is working with.
logger.info({
    "action": "MOVE",
    "game_id": game.id,
    "player1": game.player_x.id,
    "player2": game.player_o.id,
})
```

enables queries like this:

```
fields @timestamp, correlation_id, message.action, session_id, location
| filter ispresent(message.action) AND message.action = 'MOVE'
| sort @timestamp desc
```
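The query can address message.action because Logs Insights discovers fields in JSON log events and names nested fields with dot notation. The stdlib sketch below mimics that flattening for one log line. The sample line is only roughly shaped like Powertools Logger output; correlation_id and session_id are assumed to have been added to the logger separately (for example with set_correlation_id and append_keys).

```python
import json


def insights_field_names(log_line):
    """Flatten one JSON log line into dotted field names, similar to how
    Logs Insights exposes nested JSON objects (e.g. message.action)."""
    flat = {}

    def walk(prefix, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}" if prefix else key, child)
        else:
            flat[prefix] = value  # non-object values (including lists) are leaves here

    walk("", json.loads(log_line))
    return flat


# Hypothetical log line, roughly what logger.info(<dict>) produces.
line = json.dumps({
    "level": "INFO",
    "location": "handler:12",
    "message": {"action": "MOVE", "game_id": "g-1", "player1": "p-1", "player2": "p-2"},
    "timestamp": "2023-01-01 12:00:00,000+0000",
    "correlation_id": "c-1",
    "session_id": "s-1",
})

fields = insights_field_names(line)
print(fields["message.action"])  # MOVE
```

This is why logging a dict rather than a preformatted string pays off: every key becomes its own filterable field.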

Updating Lambda functions

Lambda runs your code as-is and will not change it for you. If you want to emit logs, you have to update your code. There is no way around that.

CloudTrail

CloudTrail is designed as a security and governance tool. What you are trying to do is operational in nature (debugging). As such, logging and monitoring solutions like CW Logs are going to be your friends. While some of the data plane operations may end up in CloudTrail, CloudWatch or other logging solutions are better suited.

brianz