
I have a Node.js function for AWS Lambda. It reads a JSON file from an S3 bucket as a stream, parses it, and prints the parsed objects to the console. I am using the stream-json module for parsing.

It works in my local environment and prints the objects to the console. On Lambda, however, it does not print the objects to the log streams (CloudWatch); it simply times out after the maximum duration. The other log statements around it are printed, but not the object values.

1. I am using Node.js 6.10 in both environments.
2. The Lambda function's callback is invoked only after the stream ends.
3. The Lambda function has full access to S3.
4. I also tried using a Promise to wait until the streams complete, but nothing changed.

What am I missing? Thank you in advance.

```javascript
const AWS = require('aws-sdk');
const {parser} = require('stream-json');
const {streamArray} = require('stream-json/streamers/StreamArray');
const {chain} = require('stream-chain');

const S3 = new AWS.S3({ apiVersion: '2006-03-01' });

/** ******************** Lambda Handler *************************** */

exports.handler = (event, context, callback) => {
    // Get the bucket and key of the object that triggered the event
    const bucket = event.Records[0].s3.bucket.name;
    const key = event.Records[0].s3.object.key;
    const params = {
        Bucket: bucket,
        Key: key
    };

    console.log("Source: " + bucket + "//" + key);

    const s3ReaderStream = S3.getObject(params).createReadStream();

    console.log("Setting up pipes");

    // S3 object stream -> JSON tokens -> one {key, value} item per array element
    const pipeline = chain([
        s3ReaderStream,
        parser(),
        streamArray(),
        data => {
            console.log(data.value);
        }
    ]);

    pipeline.on('data', (data) => console.log(data));
    pipeline.on('end', () => callback(null, "Stream ended"));
    // Report stream failures to Lambda instead of leaving the invocation to time out
    pipeline.on('error', (err) => callback(err));
};
```
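Item 4 above mentions waiting on the streams with a Promise. A minimal sketch of that idea follows, assuming a newer Node.js runtime than 6.10 for the demo (it uses `Readable.from`, available from Node.js 10.17/12.3); `consumeStream` is a hypothetical helper, not part of stream-json or stream-chain. It settles on `'end'` or `'error'`, so a failing stream surfaces as a rejection. It cannot help when the underlying S3 request never connects, though: then neither event fires and the Lambda still times out.

```javascript
'use strict';
const { Readable } = require('stream');

// Hypothetical helper: consume a readable stream, calling onItem for every
// chunk/object, and resolve with the item count when the stream ends.
// Rejects if the stream emits 'error'.
function consumeStream(stream, onItem) {
  return new Promise((resolve, reject) => {
    let count = 0;
    stream.on('data', (item) => {
      onItem(item);
      count += 1;
    });
    stream.on('end', () => resolve(count));
    stream.on('error', reject);
  });
}

// Demo with an in-memory object stream standing in for the stream-json pipeline.
consumeStream(Readable.from([{ id: 1 }, { id: 2 }]), (obj) => console.log(obj))
  .then((n) => console.log(`done, ${n} objects`));

// Possible use in a handler (assumes a Node.js 8.10+ runtime for async handlers,
// and a pipeline built from [s3ReaderStream, parser(), streamArray()] without
// the final logging function):
// exports.handler = async (event) => {
//   const count = await consumeStream(pipeline, (data) => console.log(data.value));
//   return `Stream ended after ${count} objects`;
// };
```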
  • What do you see in CloudWatch, e.g. any error string? You may need to grant the `s3:GetObject` action on your S3 resource in the Lambda execution role. – hoangdv Nov 17 '18 at 07:03
  • No errors. I do see my log statements. Here is what the output looks like:
      START RequestId: 97899afb-ea79-11e8-8c67-15d57ea60ee7 Version: $LATEST
      2018-11-17T15:01:07.573Z 97899afb-ea79-11e8-8c67-15d57ea60ee7 Source: com.lucidmatters.hurdles//HrdlSample.json
      2018-11-17T15:01:07.613Z 97899afb-ea79-11e8-8c67-15d57ea60ee7 Setting up pipes
      END RequestId: 97899afb-ea79-11e8-8c67-15d57ea60ee7
      REPORT RequestId: 97899afb-ea79-11e8-8c67-15d57ea60ee7 Duration: 8006.60 ms Billed Duration: 8000 ms Memory Size: 128 MB Max Memory Used: 37 MB
      ...
      Task timed out after 8.01 seconds
    – Vinay Dhavala Nov 17 '18 at 15:02
  • I had assigned a role with full access to S3 – Vinay Dhavala Nov 17 '18 at 15:06

1 Answer


I have figured out that it is because my Lambda function is running inside a private VPC.

(I have to run it inside a private VPC because it needs to access my ElastiCache instance. I removed the related code from the question for simplicity.)

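The code can access S3 from my local machine, but not from within the private VPC. A Lambda function attached to a VPC has no route to the internet by default (it needs a NAT gateway or a VPC endpoint to reach S3), so the S3 request hangs instead of failing: no objects are logged and the invocation runs until it times out.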
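To confirm that networking, rather than permissions, is the problem, one option is to test raw TCP connectivity to the S3 endpoint from inside the function. The sketch below is a hypothetical diagnostic (`canConnect` is not part of any library), and the regional hostname in the usage comment is an assumption, so substitute your bucket's region. In a private VPC without a NAT gateway or S3 endpoint it would be expected to report `false`; on a local machine, `true`.

```javascript
'use strict';
const net = require('net');

// Hypothetical diagnostic helper: try to open a TCP connection to host:port and
// resolve true if it connects within timeoutMs, false on error or timeout.
// It only checks network reachability, not S3 permissions.
function canConnect(host, port, timeoutMs) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (ok) => {
      socket.destroy();
      resolve(ok); // later calls are ignored once the promise has settled
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('connect', () => finish(true));
    socket.once('error', () => finish(false));
  });
}

// Possible use at the start of the handler (hostname/region is an assumption):
// canConnect('s3.us-east-1.amazonaws.com', 443, 2000)
//   .then((ok) => console.log('S3 reachable:', ok));
```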

There is a process for making S3 accessible from within your VPC, described here: https://aws.amazon.com/premiumsupport/knowledge-center/connect-s3-vpc-endpoint/

Here is another link that explains how to set up a VPC endpoint so that AWS resources can be accessed from within a VPC: https://aws.amazon.com/blogs/aws/new-vpc-endpoint-for-amazon-s3/
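Following those guides, the usual fix is a gateway VPC endpoint for S3, associated with the route tables of the Lambda function's subnets. Below is a minimal sketch of the request parameters, assuming the EC2 `CreateVpcEndpoint` API as exposed by the AWS SDK for JavaScript v2; the helper name and all IDs are hypothetical placeholders. Note that a gateway endpoint only reaches S3 buckets in the same region as the VPC.

```javascript
'use strict';

// Hypothetical helper: build parameters for a gateway VPC endpoint for S3.
// Field names follow the EC2 CreateVpcEndpoint API; IDs below are placeholders.
function buildS3GatewayEndpointParams(vpcId, region, routeTableIds) {
  if (!vpcId || !region || !Array.isArray(routeTableIds) || routeTableIds.length === 0) {
    throw new Error('vpcId, region and at least one route table ID are required');
  }
  return {
    VpcEndpointType: 'Gateway',
    VpcId: vpcId,
    ServiceName: `com.amazonaws.${region}.s3`,
    // A gateway endpoint works by adding an S3 route to these route tables,
    // so pass the route tables used by the Lambda function's subnets.
    RouteTableIds: routeTableIds.slice(),
  };
}

const params = buildS3GatewayEndpointParams(
  'vpc-0123456789abcdef0', 'us-east-1', ['rtb-0123456789abcdef0']);
console.log(JSON.stringify(params, null, 2));

// Sending the request with the AWS SDK for JavaScript v2 (not needed to run this file):
// const AWS = require('aws-sdk');
// new AWS.EC2({ region: 'us-east-1' }).createVpcEndpoint(params).promise()
//   .then((res) => console.log(res.VpcEndpoint.VpcEndpointId));
```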