70

I would like to set up an AWS Lambda function that is triggered whenever a new text file is uploaded to an S3 bucket. Inside the function, I want to get the contents of the text file and process it somehow. Is this possible?

For example, if I upload foo.txt, with contents foobarbaz, I would like to somehow get foobarbaz in my lambda function so I can do stuff with it. I know I can get metadata from getObject, or a similar method.

Thanks!

jstnchng

4 Answers

82

The S3 object key and bucket name are passed into your Lambda function via the event parameter. You can then get the object from S3 and read its contents.

Basic code to retrieve bucket and object key from the Lambda event is as follows:

exports.handler = function(event, context, callback) {
   const bkt = event.Records[0].s3.bucket.name;
   const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
};
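
As a rough illustration of what that decoding does, here is a minimal sketch that runs without AWS. The helper name extractS3Objects and the sample event are made up for this example, and the event is trimmed down to just the fields the code reads. S3 URL-encodes object keys in event notifications and represents spaces as +, which is why the key is passed through replace and decodeURIComponent.

```javascript
// Hypothetical helper (not part of any AWS SDK): pull the bucket name and
// decoded object key out of every record in an S3 event notification.
function extractS3Objects(event) {
  return (event.Records || []).map((record) => ({
    bucket: record.s3.bucket.name,
    // Keys arrive URL-encoded, with spaces represented as '+'
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
  }));
}

// Made-up, trimmed-down event containing only the fields used above
const sampleEvent = {
  Records: [
    {
      s3: {
        bucket: { name: 'example-bucket' },
        object: { key: 'docs/my+file%281%29.txt' },
      },
    },
  ],
};

console.log(extractS3Objects(sampleEvent));
```

Mapping over all of Records rather than reading only Records[0] is a design choice for the sketch; the handlers in this answer read the first record only.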

Once you have the bucket and key, you can call getObject to retrieve the object:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = function(event, context, callback) {
    
    // Retrieve the bucket & key for the uploaded S3 object that
    // caused this Lambda function to be triggered
    const Bucket = event.Records[0].s3.bucket.name;
    const Key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

    // Retrieve the object
    s3.getObject({ Bucket, Key }, function(err, data) {
        if (err) {
            console.log(err, err.stack);
            callback(err);
        } else {
            console.log("Raw text:\n" + data.Body.toString('ascii'));
            callback(null, null);
        }
    });
};

Here's an updated JavaScript example using async/await and promises, without error handling:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event, context) => {
  const Bucket = event.Records[0].s3.bucket.name;
  const Key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
  const data = await s3.getObject({ Bucket, Key }).promise();
  console.log("Raw text:\n" + data.Body.toString('ascii'));
};

A number of posters have asked for the equivalent in Java, so here's an example:

package example;

import java.net.URLDecoder;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.event.S3EventNotification.S3EventNotificationRecord;

public class S3GetTextBody implements RequestHandler<S3Event, String> {
 
    public String handleRequest(S3Event s3event, Context context) {
        try {
            S3EventNotificationRecord record = s3event.getRecords().get(0);

            // Retrieve the bucket & key for the uploaded S3 object that
            // caused this Lambda function to be triggered
            String bkt = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getKey().replace('+', ' ');
            key = URLDecoder.decode(key, "UTF-8");

            // Read the source file as text
            AmazonS3 s3Client = new AmazonS3Client();
            String body = s3Client.getObjectAsString(bkt, key);
            System.out.println("Body: " + body);
            return "ok";
        } catch (Exception e) {
            System.err.println("Exception: " + e);
            return "error";
        }
    }
}
jarmod
  • Right, but unless I'm mistaken, isn't `data` in `console.log('CONTENT TYPE:', data.ContentType);` metadata rather than contents of a file? – jstnchng Jun 05 '15 at 13:30
  • It gives you event data but not the data of the file itself, iirc. – jstnchng Jun 05 '15 at 13:32
  • @jstnchng Yes, that's metadata. But I think you were asking for 'foobarbaz' which is the content of the actual S3 object, so you'd have to call GetObject to retrieve the object. – jarmod Jun 05 '15 at 13:47
  • getObject is a function for Java though, I think AWS API only has methods to getObject for Java, .NET, and C#. I'm not sure if there's a way to do it in js tho? – jstnchng Jun 05 '15 at 14:24
  • @jstnchng All of the SDKs support S3 getObject in some form or other. See the JavaScript getObject streaming example at http://docs.aws.amazon.com/AWSJavaScriptSDK/guide/node-examples.html. – jarmod Jun 05 '15 at 14:52
  • 1
    I want to try the same thing as specified in question but using java instead, can anybody please specify the link to some java example ? – Bruce_Wayne Feb 08 '16 at 11:45
  • @Bruce_Wayne Did you figure out a way for Java? – John Constantine Mar 25 '17 at 13:39
  • so, does the s3->key object contains the whole path or just the file name ? – cedzz Apr 16 '18 at 15:08
  • 1
    @cedzz it's the full S3 key, for example archive/cats/fluffykins.jpg. – jarmod Apr 16 '18 at 16:45
  • Should the lambda code have a specific role to access s3? Or in the example, it is supposed that s3 is open to the world? – user2105282 Apr 10 '19 at 14:58
  • @user2105282 Yes, the Lambda function should be configured with a minimally permissioned IAM role. For example, just s3:GetObject to ["arn:aws:s3:::mybucket/myprefix/*"] In the code examples I provided it's assumed that credentials are retrieved by the AWS SDK based on the Lambda function's configured IAM role. It's rare that you'd want an S3 bucket open to the world (unless it's a public, static website). – jarmod Apr 10 '19 at 15:44
19

I am using a Lambda function with the Python 3.6 runtime. The code below reads the contents of the file main.txt in the bucket my_s3_bucket. Make sure to replace the bucket name and file name according to your needs.

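import boto3

# Create the client once, outside the handler, so warm invocations can reuse it
s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Fetch the object; replace the bucket and key with your own
    data = s3.get_object(Bucket='my_s3_bucket', Key='main.txt')
    # Body is a streaming object and read() returns bytes;
    # decoding as UTF-8 is an assumption about how the file was written
    contents = data['Body'].read().decode('utf-8')
    print(contents)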
Shubham Bansal
16

You can use data.Body.toString('ascii') to get the contents of the text file, assuming the file was encoded as ASCII. You can also pass other encodings, such as 'utf8', to toString. See the Node.js Buffer documentation for further details.
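
To see why the encoding argument matters, here is a small sketch that uses a plain Node.js Buffer as a stand-in for data.Body. The sample text is made up for illustration. When the bytes contain non-ASCII characters, decoding them as 'ascii' does not give back the original text, while 'utf8' does.

```javascript
// Stand-in for data.Body: the UTF-8 bytes of a string containing a
// non-ASCII character (the sample text is invented for this example)
const body = Buffer.from('café', 'utf8');

// Decode the same bytes with two different encodings
const asUtf8 = body.toString('utf8');
const asAscii = body.toString('ascii');

console.log(asUtf8 === 'café');  // the UTF-8 decode round-trips
console.log(asAscii === 'café'); // the ASCII decode does not
```

For a file that really is pure ASCII, both calls return the same string, so 'utf8' is usually a reasonable default when you are unsure.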

jaywalker
  • Works like a charm, btw could you take a look at my similar question? http://stackoverflow.com/questions/34056133/append-string-to-a-text-file-nodejs-in-aws-lambda – Casper Dec 03 '15 at 01:31
  • hi i want to write the same data to DynamoDB so that i want to return data object right from the callback function (passed in the s3.getObject method) how can i pull data out of the function here ? – Patel Apr 08 '19 at 09:31
1

In the AWS SDK for JavaScript v3, the Body returned by GetObject is a readable stream rather than a Buffer. You need to read the stream to completion before you can work with the file contents.

https://carova.io/snippets/read-data-from-aws-s3-with-nodejs
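
One way to collect such a stream into a string is sketched below. The helper name streamToString is made up, and the stream here is a stand-in built with Readable.from so that the example runs without AWS. In a real handler the stream would be the Body of the response to a GetObjectCommand from @aws-sdk/client-s3, which this sketch does not call.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: read a stream to the end and return its contents
// as one string. The default encoding of 'utf8' is an assumption.
async function streamToString(stream, encoding = 'utf8') {
  const chunks = [];
  for await (const chunk of stream) {
    // Chunks may be Buffers or strings depending on the stream
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString(encoding);
}

// Stand-in for the S3 object body, delivered in two chunks
const fakeBody = Readable.from([Buffer.from('foo'), Buffer.from('barbaz')]);

streamToString(fakeBody).then((text) => console.log(text));
```

Concatenating the Buffers before decoding, rather than decoding each chunk separately, avoids corrupting a multi-byte character that happens to be split across two chunks.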