
We have an AWS Glue DataBrew job which writes its output to a folder in an S3 bucket. A Java Lambda function is then notified via the S3 Put event notification. However, the following sample code throws an exception:

S3EventNotification.S3EventNotificationRecord record = event.getRecords().get(0);
String s3Bucket = record.getS3().getBucket().getName();
String s3Key = record.getS3().getObject().getUrlDecodedKey();

// The following throws an exception -- 404 NoSuchKey
S3Object s3object = s3Client.getObject(s3Bucket, s3Key);

In the logs we see that the key is something like:

input_files/processed_file_22Dec2022_1671678897600/fdg629ae-4f91-4869-891c-79200772fb92/databrew-test-put-object.temp

So is it that the Lambda gets notified for a file which is still being copied into the S3 folder? When we upload the file manually using the console, it works fine, but when the DataBrew job uploads it, we see this issue.

I expect the Lambda function to read the file with the correct key.

Thanks

1 Answer


What is your trigger event type?

So is it that the Lambda gets notified for a file which is still being copied into the S3 folder?

If you have a Put trigger, Lambda will get triggered when the object upload completes. S3 wouldn't create a temporary object and then delete it.

I haven't used AWS Glue DataBrew before, but perhaps it is creating that temporary object? If that is the case, you need to handle it in your code.
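For example, here is a minimal sketch of that handling. It assumes the AWS SDK for Java v1 S3 client and the aws-lambda-java-events S3Event type (adjust the S3EventNotification import to the library version you actually use), and the ".temp" suffix check is based only on the key you saw in your logs, not on any documented DataBrew behaviour:

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.lambda.runtime.events.models.s3.S3EventNotification;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;

public class DataBrewOutputHandler implements RequestHandler<S3Event, Void> {

    private final AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();

    @Override
    public Void handleRequest(S3Event event, Context context) {
        for (S3EventNotification.S3EventNotificationRecord record : event.getRecords()) {
            String bucket = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getUrlDecodedKey();

            // Skip objects that look like DataBrew's transient files; by the time the
            // Lambda runs they may already have been deleted, which causes the 404.
            if (key.endsWith(".temp")) {
                context.getLogger().log("Skipping temporary object: " + key);
                continue;
            }

            // The final output object should be safe to read here.
            S3Object s3Object = s3Client.getObject(bucket, key);
            context.getLogger().log("Read object: " + s3Object.getKey());
        }
        return null;
    }
}

If those really are transient objects, skipping them avoids the 404, and a separate Put notification should arrive for the final output object once its upload completes.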

  • Yes, the S3 event notification is "Put". Also, does handling this in the Lambda function mean retrying/waiting, or checking whether the object is still temporary? What would be the standard way to tackle this? – handle_009 Dec 22 '22 at 05:57
  • If you can verify that these files are coming from DataBrew and they're just temporary files, you can do a string check in your Lambda to filter out files with keys that include ".temp". Then you can skip over them since they're already deleted and not needed. – Brian Dec 22 '22 at 06:07