
I am trying to write a program that gets a zip file from S3, unzips it, and then uploads the contents back to S3. But I found two exceptions that I cannot catch.

1. StreamContentLengthMismatch: Stream content length mismatch. Received 980323883 of 5770104761 bytes. This occurs irregularly.

2. NoSuchKey: The specified key does not exist. This happens when I input the wrong key.

When either of these two exceptions occurs, the program crashes. I'd like to catch and handle both of them correctly so the crash is prevented.

   const unzipUpload = () => {
        return new Promise((resolve, reject) => {
            let rStream = s3.getObject({Bucket: 'bucket', Key: 'hoge/hoge.zip'})
                .createReadStream()
                    .pipe(unzip.Parse())
                    .on('entry', function (entry) {
                        if(entry.path.match(/__MACOSX/) == null){

                            // pause
                            if(currentFileCount - uploadedFileCount > 10) rStream.pause()

                            currentFileCount += 1
                            var fileName = entry.path;
                            let up = entry.pipe(uploadFromStream(s3,fileName))

                            up.on('uploaded', e => {
                                uploadedFileCount += 1
                                console.log(currentFileCount, uploadedFileCount)

                                //resume
                                if(currentFileCount - uploadedFileCount <= 10) rStream.resume()

                                if(uploadedFileCount === allFileCount) resolve()
                                entry.autodrain()
                            }).on('error', e => {
                                reject()
                            })
                        }

                    }).on('error', e => {
                        console.log("unzip error")
                        reject()
                    }).on('finish', e => {
                        allFileCount = currentFileCount
                    })
            rStream.on('error', e=> {
                console.log(e)
                reject(e)
            })
        })
    }

    function uploadFromStream(s3,fileName) {
        var pass = new stream.PassThrough();

        var params = {Bucket: "bucket", Key: "hoge/unzip/" + fileName, Body: pass};
        let request = s3.upload(params, function(err, data) {
            if(err) pass.emit('error')
            if(!err) pass.emit('uploaded')
        })
        request.on('httpUploadProgress', progress => {
            console.log(progress)
        })

        return pass
    }

This is the library I use for unzipping: https://github.com/mhr3/unzip-stream

Help me!!


5 Answers


If you'd like to catch the `NoSuchKey` error thrown by `createReadStream`, you have two options:

  1. Check if the key exists before reading it.
  2. Catch the error from the stream.

First:

s3.headObject({ Bucket: bucket, Key: key })
  .promise()
  .then(() => {
    // The key exists, so this will not throw NoSuchKey anymore
    s3.getObject({ Bucket: bucket, Key: key }).createReadStream();
  })
  .catch(error => {
    if (error.statusCode === 404) {
      // Catching NoSuchKey
    }
  });

The only case where you won't catch the error is if the file is deleted in the split second between receiving the headObject response and running createReadStream.

Second:

s3.getObject().createReadStream().on('error', error => {
    // Catching NoSuchKey & StreamContentLengthMismatch
});

This is a more generic approach and will catch all other errors, like network problems.

  • Thank You !! Your first idea is an innovative idea for me. For the second idea, something did not work. – tomoya ishizaka May 06 '17 at 00:29
  • Hey, glad it helped you. I noticed you're new to Stackoverflow, so if you feel like the answer solves your problem - mark it as 'accepted' (green checkmark). – Vlad Holubiev May 06 '17 at 01:23
  • Your second solution doesn't work, it will not catch a NoSuchKey error. I haven't found a way to catch this error though so it seems that solution 1 is the only way here. – dmo Jan 03 '18 at 19:01
  • @dmo thanks for noticing! I updated my 2nd example so it handles the error as well! – Vlad Holubiev Jan 03 '18 at 19:21
  • I don't believe that getObjectMetadata() is a valid method on the Node.js S3 SDK --- I think what you're looking for is `s3.headObject({ Bucket: , Key: }):` https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#headObject-property – zachelrath May 21 '20 at 14:06

You need to listen for the emitted error earlier. Your error handler is only looking for errors during the unzip part.

Here is a simplified version of your script:

s3.getObject(params)
.createReadStream()
.on('error', (e) => {
  // handle aws s3 error from createReadStream
})
.pipe(unzip)
.on('data', (data) => {
  // retrieve data
})
.on('end', () => {
  // stream has ended
})
.on('error', (e) => {
  // handle error from unzip
});

This way, you do not need to make an additional call to AWS to find out if the object exists.

  • This *should* work, but it doesn't for some reason. Errors from `node_modules/aws-sdk/lib/request.js:31` are always escaping event listener and kill the process. – yentsun Nov 12 '19 at 09:52
  • I am using a similar code in a loop.I am getting (node:12533) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added. Use emitter.setMaxListeners() to increase limit error. Is there a way to close the pipe? – Ankur Bhatia Nov 25 '20 at 18:07
  • It will close automatically once it has completed. If your loop is non-blocking and you have many items in the array you are looping over, you might be creating too many listeners. If non-blocking, refactor it and see if you get the same problem. If your loop is blocking, check if your packages can be updated as is could be a bug in a dependancy. – dmo Nov 26 '20 at 19:21

You can listen to events (like error, data, finish) on the stream you receive back. Read more about stream events in the Node.js documentation.

function getObjectStream (filePath) {
  return s3.getObject({
    Bucket: bucket,
    Key: filePath
  }).createReadStream()
}

let readStream = getObjectStream('/path/to/file.zip')
readStream.on('error', function (error) {
  // Handle your error here.
})

Tested for the "NoSuchKey" error:

it('should not be able to get stream of unavailable object', function (done) {
  let filePath = 'file_not_available.zip'

  let readStream = s3.getObjectStream(filePath)
  readStream.on('error', function (error) {
    expect(error instanceof Error).to.equal(true)
    expect(error.message).to.equal('The specified key does not exist.')
    done()
  })
})

Tested for success.

it('should be able to get stream of available object', function (done) {
  let filePath = 'test.zip'
  let receivedBytes = 0

  let readStream = s3.getObjectStream(filePath)
  readStream.on('error', function (error) {
    expect(error).to.equal(undefined)
  })
  readStream.on('data', function (data) {
    receivedBytes += data.length
  })
  readStream.on('finish', function () {
    expect(receivedBytes).to.equal(3774)
    done()
  })
})

To prevent a crash, you can first request the object's head metadata asynchronously; headObject does not return the object body, so it takes much less time than a full getObject. Try this one!

const AWS = require('aws-sdk');

const s3bucket = new AWS.S3({
  accessKeyId: 'your client id',
  secretAccessKey: 'your secret key'
});

const params = {
  Bucket: 'your bucket name',
  Key: 'path to object'
};

const isObjectExists = async () => {
  try {
    // Adding .promise() lets you await the request until it completes.
    await s3bucket.headObject(params).promise();
    return true;
  } catch (err) {
    return false; // headObject threw an error, e.g. the key does not exist.
  }
};

const yourFunction = async () => {
  if (await isObjectExists()) {
    s3bucket.getObject(params).createReadStream(); // works smoothly
  }
};
  • While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – dan1st Feb 05 '21 at 11:39
  • @dan1st Correct, Since we are accessing the object's metadata the return duration of the promise is less & This solution helps as it can be used to check whether the object is gonna cause the crash and it can be handled easily. (The negative vote, for not writing a description is just not right. My solution works smoothly for the latest versions of aws-sdk library) the upvote is much-needed – Palash Toshniwal Feb 05 '21 at 13:56

Setting .on('error', () => {}) after createReadStream() will not catch the errors produced by getObject (NoSuchKey, StreamContentLengthMismatch); you need to set it before calling createReadStream().

For example

s3.getObject(params)
    .on('error', error => {
        // Catching StreamContentLengthMismatch or NoSuchKey errors.
    })
    .createReadStream();