Understanding continuation tokens in AWS S3

Question

I'd like to understand better how continuation tokens work in list_objects_v2(). Here is a piece of code that iterates through a large S3 bucket, storing the continuation tokens provided:

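import boto3

S3C = boto3.client("s3")  # some S3 client
BUCKET_NAME = "my-bucket"  # placeholder bucket name

def transformer():
    # First page: no ContinuationToken is passed.
    response = S3C.list_objects_v2(Bucket=BUCKET_NAME)
    tokens = []
    # NextContinuationToken is only present while the listing is truncated.
    while "NextContinuationToken" in response:
        token = response["NextContinuationToken"]
        tokens.append(token)
        response = S3C.list_objects_v2(Bucket=BUCKET_NAME, ContinuationToken=token)
    print(tokens)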

What is the structure of these tokens under the hood? I noticed that if I rerun the function they are regenerated (not the same). Also, how would I get the token that marks the starting point of the first API call? My motivation is parallel computation: I'd like to see whether I can collect these tokens, ship them out somewhere as indices for computation, and still get a robust result. I'm a bit of a noob, so thanks for being patient :)

They're only documented as being valid for subsequent calls to list_objects_v2. If you did find an answer, it's not guaranteed it won't change tomorrow and thus break anything else you do with them. And, they have changed in the past. — Anon Coward, Apr 18 '22 at 16:32
I didn't find an answer, but I guess it wasn't a big deal. It was just curiosity to see if I could make my code a little faster. — OctaveParango, Apr 20 '22 at 11:39

score 1 · Answer 1 · answered Apr 22 '22 at 12:49

Unfortunately, that is not possible. A single S3 list operation is strictly sequential: each continuation token comes from the previous response, so you cannot parallelize it.
That said, there is a workaround if you need to list objects in a deep directory tree. List only the first one or two levels of the tree (or as many as you like), then use each path you get back as the base for another list request.

For example, given these keys:

/f1/f11/f111/obj.txt  
/f2/f22/f222/obj.txt  
/f2/f23/f233/obj.txt

The first list request, at depth 1, gives you two prefixes, /f1 and /f2. You can then list each of them separately and process the objects in parallel.
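Here is a minimal sketch of that idea, not a definitive implementation. S3 has no actual depth parameter, so this assumes you get the "one level deep" listing by passing Delimiter="/" and reading CommonPrefixes from the response. It also assumes keys without a leading slash (f1/f11/... rather than /f1/f11/...), and that one client can be shared between threads. The helper names (list_prefixes, list_keys, list_keys_in_parallel) are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   keys = list_keys_in_parallel(boto3.client("s3"), "my-bucket")


def list_prefixes(client, bucket, prefix="", delimiter="/"):
    """Return the "directory" prefixes directly under `prefix`."""
    prefixes = []
    kwargs = {"Bucket": bucket, "Prefix": prefix, "Delimiter": delimiter}
    while True:
        response = client.list_objects_v2(**kwargs)
        prefixes.extend(p["Prefix"] for p in response.get("CommonPrefixes", []))
        if not response.get("IsTruncated"):
            return prefixes
        # The token is opaque: just hand it back to the next call.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


def list_keys(client, bucket, prefix):
    """Return every key under `prefix`, paging sequentially."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


def list_keys_in_parallel(client, bucket, max_workers=8):
    """List each top-level prefix in its own thread and merge the results.

    Note: keys that sit directly at the top level (no delimiter in the key)
    are not under any prefix, so this sketch does not return them.
    """
    prefixes = list_prefixes(client, bucket)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_prefix = pool.map(lambda p: list_keys(client, bucket, p), prefixes)
        return [key for keys in per_prefix for key in keys]
```

Each prefix is still paged sequentially with its own continuation tokens; the parallelism only comes from running several independent listings at once. How much this helps depends on how evenly your objects are spread across the prefixes.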

Hope this helps!
