40

Is the boto3 low-level client for S3 thread-safe? The documentation is not explicit about it.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#client

A similar issue is discussed on GitHub:

https://github.com/boto/botocore/issues/1246

But there is still no answer from the maintainers.

Andrey Novosad

5 Answers

30

If you take a look at the Multithreading/Processing documentation for boto3, you can see that they recommend one client per session, because there is shared data between instances that can be mutated by individual threads.

It also looks like there's an open GitHub issue for this exact question. https://github.com/boto/botocore/issues/1246

Julian Mehnle
Skam
  • 4
    Unfortunately, the [Multithreading/Processing](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/resources.html?highlight=multithreading#multithreading-multiprocessing) section only talks about Resource instances. Nothing is said explicitly about Session or the low-level Client. – Andrey Novosad Oct 15 '18 at 16:39
  • 1
    The example in the doc uses session, though. It seems to me that the recommendation is quite clear that you should use a different session per thread. – Håken Lid Oct 15 '18 at 16:44
  • Thanks for the references. The original issue has been closed, and I've written a condensed answer with links, which validates what you've said: https://stackoverflow.com/a/70963911 – sam-6174 Feb 02 '22 at 22:55
21

From documentation:

Low-level clients are thread safe. When using a low-level client, it is recommended to instantiate your client then pass that client object to each of your threads.

Instantiating a client is not thread-safe, although a client instance, once created, is. To make things work in a multi-threaded environment, guard instantiation with a global lock like this:

import threading
import boto3

# One lock shared by every place that creates a client
boto3_client_lock = threading.Lock()

def create_client():
    with boto3_client_lock:
        return boto3.client('s3', aws_access_key_id='your key id', aws_secret_access_key='your access key')
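A minimal sketch of how the two pieces might fit together: the client is created once under a lock, and the same instance is then handed to every worker thread. The names `get_client`, `head_all`, and `default_factory` are hypothetical and not part of boto3. The `factory` parameter is an assumption added only so the pattern can be tried with a stub instead of real AWS credentials.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_client_lock = threading.Lock()
_shared_client = None


def default_factory():
    # Imported lazily so the sketch can be exercised without boto3 installed.
    import boto3
    return boto3.client("s3")


def get_client(factory=default_factory):
    """Create the client once, under the lock, and reuse it afterwards."""
    global _shared_client
    with _client_lock:
        if _shared_client is None:
            _shared_client = factory()
        return _shared_client


def head_all(bucket, keys, factory=default_factory, max_workers=4):
    """Call head_object for each key, sharing one client across threads."""
    client = get_client(factory)  # instantiated before any worker starts
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Only the already-built client is used inside the worker threads.
        return list(pool.map(lambda key: client.head_object(Bucket=bucket, Key=key), keys))
```

With real credentials configured, `head_all("<your-bucket>", ["a.json", "b.json"])` would issue the requests concurrently through the single shared client.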
AlexB
  • 3
    Correct link https://boto3.amazonaws.com/v1/documentation/api/latest/guide/clients.html. From documentation: Unlike Resources and Sessions, clients are generally thread-safe. There are some caveats, defined below, to be aware of though. – Dima Svider Jun 18 '21 at 22:23
  • This is actually the correct solution; the other ones simply don't work (they always fail with 'credential_provider' and/or 'endpoint_resolver'). You really do need to lock the client before passing it down to the threaded task runners. – Marek Příhoda Jul 15 '21 at 18:26
  • I guess one has to use the same lock for every place where you implicitly use the default session, like in the above example. For example, if you have both a dynamodb and a s3 client you need to use the same lock when creating both? – tibbe Mar 30 '23 at 11:53
13

I recently tried using a single boto client instance with concurrent.futures.ThreadPoolExecutor and ran into exceptions coming from boto. I assume the boto client is not thread-safe in this case.

The exception I got:

  File "xxx/python3.7/site-packages/boto3/session.py", line 263, in client
    aws_session_token=aws_session_token, config=config)
  File "xxx/python3.7/site-packages/botocore/session.py", line 827, in create_client
    endpoint_resolver = self._get_internal_component('endpoint_resolver')
  File "xxx/python3.7/site-packages/botocore/session.py", line 694, in _get_internal_component
    return self._internal_components.get_component(name)
  File "xxx/python3.7/site-packages/botocore/session.py", line 906, in get_component
    del self._deferred[name]
Pawel
  • 3
    same, missing the actual exception :) `KeyError: 'credential_provider'` – digitalfoo Feb 22 '20 at 02:17
  • 6
    Bizarrely, I get this credential_provider exception when I initialize a boto3 client inside the function that is threaded, but when they all use the same client initialized globally, it works – LobsterMan Mar 04 '20 at 14:52
  • 2
    @LobsterMan Yeah but which way was the wind blowing at that moment? – doug65536 Dec 09 '21 at 03:45
  • The stack trace shows the exception coming from `session.py`, which is NOT thread-safe (https://boto3.amazonaws.com/v1/documentation/api/latest/guide/session.html). You need to distinguish between a `Session` and a `Client`. The latter is thread-safe; see the code example for how to implement it properly: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/clients.html – Antonio Gomez Alvarado Mar 30 '23 at 15:06
4

This was answered by the boto team on May 19, 2021. See source docs here.

Resource instances are not thread safe and should not be shared across threads or processes. These special classes contain additional meta data that cannot be shared. It's recommended to create a new Resource for each thread or process:

import boto3
import boto3.session
import threading

class MyTask(threading.Thread):
    def run(self):
        # Here we create a new session per thread
        session = boto3.session.Session()

        # Next, we create a resource client using our thread's session object
        s3 = session.resource('s3')

        # Put your thread-safe code here

In the example above, each thread would have its own Boto3 session and its own instance of the S3 resource. This is a good idea because resources contain shared data when loaded and calling actions, accessing properties, or manually loading or reloading the resource can modify this data.
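When the threads come from a pool rather than from a `threading.Thread` subclass, one way to follow the same advice is to cache a session per worker thread with `threading.local`. This is only a sketch under that assumption, not something taken from the boto3 docs. `get_thread_session` and `default_session_factory` are hypothetical names, and the `factory` parameter exists only so the helper can be tried with a stub.

```python
import threading

_thread_local = threading.local()


def default_session_factory():
    # Imported lazily so the sketch can be exercised without boto3 installed.
    import boto3
    return boto3.session.Session()


def get_thread_session(factory=default_session_factory):
    """Return the calling thread's session, creating it on first use."""
    session = getattr(_thread_local, "session", None)
    if session is None:
        session = factory()
        _thread_local.session = session  # visible only to this thread
    return session
```

Inside a worker function, `get_thread_session().resource('s3')` would then build a resource from that thread's own session, so no session or resource is shared between threads.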

sam-6174
1

You can run multiple threads successfully, but you have to instantiate a new session per thread/process. That way you can, for example, download from an S3 bucket concurrently.

An example below:

import concurrent.futures
import boto3
import json


files = ["path-to-file.json", "path-to-file2.json"] 

def download_from_s3(file_path):
    # setup a new session
    sess = boto3.session.Session()
    client = sess.client("s3")
    # download a file
    obj = client.get_object(Bucket="<your-bucket>", Key=file_path)
    resp = json.loads(obj["Body"].read())
    return resp

with concurrent.futures.ThreadPoolExecutor() as executor:
    # Consume the iterator so results are collected and any exception is raised here
    results = list(executor.map(download_from_s3, files))

sakell
  • 1
    https://boto3.amazonaws.com/v1/documentation/api/latest/guide/session.html Session objects are not thread safe and should not be shared across threads and processes. You should create a new Session object for each thread or process. – Messa Dec 03 '20 at 16:23
  • You saved my life! I used this and managed to speed up the process 80 times! – Eliya Mar 27 '23 at 10:01