I want to download files from a remote server using Paramiko with multithreading.
Two solutions came to mind, but I'm not sure which is right (or better).
Solution 1:
Assuming that SFTPClient.get
is thread-safe (though I can't find any documentation saying so), a simple one would be:
from paramiko import SSHClient, AutoAddPolicy
from concurrent.futures import ThreadPoolExecutor
from typing import List

client = SSHClient()
client.set_missing_host_key_policy(AutoAddPolicy())
client.connect( ... )
sftp = client.open_sftp()

files_to_download: List[str] = ...

with ThreadPoolExecutor(10) as pool:
    pool.map(lambda fn: sftp.get(fn, fn), files_to_download)
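To show the shape of that pool.map call without a real server, here is the same pattern with a stand-in for sftp.get (fake_get is a hypothetical placeholder, not a Paramiko API):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Stand-in for sftp.get(remotepath, localpath): just records what it would fetch.
def fake_get(remote: str, local: str) -> str:
    return f"{remote} -> {local}"

files: List[str] = ["a.txt", "b.txt", "c.txt"]

with ThreadPoolExecutor(10) as pool:
    # map preserves input order in its results, even with 10 workers.
    results = list(pool.map(lambda fn: fake_get(fn, fn), files))
```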
Solution 2: There are two open questions in Solution 1:
- Is Paramiko's API really thread-safe?
- Is it efficient to download multiple files over a single connection?
So here is my second solution:
from paramiko import SSHClient, AutoAddPolicy
from concurrent.futures import ThreadPoolExecutor
from threading import Lock, local
from typing import List

client = SSHClient()
client.set_missing_host_key_policy(AutoAddPolicy())
client.connect( ... )

thread_local = local()
thread_lock = Lock()

files_to_download: List[str] = ...
def download(fn: str) -> None:
    """
    Thread-safe: each thread lazily opens its own SFTPClient.
    """
    if not hasattr(thread_local, 'sftp'):
        # Serialize open_sftp() calls, since they share one underlying transport.
        with thread_lock:
            thread_local.sftp = client.open_sftp()
    thread_local.sftp.get(fn, fn)

with ThreadPoolExecutor(10) as pool:
    pool.map(download, files_to_download)
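As a sanity check on the threading.local pattern that download relies on, here is a minimal standalone sketch with no SFTP involved (the names work/created are mine, for illustration): each worker thread should create at most one "connection", no matter how many files it processes.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import local, get_ident

thread_local = local()
created = []  # one entry per "connection" actually opened (list.append is atomic under the GIL)

def work(_: int) -> int:
    # Mimics the lazy open in download(): only the first call in each thread creates a resource.
    if not hasattr(thread_local, "res"):
        thread_local.res = get_ident()
        created.append(get_ident())
    return thread_local.res

with ThreadPoolExecutor(4) as pool:
    list(pool.map(work, range(100)))
```

With 4 workers and 100 tasks, created ends up with between 1 and 4 distinct thread ids, which is what Solution 2 is counting on: one SFTPClient per thread, not one per file.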
Which solution is better?