Background: I have a Python backup script on a local server that runs every night at midnight and transfers files from a remote server to the local one using pysftp. It ran perfectly fine for about a month; the average script runtime was around 3 to 4 minutes (transferring all the new files).
Problem: From one day to the next, the script's runtime increased to 54 to 68 minutes.
What I checked:
- New processes or changes to the existing process: I checked the logs and they are the same as on the days before; only the download times and the directory traversal take a lot longer than before
- CPU or RAM shortage (e.g. because of an inefficiency in the program): no spikes in either of those when running the script manually
- Connected from my local PC (not the server) directly via SFTP to the remote server and to the local server; from both I could get the respective files at the normal (faster) speed
- Ran the Python backup script from my local PC (not the server); the speed was normal
So it really is specifically the pysftp connection from the local server to the remote server that is this slow. Do you have any suggestions on where to look for the source of this slowdown?
Appreciate any suggestions!
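The manual checks above essentially amount to timing individual SFTP calls from each machine; a minimal sketch of such a timing test, assuming pysftp is installed on the machine it runs on (hostname, credentials, known_hosts path and file name below are placeholders, not the real values):

import time
import pysftp

# Placeholder known_hosts path and connection details.
cnopts = pysftp.CnOpts(knownhosts="known_hosts")

with pysftp.Connection("remote.example.com", username="backupuser", password="secret", port=22, cnopts=cnopts) as sftp:
    # Time a single directory listing.
    t0 = time.perf_counter()
    entries = sftp.listdir(".")
    print("listdir took {:.2f}s ({} entries)".format(time.perf_counter() - t0, len(entries)))

    # Time a single file download.
    t0 = time.perf_counter()
    sftp.get("some_file.ghg", "/tmp/some_file.ghg", preserve_mtime=True)
    print("get took {:.2f}s".format(time.perf_counter() - t0))

Running this same snippet once from the local server and once from the local PC makes the per-call difference directly comparable.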
The code:
I have an initialization function that opens the connection, scans the remote directory structure and then runs the backup process:
def start_process():
    cnopts = pysftp.CnOpts(knownhosts=KNOWN_HOSTS_PATH)
    try:
        with pysftp.Connection(HOSTNAME, username=USERNAME, password=PASSWORD, port=PORT, cnopts=cnopts) as sftp:
            with sftp.cd(REMOTE_DATA_PATH):
                # collect all remote directories, then back them up
                remote_dir_names = walk_remote_dirs(sftp, ".", store_dir_name, recurse=True)
                run_backup(sftp, remote_dir_names)
            sftp.close()
    except Exception as e:
        logger.error("Exception: {0}".format(e))
    return copied_files
The walk_remote_dirs() function just lists all the directories found below the remote path and returns them as a list:
def walk_remote_dirs(sftp, remotepath, dcallback, recurse=True):
    try:
        entries = sftp.listdir(remotepath)
    except IOError as e:
        logger.error("Exception: {0}".format(e))
        if e.errno != errno.EACCES:
            raise
        else:
            entries = []
    for entry in entries:
        pathname = posixpath.join(remotepath, entry)
        mode = sftp.stat(pathname).st_mode
        if stat.S_ISDIR(mode):
            # It's a directory, call the dcallback function
            dcallback(pathname)
            if recurse:
                # now, recurse into it
                walk_remote_dirs(sftp, pathname, dcallback)
    return remote_dir_names
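store_dir_name() and the remote_dir_names list are not shown above; judging from how they are used, they presumably boil down to a module-level list plus a callback that appends to it, roughly like this (a sketch, not the exact code):

# Presumed shape of the helper and list referenced above.
remote_dir_names = []

def store_dir_name(pathname: str):
    # called by walk_remote_dirs() for every directory it finds
    remote_dir_names.append(pathname)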
The run_backup() function walks all the remote directories, lists their files and passes each one on to the handle_file() function:
def run_backup(sftp: pysftp.Connection, remote_dir_names: list):
    for dir in remote_dir_names:
        with sftp.cd(dir):
            for filename in sftp.listdir('.'):
                handle_file(dir, filename, sftp)
handle_file() first checks whether the file's directory already exists locally and creates it if not. If the remote file is not also among the local files, it downloads it into the local directory.
def handle_file(dir: str, filename: str, sftp: pysftp.Connection):
    if any(filename.endswith(s) for s in FILEEXTENSIONS):
        if dir not in local_files:
            create_path(DATA_PATH, dir)
            local_files[dir] = []
        if filename not in local_files[dir]:
            download_file(dir, filename, sftp)
            copied_files.append(filename)
download_file() is just sftp.get() with some extras:
def download_file(directory: str, filename: str, sftp: pysftp.Connection):
    remotepath = str(directory) + "/" + filename
    # strip the leading "./" from the remote directory when building the local path
    localpath = DATA_PATH + directory[2:] + "/" + filename
    try:
        # the connection is already cd'd into 'directory' by run_backup(),
        # so the bare filename is enough as the remote path
        sftp.get(remotepath=filename, localpath=localpath, preserve_mtime=True)
    except IOError as e:
        logger.error("Exception: {0}".format(e))
I know there are quite a few flaws in terms of code structure and flexibility, but this was developed for a very narrow use case. What surprised me is that it worked perfectly for a few weeks, after which the runtime increased very suddenly.
I appreciate any suggestions!
Edit: There is SFTP logging in place as well as some logging of my own. Basically, all calls to the SFTP server now take much longer than they did before. For example, while walking the directories, every directory listing is logged:
Before:
23-Dec-22 00:00:14 - [chan 0] listdir(<first path>)
[...]
23-Dec-22 00:01:11 - [chan 0] listdir(<last path>)
After:
19-Jan-23 00:00:19 - [chan 0] listdir(<first path>)
[...]
19-Jan-23 00:02:52 - [chan 0] listdir(<last path>)
Even worse with the download speeds:
Before:
23-Dec-22 00:01:58 - Downloading file <remote-filepath> To local: <local-filepath>
23-Dec-22 00:01:58 - [chan 0] stat(b'<remote_filepath>')
23-Dec-22 00:01:58 - [chan 0] stat(b'<remote_filepath>')
23-Dec-22 00:01:58 - [chan 0] open(b'<remote_filepath>', 'rb')
23-Dec-22 00:01:58 - [chan 0] open(b'<remote_filepath>', 'rb') -> 00000000
23-Dec-22 00:02:02 - [chan 0] close(00000000)
After (the last log line was added right after the sftp.get() call):
31-Jan-23 10:50:39 - Downloading file 2023-01-31T110000_AIU-2383.ghg from remote: <remote_filepath> To local: <local_filepath>
31-Jan-23 10:50:43 - [chan 0] stat(b'<remote_filepath>')
31-Jan-23 10:50:43 - [chan 0] stat(b'<remote_filepath>')
31-Jan-23 10:50:43 - [chan 0] open(b'<remote_filepath>', 'rb')
31-Jan-23 10:50:43 - [chan 0] open(b'<remote_filepath>', 'rb') -> 00000000
31-Jan-23 10:51:51 - [chan 0] close(00000000)
31-Jan-23 10:51:51 - Download finished successfully
When downloading the data manually from the remote server via FileZilla, it still takes only about 3 seconds to get the file.
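For reference, the "[chan 0]" lines in the excerpts above come from paramiko, which pysftp uses under the hood; that kind of SFTP debug logging can be switched on roughly like this (the log file name is a placeholder, not necessarily how it is configured here):

import paramiko

# Write paramiko's transport/SFTP debug log (the "[chan 0]" lines) to a file.
paramiko.util.log_to_file("pysftp_backup.log")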