
I have a Django project running inside Docker. The service is started with python manage.py runserver, with file autoreload enabled and threading on.

My code calls shutil.make_archive(), which in turn calls os.getcwd(). From time to time os.getcwd() raises FileNotFoundError. After some searching, I realized the error probably means the current working directory no longer exists: something deleted it after the process entered it.

The error only occurs occasionally and I couldn't find a reliable way to reproduce it. Once it happens, I can fix it by making any code change to trigger the autoreload, or by restarting the Docker service. Restarting the process seems to be what helps; otherwise the error keeps being raised.

When things work properly, os.getcwd() returns my Django project root directory. And I'm 100% sure my code does no directory manipulation such as os.chdir().

So in a nutshell: the Django server works fine, then os.getcwd() suddenly starts raising the error and keeps doing so until I restart the process.

My question is: what could be the root cause? Is it possible that python manage.py runserver is causing the issue?

Python: v3.8.6
Django: v3.2
Docker: v20.10.12
always_beta
  • Of the things you list at the start of the question, is the live-reloading setup related to the issue? Does disabling live reloading in the isolated Docker container help? Do you have the same problem in a non-Docker virtual environment? – David Maze Nov 29 '22 at 10:46
  • @DavidMaze I also suspect that live reloading might be related, but the service is used in-house as a production environment (I know manage.py runserver is not for production usage), and for internal reasons I can't just turn off live reloading for testing. I can't reproduce the issue on my MacBook by running manage.py runserver without Docker. I'm also curious whether Docker's restart policy is related (we use "unless-stopped"). – always_beta Nov 29 '22 at 11:45
  • Are you bind-mounting code from the host? (...with live reloading...in production?) The underlying mechanics of that on MacOS are somewhat complicated (the files need to cross a virtual machine boundary) and that could be contributing as well. – David Maze Nov 29 '22 at 12:04
  • Forgot to mention: the production Docker environment runs on Linux, so I'm not concerned about how it works on Mac. And the code is copied into the Docker container rather than mounted. We're going to switch to a production-ready web server, e.g. uWSGI, in the future, but for now I have to deal with the issue in the current environment... – always_beta Nov 29 '22 at 12:37
  • If you're running the code out of the image without a bind mount (I'd suggest this is a better practice) then the code can't change while the container is running, and the reloader can never trigger. That's where I think removing it can't be harmful, though I can't specifically connect it to this problem. – David Maze Nov 29 '22 at 14:08

1 Answer


It turns out this has nothing to do with the dev server or Docker. The cause is that shutil.make_archive is not thread-safe.

What shutil.make_archive does is:

  1. call os.getcwd() to save the current directory path
  2. os.chdir to whatever directory it needs
  3. do the archiving
  4. os.chdir back to the path saved in the first step (similar to doing pushd .; popd;)
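
The four steps above follow the usual save/chdir/restore pattern. As a rough illustration only (this is not the standard library's source, and working_directory is a made-up name), the pattern looks something like this:

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Hypothetical helper mirroring the save/chdir/restore steps."""
    saved = os.getcwd()   # step 1: fails if the cwd was already deleted
    os.chdir(path)        # step 2: changes the cwd for the WHOLE process
    try:
        yield             # step 3: the caller does its work here
    finally:
        os.chdir(saved)   # step 4: go back to the saved directory
```

Every thread in the process sees the directory change made in step 2, which is what makes the pattern unsafe when several threads use it at once.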

I'm calling shutil.make_archive from multiple threads simultaneously, and also removing those temp dirs with rmdir. Since os.chdir takes effect for the whole process, not per thread, operations from different threads can interleave in this order:

  1. os.chdir(dir_path)
  2. os.rmdir(dir_path)
  3. os.getcwd()

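and the final step raises FileNotFoundError because of the race condition. The process's working directory now points at a deleted directory, so later os.getcwd() calls fail as well, which would explain why the error kept being raised until the process restarted.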
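
The same three calls can be run in a single thread to see the failure without any race. This is a minimal sketch assuming Linux (as in the production container); other platforms may refuse to remove the current directory. The function name getcwd_fails_after_rmdir is made up for illustration:

```python
import os
import tempfile


def getcwd_fails_after_rmdir():
    """Return True if os.getcwd() raises after the cwd is removed."""
    original = os.getcwd()
    doomed = tempfile.mkdtemp()      # throwaway directory
    try:
        os.chdir(doomed)             # 1. enter the directory
        os.rmdir(doomed)             # 2. delete it while it is the cwd
        try:
            os.getcwd()              # 3. the cwd no longer exists
        except FileNotFoundError:
            return True
        return False
    finally:
        os.chdir(original)           # leave the process in a valid cwd
```

In the multi-threaded case, steps 1 and 2 come from different threads, so the failure only shows up when the timing happens to line up.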

For those who want to dig deeper into shutil.make_archive's thread safety, there is a thread discussing it.
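
One possible way to avoid the race is to serialize archiving and temp dir cleanup behind a single lock. This is only a sketch under the assumption that every archive call and every temp dir removal in the process goes through these two helpers; _archive_lock, make_archive_locked and remove_dir_locked are hypothetical names, not part of shutil:

```python
import shutil
import threading

# Hypothetical process-wide lock guarding anything that depends on the cwd.
_archive_lock = threading.Lock()


def make_archive_locked(base_name, fmt, root_dir):
    """Run shutil.make_archive while no other guarded call is active."""
    with _archive_lock:
        return shutil.make_archive(base_name, fmt, root_dir=root_dir)


def remove_dir_locked(path):
    """Remove a temp dir only when no archive call is in progress."""
    with _archive_lock:
        shutil.rmtree(path)
```

The lock costs some parallelism, and it only helps if nothing else in the process changes the working directory outside of it.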

always_beta