
I'm writing a server-side application in Flask / Python and have an issue with some data that has to be loaded for calculation. Loading the data (about 40 MB) takes much longer than processing the server response, and the data never changes, so I want it to be loaded only once, effectively when Apache starts up. But no matter what I try, it keeps reloading each time a request comes in, massively slowing things down. I can tell by the print statement shown below, which writes to the Apache logs on each request. I want to load the data, and so write that line to the logs, only once on startup.

Interestingly, this only happens when the script is run via Apache on WSGI - if I run it locally using Python from the command line, the data load only happens once, and server responses are much faster.

Any thoughts?

My most recent attempt, using flask_cache, is like this:

@cache.cached(key_prefix='my_key')
def load_huge_file():
    # Do some things and assign data from a large file to loaded_data
    print "Huge data set loaded!"
    return loaded_data

shared_data = load_huge_file()

@app.route("/user_input")
def user_response():
    global shared_data
    return fairly_quick_function(args, shared_data)

Edit - Thanks - using before_first_request and adding "WSGIDaemonProcess myApp processes=1" to my WSGI config did the trick. Now it keeps the process running and just spins new requests off of it, instead of re-running the init each time.

  • If by 'via Apache on WSGI' you mean mod_wsgi, the issue is that most likely you are running in embedded mode of mod_wsgi, which means you have a multi process configuration, with Apache also dynamically destroying and creating processes as throughput changes. Provide more detail about your Apache MPM and mod_wsgi configuration. Using mod_wsgi daemon mode would be a better choice. – Graham Dumpleton Sep 14 '14 at 21:37

1 Answer


You'll have to load this once per process; how many times that is depends on how you configured WSGI.

Do not use Flask-Cache here; it cannot guarantee that the data remains loaded (a cache promises to keep data for at most a certain time, never for a minimum).

You could load the data with an app.before_first_request() handler:

@app.before_first_request
def load_huge_file():
    global shared_data
    # Do some things and assign data from a large file to loaded_data
    print "Huge data set loaded!"
    shared_data = loaded_data

but loading it when the module is imported should be fine too, unless you are running this with the Flask development server in reload mode. You were already doing this at import time, but the @cache.cached() decorator is not going to help here, as it'll interfere with your other cached data.
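A minimal sketch of the import-time approach (the module layout, the stand-in data, and the function body are illustrative assumptions, not from the original post):

```python
# app_data.py -- illustrative module; the expensive load runs exactly
# once per process, when this module is first imported.

def load_huge_file():
    # Stand-in for reading and parsing the real 40 MB file.
    print("Huge data set loaded!")
    return {"rows": list(range(1000))}

# Module-level assignment: executed once at import time, then shared
# by every request handler in this process.
shared_data = load_huge_file()
```

Request handlers can then read `shared_data` directly (no `global` needed for read-only access); because Python caches imported modules in `sys.modules`, importing the module again does not re-run the load.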

If you see the data loaded on each request then your WSGI configuration is wrong; it is Apache that creates a new process for each request. Adjust your WSGI setup to use daemon mode (with WSGIDaemonProcess); that way you disconnect creating processes from Apache's process and can keep this data around for (much) longer.
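A minimal daemon-mode configuration along these lines might look as follows (the process group name, paths, and thread count are placeholder assumptions; adjust to your setup):

```apache
# Run the app in a single long-lived daemon process so that
# module-level data stays loaded across requests.
WSGIDaemonProcess myapp processes=1 threads=5
WSGIScriptAlias / /var/www/myapp/app.wsgi

<Directory /var/www/myapp>
    WSGIProcessGroup myapp
    WSGIApplicationGroup %{GLOBAL}
    Require all granted
</Directory>
```

With `processes=1`, the 40 MB load happens once when the daemon process starts; with more processes, it happens once per process.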

Also see the Flask deployment on mod_wsgi documentation.

Martijn Pieters
  • Is `global` really necessary here? – Swift Sep 24 '18 at 14:52
  • @Swift: `global` is really necessary to alter the value of a global name. There are better ways of setting the global like setting it at the top level of the module instead, at import time, as described in my answer. – Martijn Pieters Sep 24 '18 at 15:17
  • Would you mind suggesting a way to cache data from an external API? Perhaps you have a link handy that might answer the question :) – Swift Sep 24 '18 at 15:22
  • @Swift: I'd cache API call results in Redis or Memcached. Such data could change outside of your control, so don't treat that as import-once data at start-up. – Martijn Pieters Sep 24 '18 at 15:43