
I am trying to run a Flask app which consists of:

  1. Yielding API requests on the fly
  2. Uploading each request to a SQLAlchemy database
  3. Running jobs 1 and 2 as a background process

For that I have the following code:

import concurrent.futures
import queue

from flask import Flask

app = Flask(__name__)
q = queue.Queue()


def build_cache():
    # 1. Yielding API requests on the fly
    track_and_features = spotify.query_tracks()  # <- a generator
    while True:
        q.put(next(track_and_features))


def upload_cache(track_and_features):
    # 2. Uploading each request to a `SQLAlchemy` database
    with app.app_context():
        Upload_Tracks(filtered_dataset=track_and_features)

    return "UPLOADING TRACKS TO DATABASE"


@app.route("/cache")
def cache():
    # 3. Do `1` and `2` as a background process
    with concurrent.futures.ThreadPoolExecutor() as executor:

        future_to_track = {executor.submit(build_cache): "TRACKER DONE"}

        while future_to_track:
            # check for status of the futures which are currently working
            done, not_done = concurrent.futures.wait(
                future_to_track,
                timeout=0.25,
                return_when=concurrent.futures.FIRST_COMPLETED,
            )

            # if there is incoming work, start a new future
            while not q.empty():

                # fetch a track from the queue
                track = q.get()

                # Start the load operation and mark the future with its TRACK
                future_to_track[executor.submit(upload_cache, track)] = track
            # process any completed futures
            for future in done:
                track = future_to_track[future]
                try:
                    data = future.result()
                except Exception as exc:
                    print("%r generated an exception: %s" % (track, exc))

                del future_to_track[future]

    return "Cacheing playlist in the background..."

All of the above works, BUT NOT AS A BACKGROUND PROCESS. The app hangs when `cache()` is called and resumes only when the process is done.

I run it with `gunicorn -c gconfig.py app:app -w 4 --threads 12`

What am I doing wrong?


EDIT: If I simplify things in order to debug this and write simply:

# 1st background process
def build_cache():
    # only ONE JOB
    tracks_and_features = spotify.query_tracks()  # <- a generator
    while True:
        print(next(tracks_and_features))


# background cache
@app.route("/cache")
def cache():
    executor.submit(build_cache)
    return "Cacheing playlist in the background..."

THEN the process runs in the background.

However, if I add another job:

def build_cache():

    tracks_and_features = spotify.query_tracks()
    while True:
        # SQLAlchemy db
        Upload_Tracks(filtered_dataset=next(tracks_and_features))

the background execution stops working again.

In short:

Background only works if I run ONE job at a time (which was the limitation that motivated using queues in the first place).

It seems like the problem is binding the background process to SQLAlchemy, but I don't know. I'm totally lost here.

  • How do you deploy your app? Are you using WSGI servers? – Fine Sep 13 '18 at 08:44
  • yes. I run the app with `gunicorn -c gconfig.py app:app -w 4 --threads 12` – 8-Bit Borges Sep 13 '18 at 16:54
  • By saying _app halts_ do you mean your whole gunicorn server halts or just a single request to a `/cache`? – Fine Sep 13 '18 at 17:04
  • I mean the app waits for all requests to be made at login and only then goes to homepage. It should go right away to homepage with requests being made at background – 8-Bit Borges Sep 13 '18 at 17:07
  • Well, if by app you mean some client to your Flask API, then maybe problem is not in Flask, but your app and the way it's making requests? – Fine Sep 13 '18 at 17:15
  • please refer to my edit. – 8-Bit Borges Sep 13 '18 at 18:52
  • Python has a [GIL](https://realpython.com/python-gil/) that will prevent any IO bound code to run concurrently, have you tried to use `ProcessPoolExcecutor` instead? – yorodm Sep 13 '18 at 19:05
  • no, how would it change the code? – 8-Bit Borges Sep 13 '18 at 19:07
  • @yorodm please read your link again, not IO bound, but CPU bound, don't bring false info to other people. – Fine Sep 13 '18 at 19:11
  • @Fine It was a mistake, not "bringing false information to other people". I'm not part of some "Python disinformation campaign" or something – yorodm Sep 13 '18 at 19:18
  • What do you mean by "yield API requests on the fly"? And by "upload every request to a DB"? Are you trying to log every API call to your server, or trying to cache the result of you API calls to Spotify, so you don't need to call them again? – Marco Lavagnino Sep 21 '18 at 16:27

2 Answers


Still not sure what you meant by

I mean the app waits for all requests to be made at login and only then goes to homepage. It should go right away to homepage with requests being made at background

There are a few issues here:

  • Your queue is global to the process, i.e. there is only one queue per gunicorn worker; you probably want the queue to be bound to the request so that multiple requests do not share the same queue in memory. Consider using context locals.
  • If `Upload_Tracks` is writing to the database, there might be a lock on the table. Check your indices and inspect lock waits in your database.
  • SQLAlchemy might be configured with a small connection pool, and the second `Upload_Tracks` call is waiting for the first to return its connection.

In your first example, the endpoint waits on all futures to finish before returning, whereas in your second example, the endpoint returns immediately after submitting tasks to the executor. If you want Flask to respond quickly while the tasks are still running in background threads, remove the `with concurrent.futures.ThreadPoolExecutor() as executor:` block and construct a global thread pool at the top of the module instead.

With `with`, the context manager waits for all submitted tasks to finish before exiting, but I am not sure whether that is your main issue.
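The blocking behaviour of the `with` block is easy to demonstrate outside Flask. A minimal sketch, with a plain `time.sleep` standing in for the upload work:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_job():
    # Stand-in for the upload work; just burns half a second.
    time.sleep(0.5)
    return "done"


# Blocking pattern: leaving the `with` block calls shutdown(wait=True),
# so this takes the full 0.5 s before the next statement runs.
start = time.monotonic()
with ThreadPoolExecutor() as pool:
    pool.submit(slow_job)
blocking = time.monotonic() - start

# Non-blocking pattern: a long-lived pool keeps working in the
# background, and submit() returns immediately.
pool = ThreadPoolExecutor()
start = time.monotonic()
future = pool.submit(slow_job)
nonblocking = time.monotonic() - start

print(blocking >= 0.5, nonblocking < 0.1)  # True True
future.result()   # wait for the background task before exiting
pool.shutdown()
```

The same applies inside a route handler: a `with` executor joins all its futures before the function can return, which is exactly the hang described in the question.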

James Lim
  • thank you for the detailed answer. Unfortunately, none of these solves my problem. There is no lock, `SQLAlchemy` works fine with the present pool, and if I ignore queues, the problem persists. One background job works; two do not. – 8-Bit Borges Sep 19 '18 at 02:31

Try to create the ThreadPoolExecutor outside of the route handler.

import time
from concurrent.futures import ThreadPoolExecutor

from flask import Flask


def foo(*args):
    while True:
        print("foo", args)
        time.sleep(10)


app = Flask(__name__)

executor = ThreadPoolExecutor()


@app.route("/cache")
def cache():
    executor.submit(foo, "1")
    executor.submit(foo, "2")
    return "in cache"
dudko