What is the correct way to manage paths when using RQ workers, queues and jobs

Question

This is my first question/post, so please be kind.

I am working on a personal project in which one module runs in a loop gathering data. When data comes in, the module hands the database insertion off to a function on a queue, where a listening RQ worker picks it up and runs it. The database is managed with SQLAlchemy, so the code must create an engine and a session and define the database table.

The structure of the code files is:

--/home/..../collect_view/  (this is the project folder)
    -- DataCollection
        -- main_client.py  (main loop waiting for user data)
        -- collect_data.py (contains the database insertion function)
        -- base.py         (the base file for SQLAlchemy database definition)
        -- tables.py       (the file which sets up the table name and definition)
    -- app.db                  (the database file)

Note: the database file is in the higher-level directory because it is also accessed by another application (a Flask app), which also sits at this level.

To implement this, "collect_data" must import "base" and "tables", and "tables" must import "base". This proved to be a problem: as soon as the worker ran the collect_data function (called "transfer"), it could no longer find the modules to import, and the worker raised an exception saying it could not import "base". I searched the web and eventually found an answer on GitHub from nvie that mentioned pointing the worker at the correct path with the --path option. I got it working with:

$ rq worker rq_worker_data2db --path /home/../../collect_view/DataCollection

I then hit another path-related failure: the worker reported that it could not find the database table I was trying to insert data into. So I changed the engine creation step to use my full path as well:

base_url = '/home/.../collect_view/'
engine = create_engine('sqlite:///' + base_url + 'app.db')

This problem was even more baffling to me. I assumed my worker was already working in my DataCollection directory, so I thought that 'sqlite:///../app.db' would be the correct way to locate the database (as it was during testing without the RQ worker).
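
A minimal sketch of why this can happen, assuming (as RQ's command-line help describes it) that --path only specifies the import path and does not change the worker's working directory. A relative file path such as ../app.db is resolved against the current working directory of the process, not against the directory of the module that builds the URL. The folders below are hypothetical stand-ins created in a temporary directory, and resolve_relative is an illustrative helper, not part of RQ or SQLAlchemy.

```python
import os
import sys
import tempfile
from pathlib import Path


def resolve_relative(relative_path):
    """Resolve a relative path against the current working directory."""
    return Path(os.getcwd(), relative_path).resolve()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the project folder and the module folder.
    project = Path(tmp).resolve() / "collect_view"
    modules = project / "DataCollection"
    modules.mkdir(parents=True)

    original_cwd = os.getcwd()
    original_sys_path = list(sys.path)
    try:
        # Mimic starting a worker from the project folder while adding the
        # module folder to the import path (assumed effect of --path).
        os.chdir(project)
        sys.path.insert(0, str(modules))

        # Changing sys.path leaves the working directory untouched.
        cwd_after_path_change = Path(os.getcwd())
        from_project = resolve_relative("../app.db")

        # Only an actual change of working directory moves the result.
        os.chdir(modules)
        from_modules = resolve_relative("../app.db")
    finally:
        os.chdir(original_cwd)
        sys.path[:] = original_sys_path

    print("cwd unchanged by sys.path edit:", cwd_after_path_change == project)
    print("resolved from project folder:  ", from_project)
    print("resolved from module folder:   ", from_modules)
```

With the working directory set to the project folder, ../app.db points one level above the project rather than at the project's own app.db, which would be consistent with SQLite opening a different, empty database file and the table not being found.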

So, after a long explanation, my question is: what is the correct way to manage paths in this case? It seems wrong to have to use my complete path from /home... Am I missing something about how paths and/or RQ workers (and similar tools) work?

Extracts from my code files follow:

main_client.py

from redis import Redis
import rq
from collect_data import transfer

redis_url = Redis.from_url('redis://')  #(config['REDIS_URL'])
# The queue name must match the one passed to "rq worker"
queue = rq.Queue('rq_worker_data2db', connection=redis_url)

#.....
#.....

def have_data(data):

    rq_job = queue.enqueue('collect_data.transfer', data)

#.....
#.....

collect_data.py

from base import Session, engine, Base
from tables import FieldData
import time
from datetime import datetime

def transfer(info):
    timestamp_in = datetime.utcnow()
    session = Session()
    data1 = FieldData(data=info, timestamp=timestamp_in)
    session.add(data1)
    session.commit()

base.py

from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
# Absolute path to the project folder that holds app.db
base_url = '/home/.../collect_view/'
engine = create_engine('sqlite:///' + base_url + 'app.db')
Session = sessionmaker(bind=engine)
Base = declarative_base()
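
One possible way to avoid hard-coding the /home/... prefix, sketched under the assumption that app.db always sits one directory above the folder containing base.py (as in the layout above). The helper name sqlite_url is hypothetical. It derives the database location from the module's own file path, so the result does not depend on where the worker is launched.

```python
from pathlib import Path


def sqlite_url(module_file, db_name="app.db"):
    """Build a SQLite URL for a database one directory above module_file.

    module_file is expected to be the __file__ of the calling module.
    """
    # Folder containing the module, e.g. .../collect_view/DataCollection
    module_dir = Path(module_file).resolve().parent
    # The database is assumed to live in the parent of that folder.
    db_path = module_dir.parent / db_name
    # "sqlite:///" followed by an absolute path is the form already used
    # in base.py above.
    return "sqlite:///" + db_path.as_posix()


if __name__ == "__main__":
    # In base.py this would be used as (not run here, needs SQLAlchemy):
    #     engine = create_engine(sqlite_url(__file__))
    print(sqlite_url(__file__))
```

The URL then stays the same whether the code is run directly from DataCollection or by a worker started in another directory.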

tables.py

from sqlalchemy import Column, String, Float, Integer, Date, DateTime, Table, ForeignKey
from base import Base
from datetime import datetime
# .....
#.....
class FieldData(Base):
    __tablename__ = 'field_data'
    id = Column(Integer, primary_key=True)
    data = Column(String(20))
    timestamp = Column(DateTime, index=True, default=datetime.utcnow)

    def __init__(self, data, timestamp):
        self.data = data
        self.timestamp = timestamp

In a terminal, I first get Redis running and then start the worker with:

$ rq worker rq_worker_data2db --path /home/../../collect_view/DataCollection

(where rq_worker_data2db is the name of the queue the worker listens on)

@snakecharmerb. You are correct regarding the working directory of the rq worker. I changed from full path to './DataCollection' (I launch the rq worker in "collect_view" folder) and the code works just the same as when using full path. Good so far. However, the location of the sqlite database used in the sqlalchemy engine definition is still a problem. Using "os.getcwd()" in base.py reveals that base.py is being run with cwd = 'home/.../collect_view' (DataCollection's parent directory) ??? — StapleIT, Mar 10 '19 at 07:11
