I'm loading data into an SQLite database via Luigi with the following code:

import sqlite3

import luigi


class LoadData(luigi.Task):

    def requires(self):
        return TransformData()

    def run(self):
        # The connection context manager commits on success and rolls back on error.
        with sqlite3.connect('database.db') as db:
            cursor = db.cursor()
            cursor.execute("INSERT INTO prod SELECT * FROM staging;")

    def output(self):
        return luigi.LocalTarget('database.db')

This works, but when I want to update or insert new data, the task doesn't execute because Luigi considers it complete (database.db already exists).

Maybe I have misunderstood the proper use of LocalTarget. What is the right way to approach this?

///EDIT: My question applies to the example given on this page (code for le_create_db.py). How do you solve updates and inserts in that example?

///EDIT: This question about appending to a file is similar, but the solution using marker files does not work because sqla expects an SQLAlchemyTarget output. Are there any other answers, specifically about appending to a database?

2 Answers

Consider using a mock file: http://gouthamanbalaraman.com/blog/building-luigi-task-pipeline.html

Each execution then creates a new file, so an earlier run's output does not mark the task as complete.
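
As a rough sketch of that idea, the marker file name can be derived from something that changes per run, such as a run date. The names `marker_path`, `mark_done`, `is_done`, and the `markers` directory are hypothetical and not taken from the linked post; in a Luigi task the path would typically be wrapped in the target returned from `output()`.

```python
from datetime import date
from pathlib import Path


def marker_path(task_name, run_date, root="markers"):
    """Build a per-run marker path (hypothetical naming scheme)."""
    return Path(root) / f"{task_name}_{run_date.isoformat()}.done"


def mark_done(path):
    """Create the marker file once the load has succeeded."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()


def is_done(path):
    """A run counts as complete only if its own marker file exists."""
    return path.exists()
```

Because the date is part of the file name, a run for a new date starts with no marker and is executed, while re-running an already loaded date is skipped.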

Another solution could be using the strategy of creating a marker table inside the db, for example: https://luigi.readthedocs.io/en/stable/api/luigi.contrib.postgres.html#luigi.contrib.postgres.PostgresTarget
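
A minimal sketch of the marker-table strategy for SQLite follows, using only the standard library. The class name `SQLiteMarkerTarget`, the table name `table_updates`, and its columns are assumptions made for illustration, loosely modelled on the PostgresTarget linked above. In a real pipeline the class would subclass `luigi.Target`, which requires an `exists()` method.

```python
import sqlite3


class SQLiteMarkerTarget:
    """Hypothetical target: complete once update_id is recorded in a marker table."""

    MARKER_TABLE = "table_updates"  # assumed name, not a Luigi constant

    def __init__(self, db_path, target_table, update_id):
        self.db_path = db_path
        self.target_table = target_table
        self.update_id = update_id

    def _ensure_marker_table(self, conn):
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.MARKER_TABLE} ("
            "update_id TEXT PRIMARY KEY, "
            "target_table TEXT, "
            "inserted TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
        )

    def exists(self):
        """Return True if this update_id has already been loaded."""
        conn = sqlite3.connect(self.db_path)
        try:
            self._ensure_marker_table(conn)
            row = conn.execute(
                f"SELECT 1 FROM {self.MARKER_TABLE} WHERE update_id = ?",
                (self.update_id,),
            ).fetchone()
            return row is not None
        finally:
            conn.close()

    def touch(self, conn):
        """Record the update on the caller's connection.

        The caller commits, so the marker row and the loaded data
        are written in the same transaction.
        """
        self._ensure_marker_table(conn)
        conn.execute(
            f"INSERT OR REPLACE INTO {self.MARKER_TABLE} "
            "(update_id, target_table) VALUES (?, ?)",
            (self.update_id, self.target_table),
        )
```

With this approach, `output()` would return such a target with an `update_id` built from the task's parameters (for example a date), and `run()` would call `touch(db)` inside the same `with sqlite3.connect(...)` block as the INSERT. A new `update_id` then triggers a new load even though database.db already exists.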

Jesus Sono
  • I read that mock files are recommended during development. What are the disadvantages of using them in production? – ibuildstuff Feb 24 '21 at 10:35
  • Let's imagine you have 3 dependent tasks in production working with mock files. If the last task fails, you will need to start from the beginning. – Jesus Sono Feb 24 '21 at 16:34

I had the same issue and was able to solve it by overriding the complete method to simply return False:

def complete(self):
    return False

Now the task is re-run every time, even if the database file is present.

kchomski