Questions tagged [luigi]

Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command-line integration, and much more.

For further information, see the documentation at luigi.readthedocs.io.

Getting Luigi

Run pip install luigi to install the latest stable version from PyPI.

For bleeding-edge code, run git clone https://github.com/spotify/luigi and then python setup.py install. Bleeding-edge documentation is also available.

If you want to run the central scheduler (highly recommended), you need to install Tornado, which is also available from PyPI: pip install tornado.

348 questions
4 votes, 0 answers

Using Luigi, how do I read PostgreSQL data and then pass it to the next task in the workflow?

Using Luigi, I want to define a workflow with two "stages": The first one reads data from PostgreSQL. The second one does something with the data. Thus I've started by subclassing luigi.contrib.postgres.PostgresQuery and overriding host, database,…
frb
4 votes, 2 answers

Creating luigi parameters from other parameters at initialization

I have the following question: can I use the value of one parameter to define another parameter? Here's an illustration of what I'm trying to do. Suppose I have a config file that looks like this: [MyTaskRunner] logdir=/tmp/logs numruns=2 and I…
femibyte
4 votes, 2 answers

Python script produces zombie processes only in Docker

I have quite a complicated setup with Luigi (https://github.com/spotify/luigi), requests-html (https://github.com/kennethreitz/requests-html) and pyppeteer (https://github.com/miyakogi/pyppeteer). But, long story short, everything works fine on my local Ubuntu (17.10) desktop,…
scythargon
4 votes, 2 answers

Job Scheduler - YAML for writing job definition?

In our legacy job scheduling software (built on top of crontab), we use the Apache config format for writing job definitions and the Perl module Config::General to parse the config files. This software is highly customized and has …
ThinkGeek
4 votes, 2 answers

MongoDB in Luigi

I was trying to build a pipeline with Luigi: first get data from an API, transform it, and then save it to MongoDB. I'm still new to Luigi; my question is how to implement the output() function so that it specifies MongoDB as the output. And how…
Sam
4 votes, 1 answer

Luigi set config from within the code

I've wrapped a set of Luigi tasks into a package. For now, each ETL task has its own luigi.cfg in the same directory, but since all of those .cfg files are identical, this seems suboptimal. On top of that, I'd prefer to write S3 credentials from a…
Philipp_Kats
4 votes, 1 answer

Limit the number of Luigi workers when several scripts are run concurrently

From what I saw and understood, when several Luigi workflows run at the same time, their numbers of workers add up. This means that if I run two workflows together, and the number of workers is set to n in the luigi.cfg file and provided…
4 votes, 1 answer

Luigi task returns unfulfilled dependency at run time when dependency is complete

I am relatively new to creating flows with Luigi and am trying to understand why my small workflow is resulting in an unfulfilled dependency. I am trying to run the task StageProviders(), which has a single dependency ErrorsLogFile(). The tasks that…
Funsaized
4 votes, 1 answer

Using luigi to update Postgres table

I've just started using the luigi library. I am regularly scraping a website and inserting any new records into a Postgres database. As I'm trying to rewrite parts of my scripts to use luigi, it's not clear to me how the "marker table" is supposed…
durrrutti
4 votes, 1 answer

MySQL Targets in Luigi workflow

My TaskB requires TaskA. On completion, TaskA writes to a MySQL table, and TaskB should then take that table as its input. I cannot figure out how to do this in Luigi. Can someone point me to an example or give me a quick…
Rijo Simon
4 votes, 1 answer

MongoDB in Luigi Python

I would like to know if there is a way to output to MongoDB in Luigi. I see that the documentation covers files (local FS, HDFS), S3 and PostgreSQL, but not MongoDB. If there is no way, could someone explain to me why not? Maybe it is a bad idea to have it? I…
user2288043
4 votes, 4 answers

How do you pass multiple arguments to a Luigi subtask?

I have a Luigi task that requires a subtask. The subtask depends on parameters passed through by the parent task (i.e. the one that is doing the requiring). I know you can specify a parameter that the subtask can use by setting... def…
guzman
4 votes, 1 answer

Luigi write file directly to S3

I'm creating a data pipeline with Luigi and I'm trying to write the processed data directly to an S3 bucket. The code I used is: import luigi from luigi.s3 import S3Target, S3Client class myTask(luigi.Task): def requires(self): return…
Z.G
4
votes
0 answers

python luigi died unexpectedly with exit code -11

I have a data pipeline with Luigi that works perfectly fine if I give the task 1 worker. However, with more than 1 worker, it dies unexpectedly with exit code -11 in a stage with 2 dependencies. The code is rather complex, so a minimal example…
Felipe Gerard
4 votes, 1 answer

Using Parameters in python luigi

I am triggering Luigi via luigi.run(["--local-scheduler"], main_task_cls=Test(Server = ActiveServer, Database = DB)) and in my class I have: class Test(luigi.Task): Database = luigi.Parameter() Server = luigi.Parameter() but the…
KillerSnail