Questions tagged [luigi]

Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command-line integration, and much more.

For further information, see the documentation at luigi.readthedocs.io.

Getting Luigi

Run pip install luigi to install the latest stable version from PyPI.

For bleeding-edge code, run git clone https://github.com/spotify/luigi and then python setup.py install. Bleeding-edge documentation is also available.

If you want to run the central scheduler (highly recommended), you also need to install Tornado, which is available from PyPI as well: pip install tornado.

348 questions
6 votes, 1 answer

How to run a luigi task with spark-submit and pyspark

I have a luigi python task which includes some pyspark libs. Now I would like to submit this task on mesos with spark-submit. What should I do to run it? Below is my code skeleton: from pyspark.sql import functions as F from pyspark import…
zuhakasa • 173 • 1 • 2 • 13

6 votes, 2 answers

Persist Completed Pipeline in Luigi Visualiser

I'm starting to port a nightly data pipeline from a visual ETL tool to Luigi, and I really enjoy that there is a visualiser to see the status of jobs. However, I've noticed that a few minutes after the last job (named MasterEnd) completes, all of…
jpavs • 648 • 5 • 17

5 votes, 2 answers

What is the purpose of significant parameter in Luigi?

The documentation says: If a parameter is created with significant=False, it is ignored as far as the Task signature is concerned. Tasks created with only insignificant parameters differing have the same signature but are not the same instance.…
Shailesh • 2,116 • 4 • 28 • 48

5 votes, 3 answers

Luigi Pipelining: No module named pwd in Windows

I am trying to execute the tutorial given in https://marcobonzanini.com/2015/10/24/building-data-pipelines-with-python-and-luigi/. I am able to run the program on its own using local scheduler, giving me: Scheduled 2 tasks of which: * 2 ran…
ALEX MATHEW • 251 • 1 • 5 • 13

5 votes, 1 answer

Luigi - Overriding Task requires/input

I am using luigi to execute a chain of tasks, like so: class Task1(luigi.Task): stuff = luigi.Parameter() def output(self): return luigi.LocalTarget('test.json') def run(self): with self.output().open('w') as f: …
MrName • 2,363 • 17 • 31

5 votes, 1 answer

How do I create luigi dependency graph but do not run anything?

Use case: some tasks are long batch jobs that take hours, need to review what completed and what failed for a given date before deciding which date to rerun first. How to view the dependency graph generated by the central scheduler while not running…
user443854 • 7,096 • 13 • 48 • 63

5 votes, 2 answers

scheduling a sidekiq job with a python script

I have Sidekiq running with a Rails app. I need to be able to run a job from a Python script (as I'm using Luigi to run tasks in general). I'm searching for a Python library to work with the Sidekiq API but so far no luck. Any ideas or thoughts on…
gaba • 73 • 1 • 6

5 votes, 2 answers

How to continuously update target file using Luigi?

I have recently started playing around with Luigi, and I would like to find out how to use it to continuously append new data into an existing target file. Imagine I am pinging an api every minute to retrieve new data. Because a Task only runs if…
mtoto • 23,919 • 4 • 58 • 71

5 votes, 1 answer

Python Luigi - Continue with External task when satisfied

I am working on a Luigi pipeline that checks if a manually created file exists and if so, continues with the next tasks: import luigi, os class ExternalFileChecker(luigi.ExternalTask): task_namespace='MyTask' path = luigi.Parameter() …
Johan • 406 • 6 • 20

5 votes, 1 answer

How can I get my Luigi scheduler to utilize multiple cores with the parallel-scheduling flag?

I have the following line in my luigi.cfg file (on all nodes, scheduler and workers): [core] parallel-scheduling: true However, when I monitor CPU utilization on my luigi scheduler (with a graph of around ~4000 tasks, handling requests from ~100…
captaincapsaicin • 950 • 1 • 7 • 15

5 votes, 1 answer

How to avoid running a specific task simultaneously in Luigi with multiple workers

I use Luigi to build data analysis tasks, including plotting with matplotlib. It seems that concurrent matplotlib plotting causes a problem that makes the task return prematurely without doing anything, for some reason. (Looks like this is the…
Hiro • 475 • 4 • 11

5 votes, 1 answer

What's a resource in Luigi Python?

In the web interface and in https://github.com/spotify/luigi/blob/master/luigi/task.py I can see that a Task can have "resources". There is also a placeholder function in a Task class called process_resources(), that just returns the empty…
Peter Smit • 1,594 • 1 • 13 • 27

4 votes, 1 answer

How to use instance attributes in Luigi Task?

Let’s say I have a task that downloads some meteo data for a given date (just for the sake of this example) and saves it in a CSV file. Let’s say for the first iteration I can only download that data from an API class…
MassyB • 1,124 • 4 • 15 • 28

4 votes, 0 answers

Using the Jaeger Python client together with Luigi

I'm just starting to use Jaeger for tracing and want to get the Python client to work with Luigi. The root of the problem is that Luigi uses multiprocessing to fork worker processes. The docs mention that this can cause problems and recommend - in…
Achim • 15,415 • 15 • 80 • 144

4 votes, 1 answer

Luigi Programmatic Configuration

I was using a configuration file similar to the following for my luigi workflows: # Luigi logging configuration [logging] version = 1 disable_existing_loggers = false [logging.formatters.simple] format = "{levelname:8} {asctime} {module}:{lineno}…
treyhakanson • 4,611 • 2 • 16 • 33