Questions tagged [luigi]


Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command line integration, and much more.
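As a rough illustration of what dependency resolution means here, the following is a toy sketch in plain Python. `ToyTask`, `WriteNumbers`, `SumNumbers`, `build`, and `demo` are hypothetical names made up for this example and are not Luigi's API. The real library provides `luigi.Task` with `requires()`, `output()`, and `run()` methods, plus a scheduler and much more. The sketch only assumes that a task counts as complete once its output file exists.

```python
import os
import tempfile


class ToyTask:
    """Hypothetical stand-in for a pipeline task (not luigi.Task)."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        # Tasks that must be complete before this one runs.
        return []

    def output(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # Assumption: a task is done once its output file exists.
        return os.path.exists(self.output())


class WriteNumbers(ToyTask):
    def output(self):
        return os.path.join(self.workdir, "numbers.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("1\n2\n3\n")


class SumNumbers(ToyTask):
    def requires(self):
        return [WriteNumbers(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "sum.txt")

    def run(self):
        with open(self.requires()[0].output()) as f:
            total = sum(int(line) for line in f)
        with open(self.output(), "w") as f:
            f.write(str(total))


def build(task, log):
    """Run dependencies first, skipping any task whose output already exists."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        first, second = [], []
        build(SumNumbers(workdir), first)
        build(SumNumbers(workdir), second)  # outputs exist, so nothing runs
        with open(os.path.join(workdir, "sum.txt")) as f:
            return first, second, f.read()
```

Calling `demo()` builds the two-step pipeline twice. The second call does no work because both outputs already exist, which is the behaviour that makes reruns after a failure cheap.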

For further information, see the documentation at luigi.readthedocs.io.

Getting Luigi

Run pip install luigi to install the latest stable version from PyPI.

For bleeding edge code, run git clone https://github.com/spotify/luigi and then python setup.py install. Bleeding edge documentation can be found here.

If you want to run the central scheduler (highly recommended), you need to install Tornado, which is also available from PyPI: pip install tornado.

348 questions
0
votes
1 answer

Reasons for a different interpretation of Python code?

What are the possible reasons for a different interpretation of a given piece of Python code? I have code that runs without errors on one computer but produces errors on another. The Python versions are the same (2.7.12). The encoding of…
Ashargin
  • 498
  • 4
  • 11
0
votes
1 answer

JSON serialization error when creating a Luigi task graph

I'm trying to batch up the processing of a few Jupyter notebooks using Luigi, and I've run into a problem. I have two classes. The first, transform.py: import nbformat import nbconvert import luigi from nbconvert.preprocessors.execute import…
Aleksey Bilogur
  • 3,686
  • 3
  • 30
  • 57
0
votes
1 answer

Luigi child dependency in subclass

I have a flow A->B->C where C depends on B and B depends on A. Let's say I have another flow D->B->C and I want to reuse the task. How can I easily reuse it? I can create a subclass that inherits from Task B and changes requires to Task D; however, to allow…
xvi_16
  • 115
  • 1
  • 1
  • 10
0
votes
1 answer

object has no attribute

I am trying to work with Luigi and OpenStack. When calling the class from main, I am having issues. I am still learning Python, and I don't really understand the error. ERROR: AttributeError: 'OpenstackHelper' object has no attribute 'servers password =…
Heenashree Khandelwal
  • 659
  • 1
  • 13
  • 30
0
votes
1 answer

Luigi task doesn't fire up the requirements on ETL process

This is a follow-up to my previous question regarding the pattern to follow in a recurrent ETL process. Today, the machine learning job I've written is run by hand. I download the needed input files, learn and predict things, output a .csv…
prcastro
  • 2,178
  • 3
  • 19
  • 21
0
votes
1 answer

Luigi: The task does not fail even if, in the run() method, I execute a file which does not exist

I am new to Luigi and exploring its possibilities. I encountered a problem where I defined a task with requires, run, and output methods. In run(), I'm executing the contents of a file. However, if the file does not exist, the task does not fail…
Nixon Raphy
  • 312
  • 2
  • 20
0
votes
1 answer

dask + luigi: raise ValueError('url type not understood: %s' % urlpath)

I am trying to merge dask with luigi, and while business logic works fine by itself, code starts throwing errors when I run a Luigi task: raise ValueError('url type not understood: %s' % urlpath) ValueError: url type not understood:…
Philipp_Kats
  • 3,872
  • 3
  • 27
  • 44
0
votes
1 answer

What is a workflow automation work where tasks have an unknown number of inputs?

I want to use something like Luigi or another workflow automation suite. My problem is that I have nodes that have an unknown number of inputs. Luigi, for example, demands that you hard-code the inputs ahead of time. Let's say I have a graph…
Jarvis Jones
  • 151
  • 1
  • 4
0
votes
2 answers

Luigi - Executing 2 pipeline jobs (must be in sync, not parallel)

I am doing Luigi framework development and I want to execute 2 jobs (both are pipeline jobs) in a single class, but in such a way that Job2 only runs once Job1 has completed. class ExecuteTwoJobs(luigi.Task): def requires(self): …
Talat Parwez
  • 129
  • 4
0
votes
1 answer

Replacing a table load function with a luigi task

I have a Python function that loads data into a SQL Server table from 2 other tables. def load_table(date1,date2): strDate1 = date1.strftime('%m/%d/%Y') strDate2 = date2.strftime('%m/%d/%Y') stmt = "insert into Agent_Queue (ID) …
optimus_prime
  • 817
  • 2
  • 12
  • 34
0
votes
1 answer

Why does the Spark driver read a local file?

I use a Spark standalone cluster. The master and the single slave are on the same server (Server B). I use Luigi (on Server A) to submit and deploy my application (client mode). My application reads local files on Server B. However, the application tries…
Bastien D
  • 1,395
  • 2
  • 14
  • 26
0
votes
2 answers

calling setattr before 'self' is returned

I suspect this is kind of a klugefest on my part, but I'm working with the Luigi and Sciluigi modules, which set a number of critical parameters PRIOR to 'self' being returned by an init. And if I try to manhandle these parameters AFTER self is…
RightmireM
  • 2,381
  • 2
  • 24
  • 42
0
votes
1 answer

Global variable reverts back to default value in Python Luigi Pipeline

var_doesHave = True class A: global var_doesHave var_doesHave = False # Call Class B class B: if (var_doesHave): # do foo else: # do bar I have python luigi…
djskj189
  • 285
  • 1
  • 5
  • 15
0
votes
0 answers

What are the functionalities of IPython Tasks database?

Looking at the docs (https://ipyparallel.readthedocs.io/en/latest/db.html) and Google, there are few examples of functionality such as: scheduling tasks to be launched at some time; setting an execution order for the tasks. …
tensor
  • 3,088
  • 8
  • 37
  • 71
0
votes
1 answer

Starting luigi programmatically and not waiting for the job result?

So the question is probably fairly simple. I have a job that is supposed to run for ~30 minutes and I don't want my program to wait 30 minutes for the result. I'd like to get a task name or id or something like that and return control to the user so…
4c74356b41
  • 69,186
  • 6
  • 100
  • 141