Questions tagged [joblib]

Joblib is a set of tools to provide lightweight pipelining in Python.

https://joblib.readthedocs.io/en/latest/
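
As a rough illustration of what the tag covers, here is a minimal sketch of two commonly used joblib features: running a loop in parallel with `Parallel` and `delayed`, and persisting an object with `dump` and `load`. The `square` function and the file name `squares.joblib` are hypothetical, chosen only for this example.

```python
import os
import tempfile

from joblib import Parallel, delayed, dump, load


def square(x):
    """Toy stand-in for an expensive function (hypothetical)."""
    return x * x


# Run the loop body in two workers; results come back in input order.
# prefer="threads" keeps this sketch lightweight; for CPU-bound work the
# default process-based backend is usually the better choice.
squares = Parallel(n_jobs=2, prefer="threads")(
    delayed(square)(i) for i in range(5)
)

# Persist a Python object to disk and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "squares.joblib")  # hypothetical file name
    dump(squares, path)
    restored = load(path)

print(squares)
print(restored == squares)
```

The same `Parallel(...)(delayed(f)(arg) for arg in args)` pattern appears in several of the questions listed below.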

715 questions
12
votes
2 answers

Python - Loop parallelisation with joblib

I would like some help understanding exactly what I have done and why my code isn't running as I would expect. I have started to use joblib to try and speed up my code by running a (large) loop in parallel. I am using it like so: from joblib import…
JP1
12
votes
1 answer

What batch_size and pre_dispatch in joblib exactly mean

From the documentation here https://pythonhosted.org/joblib/parallel.html#parallel-reference-documentation it's not clear to me what exactly batch_size and pre_dispatch mean. Let's consider the case when we are using the 'multiprocessing' backend, 2 jobs (2…
Ibraim Ganiev
11
votes
1 answer

joblib: Worker stopped caused by timeout or memory leak

I am only using the basic joblib functionality: Parallel(n_jobs=-1)(delayed(function)(arg) for arg in arglist) I am frequently getting the warning: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a…
cmosig
11
votes
1 answer

Where is the memory leak? How to time out threads during multiprocessing in Python?

It is unclear how to properly time out workers of joblib's Parallel in Python. Others have had similar questions here, here, here and here. In my example I am utilizing a pool of 50 joblib workers with the threading backend. Parallel Call…
11
votes
1 answer

Multiple processes sharing a single Joblib cache

I'm using Joblib to cache results of a computationally expensive function in my python script. The function's input arguments and return values are numpy arrays. The cache works fine for a single run of my python script. Now I want to spawn multiple…
Neha Karanjkar
10
votes
2 answers

Joblib and other parallel tasks within Airflow

I've used Joblib and Airflow in the past and haven't run into this issue. I'm trying to run a job through Airflow that runs a parallel computation using Joblib. When the Airflow job starts up I see the following warning UserWarning: Loky-backed…
Michael
10
votes
6 answers

How to load a model saved in a joblib file from a Google Cloud Storage bucket

I want to load a model which is saved as a joblib file from a Google Cloud Storage bucket. When it is in a local path, we can load it as follows (assuming model_file is the full path on the local system): loaded_model = joblib.load(model_file) How can we do…
Soheil Novinfard
10
votes
1 answer

Graceful python joblib kill

Is it possible to gracefully kill a joblib process (threading backend) and still return the results computed so far? parallel = Parallel(n_jobs=4, backend="threading") result = parallel(delayed(dummy_f)(x) for x in range(100)) For the moment I…
sknat
9
votes
2 answers

Joblib Parallel doesn't terminate processes

I run the code in parallel in the following fashion: grouped_data = Parallel(n_jobs=14)(delayed(function)(group) for group in grouped_data) After the computation is done I can see all the spawned processes are still active and consuming memory in a…
Ivan Sudos
9
votes
1 answer

Suppressing warnings during parallel execution with joblib in Python

I am using a function that generates a warning that I really don't need to read. The problem is that I want to run the function in parallel, and when doing so, it seems I cannot suppress warnings anymore. Consider this example: import…
HansSnah
9
votes
1 answer

Using Joblib and getting "cannot unpack non-iterable function object"

I am new to multiprocessing. The following code properly illustrates what I am trying to do: import pandas as pd import multiprocessing from joblib import Parallel, delayed one = [True, False] one_bla = pd.Series(one) one_names = pd.Series(['Mr.…
user106742
9
votes
0 answers

Python Multiprocessing: TypeError: __new__() missing 1 required positional argument: 'path'

I'm currently trying to run a parallel process in python 3.5 using the joblib library with the multiprocessing backend. However, every time it runs I get this error: Process ForkServerPoolWorker-5: Traceback (most recent call last): File…
9
votes
2 answers

Saving Random Forest

I want to save and load a fitted Random Forest Classifier, but I get an error. forest = RandomForestClassifier(n_estimators = 100, max_features = mf_val) forest = forest.fit(L1[0:100], L2[0:100]) joblib.dump(forest,…
mkolarek
9
votes
1 answer

Memoizing SQL queries

Say I have a function that runs a SQL query and returns a dataframe: import pandas.io.sql as psql import sqlalchemy query_string = "select a from table;" def run_my_query(my_query): # username, host, port and database are hard-coded here …
Amelio Vazquez-Reina
9
votes
1 answer

Correct way to cache only some methods of a class with joblib

I am writing a class that has some computation-heavy methods and some parameters that the user will want to iteratively tweak and are independent of the computation. The actual use is for visualization, but here's a cartoon example: class…
mwaskom