Questions tagged [joblib]

Joblib is a set of tools to provide lightweight pipelining in Python.

https://joblib.readthedocs.io/en/latest/

715 questions
19
votes
2 answers

Removing cached files after a pytest run

I'm using joblib.Memory to cache expensive computations when running tests with py.test. The code I'm using reduces to the following: from joblib import Memory memory = Memory(cachedir='/tmp/') @memory.cache def expensive_function(x): return…
rth
  • 10,680
  • 7
  • 53
  • 77
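
One possible approach to the cache-cleanup question above, as a minimal sketch rather than the accepted answer: point the cache at a throwaway directory and wipe it once the tests are done. The temporary directory and the `calls` list are illustrative assumptions, and recent joblib releases spell the directory argument `location` rather than the `cachedir` used in the excerpt.

```python
import shutil
import tempfile

from joblib import Memory

# Assumption: a throwaway directory stands in for the '/tmp/' of the excerpt.
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

calls = []  # records real executions so cache hits are visible


@memory.cache
def expensive_function(x):
    calls.append(x)
    return x * x


first = expensive_function(3)   # computed and written to the cache
second = expensive_function(3)  # served from the cache, no new execution
memory.clear(warn=False)        # wipe cached results, e.g. in test teardown
third = expensive_function(3)   # recomputed because the cache is empty

shutil.rmtree(cache_dir, ignore_errors=True)  # remove the directory itself
```

In a test suite the `memory.clear` and `shutil.rmtree` calls would typically live in whatever teardown hook the project already uses.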
19
votes
2 answers

Multiprocessing backed parallel loops cannot be nested below threads

What causes this warning in joblib? 'Multiprocessing backed parallel loops cannot be nested below threads, setting n_jobs=1' What should I do to avoid it? I need to implement an XMLRPC server that runs heavy computation in…
Alex
  • 362
  • 1
  • 3
  • 13
18
votes
1 answer

Python, parallelization with joblib: Delayed with multiple arguments

I am using something similar to the following to parallelize a for loop over two matrices from joblib import Parallel, delayed import numpy def processInput(i,j): for k in range(len(i)): i[k] = 1 for t in range(len(b)): j[t]…
Francesco
  • 393
  • 1
  • 3
  • 8
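
For the multiple-argument question above, a small sketch of how `delayed` can carry more than one argument. The function `process_pair` and the two toy matrices are hypothetical stand-ins for the code in the excerpt, and the sketch returns values instead of mutating its inputs, since process-based workers operate on copies of their arguments.

```python
from joblib import Parallel, delayed


def process_pair(row_a, row_b):
    # Hypothetical per-pair work: combine one row from each matrix.
    return sum(row_a) + sum(row_b)


matrix_a = [[1, 2], [3, 4]]
matrix_b = [[10, 20], [30, 40]]

# delayed(f)(a, b) captures both arguments; zip pairs the rows up.
# prefer="threads" keeps the sketch lightweight; it is only a hint.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(process_pair)(a, b) for a, b in zip(matrix_a, matrix_b)
)
```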
18
votes
1 answer

sklearn dumping model using joblib, dumps multiple files. Which one is the correct model?

I wrote a sample program to train an SVM using sklearn. Here is the code: from sklearn import svm from sklearn import datasets from sklearn.externals import joblib clf = svm.SVC() iris = datasets.load_iris() X, y = iris.data, iris.target clf.fit(X,…
kcc__
  • 1,638
  • 4
  • 30
  • 59
16
votes
2 answers

Does joblib.Parallel keep the original order of data passed?

I want to ask the same question as "Python 3: does Pool keep the original order of data passed to map?", but for joblib. E.g.: Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in x) The syntax kind of implies it, but I am always worried about the ordering of…
user3226167
  • 3,131
  • 2
  • 30
  • 34
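
The ordering question above can be checked directly. This sketch reuses the expression from the excerpt; the concrete input `range(10)` and the thread hint are assumptions added for illustration. With the default list return, `Parallel` hands results back in the order the tasks were submitted, whichever worker finishes first.

```python
from math import sqrt

from joblib import Parallel, delayed

x = range(10)  # assumed input; the excerpt does not show what x holds

# Results come back as a list aligned with the order of the generator.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(sqrt)(i ** 2) for i in x
)

# Compare against a plain sequential pass over the same inputs.
sequential = [sqrt(i ** 2) for i in x]
same_order = results == sequential
```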
16
votes
4 answers

How to save sklearn model on s3 using joblib.dump?

I have a sklearn model and I want to save the pickle file to my S3 bucket using joblib.dump. I used joblib.dump(model, 'model.pkl') to save the model locally, but I do not know how to save it to an S3 bucket. s3_resource =…
the_dummy
  • 317
  • 1
  • 3
  • 15
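
A partial sketch for the S3 question above, under stated assumptions: `joblib.dump` accepts an open file object as well as a path, so a model can be serialized into an in-memory buffer first. The dictionary used as `model` is a hypothetical stand-in for a fitted estimator, and the upload itself is only indicated in a comment because it depends on the S3 client and credentials in use.

```python
import io

import joblib

# Hypothetical stand-in for a fitted sklearn model.
model = {"coef": [0.5, -1.25], "intercept": 2.0}

# Serialize into memory instead of writing 'model.pkl' to disk.
buffer = io.BytesIO()
joblib.dump(model, buffer)
buffer.seek(0)  # rewind so a reader starts at the first byte

# At this point the buffer could be handed to an S3 client, for example
# boto3's upload_fileobj (assumption: boto3 and valid credentials exist).

# Round-trip check: loading from the same buffer restores the object.
restored = joblib.load(buffer)
```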
16
votes
3 answers

Can functions know if they are already multiprocessed in Python (joblib)

I have a function that uses multiprocessing (specifically joblib) to speed up a slow routine using multiple cores. It works great; no questions there. I have a test suite that uses multiprocessing (currently just the multiprocessing.Pool() system,…
15
votes
2 answers

Efficient pairwise DTW calculation using numpy or cython

I am trying to calculate the pairwise distances between multiple time-series contained in a numpy array. Please see the code below print(type(sales)) print(sales.shape) (687, 157) So, sales contains 687 time series of…
user1274878
  • 1,275
  • 4
  • 25
  • 56
15
votes
3 answers

Parallelizing four nested loops in Python

I have a fairly straightforward nested for loop that iterates over four arrays: for a in a_grid: for b in b_grid: for c in c_grid: for d in d_grid: do_some_stuff(a,b,c,d) # perform calculations and write to…
ylangylang
  • 3,294
  • 11
  • 30
  • 34
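
For the nested-loop question above, one common pattern is to flatten the four loops into a single stream of tasks with `itertools.product` and hand that stream to `Parallel`. This is a sketch only: the grids and the body of `do_some_stuff` are placeholders, and it returns values rather than writing to a shared output as the excerpt hints at.

```python
from itertools import product

from joblib import Parallel, delayed


def do_some_stuff(a, b, c, d):
    # Placeholder calculation; the real body is not shown in the excerpt.
    return a + b + c + d


# Assumed toy grids, two points each, giving 2**4 = 16 combinations.
a_grid, b_grid, c_grid, d_grid = [0, 1], [0, 10], [0, 100], [0, 1000]

# product() yields (a, b, c, d) in the same order as the nested loops,
# with d varying fastest, so results line up with the loop order.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(do_some_stuff)(a, b, c, d)
    for a, b, c, d in product(a_grid, b_grid, c_grid, d_grid)
)
```

For CPU-bound work, dropping the `prefer="threads"` hint lets joblib use its default process-based workers.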
15
votes
2 answers

Writing a parallel loop

I am trying to run a parallel loop on a simple example. What am I doing wrong? from joblib import Parallel, delayed import multiprocessing def processInput(i): return i * i if __name__ == '__main__': # what are your inputs, and…
KMA
  • 183
  • 1
  • 1
  • 4
14
votes
1 answer

Saving an sklearn `FunctionTransformer` with the function it wraps

I am using sklearn's Pipeline and FunctionTransformer with a custom function from sklearn.externals import joblib from sklearn.preprocessing import FunctionTransformer from sklearn.pipeline import Pipeline This is my code: def f(x): return…
Uri Goren
  • 13,386
  • 6
  • 58
  • 110
14
votes
1 answer

How to use joblib.Memory to cache the output of a member function of a Python class

I would like to cache the output of a member function of a class using joblib.Memory. Here is some sample code: import joblib import numpy as np mem = joblib.Memory(cachedir='/tmp', verbose=1) @mem.cache def my_sum(x): return…
motam79
  • 3,542
  • 5
  • 34
  • 60
14
votes
3 answers

Reusing model fitted by cross_val_score in sklearn using joblib

I created the following function in python: def cross_validate(algorithms, data, labels, cv=4, n_jobs=-1): print "Cross validation using: " for alg, predictors in algorithms: print alg print # Compute the accuracy…
user
  • 2,015
  • 6
  • 22
  • 39
14
votes
2 answers

Large Pandas Dataframe parallel processing

I am accessing a very large Pandas dataframe as a global variable. This variable is accessed in parallel via joblib. E.g.: df = db.query("select id, a_lot_of_data from table") def process(id): temp_df = df.loc[id] …
autodidacticon
  • 1,310
  • 2
  • 14
  • 33
12
votes
1 answer

Workaround for 32-/64-bit serialization exception on sklearn RandomForest model

If we serialize a random forest model using joblib on a 64-bit machine and then load it on a 32-bit machine, there is an exception: ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long' This question has been asked before:…
Vinay Kolar
  • 913
  • 1
  • 7
  • 13