Intro
I am porting some code that runs on a regular laptop to a cluster (HPC) with MPI.
What I am dealing with is an embarrassingly parallel problem where I am sending different file paths to a bunch of workers. Each corresponding file contains one numpy array that was previously generated using the `joblib.dump()` function with lzma compression=2.
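For reference, a minimal sketch of how I understand the files were produced (the array contents and file names here are placeholders, and I am assuming the tuple form of the `compress` argument; the original code may have passed a plain integer instead):

```python
import numpy as np
import joblib

# placeholder array standing in for the real data
arr = np.random.rand(1000, 1000)

# compress=("lzma", 2) selects LZMA at compression level 2
joblib.dump(arr, "File1.lzma", compress=("lzma", 2))
```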
Details
All the files are saved in the same directory.
Example of the file list generated by `joblib.dump()`:
- File1.lzma
- File1.lzma_01.npy.z
- File2.lzma
- File2.lzma_01.npy.z
If I pass the workers the path to a file with the .lzma extension (e.g. File1.lzma), `joblib.load()` on the worker cannot load the file and gives me an error. The same happens if I pass the file ending in .lzma_01.npy.z. My guess is that both files are needed, and that on the HPC it is not enough for the files to sit in the same directory (on my laptop this is enough and the files are loaded properly).
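For context, a minimal sketch of what the worker-side load looks like, assuming mpi4py and a hypothetical rank-based assignment of paths (my real code distributes the paths differently):

```python
from mpi4py import MPI
import joblib

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# hypothetical list of paths; in the real code these are sent to the workers
paths = ["File1.lzma", "File2.lzma"]

# each worker loads its assigned file; this is the call that fails on the cluster
arr = joblib.load(paths[rank % len(paths)])
```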
Questions
1) Is my hypothesis correct?
2) Is there a way to pass both file paths to `joblib.load()`?
3) Is this missing functionality, and should I reprocess the files and save them with pickle instead?
4) Am I completely wrong?
Thanks