I’m trying to read a bunch of files in a folder, process their contents, and save the results. Since I have a lot of files, I need to parallelize the operation.
Here is the code I tried, but when I run it nothing happens; I don’t even get an error. It just hangs. Note that if I call process_file() directly with a file name, it works.
from multiprocessing import Pool
from pathlib import Path

import torch

source_dir = Path('source/path')
target_dir = Path('target/path')

def process_file(file):
    with open(file, 'r') as f:
        result = ...  # do stuff with f
    target = target_dir / file.name
    torch.save(result, target)

p = Pool(10)
p.map(process_file, source_dir.iterdir())
I thought it might be because .iterdir() yields a generator, but I have the same problem with os.listdir(). What am I missing?
Thanks in advance.
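In case it helps narrow things down, here is a stripped-down variant of the same attempt, written as a sketch rather than a confirmed fix. It makes three changes that are commonly suggested for a pool that hangs silently: the pool is created under an `if __name__ == '__main__':` guard (needed when worker processes are started with the spawn method, the default on Windows and macOS), the file list is built before it is handed to the pool, and the pool is used as a context manager. The names `run` and `workers`, the `.upper()` processing step, and the plain-text write that stands in for `torch.save` are all assumptions made for illustration; they take PyTorch out of the picture so the multiprocessing part can be tested on its own.

```python
from functools import partial
from multiprocessing import Pool
from pathlib import Path


def process_file(file, target_dir):
    """Read one file, transform it, and write the result under target_dir."""
    with open(file, 'r') as f:
        result = f.read().upper()  # placeholder for the real processing
    target = Path(target_dir) / Path(file).name
    target.write_text(result)  # placeholder for torch.save(result, target)
    return target.name


def run(source_dir, target_dir, workers=10):
    """Process every regular file in source_dir using a pool of workers."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    # Build the full list up front so the pool receives plain Path objects.
    files = sorted(p for p in Path(source_dir).iterdir() if p.is_file())
    # The context manager shuts the pool down once map() has returned.
    with Pool(workers) as pool:
        return pool.map(partial(process_file, target_dir=target_dir), files)


if __name__ == '__main__':
    # Placeholder paths taken from the question; nothing runs if they are absent.
    source = Path('source/path')
    if source.is_dir():
        print(run(source, Path('target/path')))
```

If this version completes but the original still hangs once `torch.save` is put back, that would point at the interaction between PyTorch and the worker processes rather than at `.iterdir()` or `os.listdir()`.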