
I've got 7 CSV files (55 MB each) in a local source folder which I want to convert into JSON format and store in a local target folder. My OS is macOS (quad-core Intel i5). Basically, it is a simple Rust program which is run from the console as

./target/release/convert <source-folder> <target-folder>

My multithreading approach using Rust threads is as follows:

fn main() -> Result<()> {
    let source_dir = PathBuf::from(get_first_arg()?);
    let target_dir = PathBuf::from(get_second_arg()?);

    let paths = get_file_paths(&source_dir)?;

    let mut handles = vec![];
    for source_path in paths {
        let target_path = create_target_file_path(&source_path, &target_dir)?;

        handles.push(thread::spawn(move || {
            let _ = convert(&source_path, &target_path);
        }));
    }

    for h in handles {
        let _ = h.join();
    }

    Ok(())
}
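
For context, a minimal sketch of what the helper functions could look like (the exact implementations are not essential to the question; convert, which does the actual CSV-to-JSON work, is omitted):

use std::path::{Path, PathBuf};
use std::{env, fs};

// Assumed error alias; the snippets here only require something along these lines.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

// First CLI argument: the source folder.
fn get_first_arg() -> Result<String> {
    env::args().nth(1).ok_or_else(|| "missing <source-folder>".into())
}

// Second CLI argument: the target folder.
fn get_second_arg() -> Result<String> {
    env::args().nth(2).ok_or_else(|| "missing <target-folder>".into())
}

// Collect the paths of all regular files in the source directory.
fn get_file_paths(dir: &Path) -> Result<Vec<PathBuf>> {
    let mut paths = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            paths.push(path);
        }
    }
    Ok(paths)
}

// Map <source>/<name>.csv to <target>/<name>.json.
fn create_target_file_path(source: &Path, target_dir: &Path) -> Result<PathBuf> {
    let stem = source.file_stem().ok_or("source path has no file name")?;
    Ok(target_dir.join(stem).with_extension("json"))
}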

I run it using time to measure CPU utilisation, which gives:

2.93s user 0.55s system 316% cpu 1.098 total

Then I try to implement the same task using the rayon (threadpool) crate:

fn main() -> Result<()> {
    let source_dir = PathBuf::from(get_first_arg()?);
    let target_dir = PathBuf::from(get_second_arg()?);

    let paths = get_file_paths(&source_dir)?;
    let pool = rayon::ThreadPoolBuilder::new().num_threads(15).build()?;

    for source_path in paths {
        let target_path = create_target_file_path(&source_path, &target_dir)?;

        pool.install(|| {
            let _ = convert(&source_path, &target_path);
        });
    }

    Ok(())
}

I run it using time to measure CPU utilisation, which gives:

2.97s user 0.53s system 98% cpu 3.561 total

I don't see any improvement when I use rayon. I'm probably using rayon the wrong way. Does anyone have an idea what is wrong?

Update (09 Apr)

After some time fighting with the Rust borrow checker, I just want to share a solution. Maybe it helps others, or someone can suggest a better approach/solution:

pool.scope(move |s| {
    for source_path in paths {
        let target_path = create_target_file_path(&source_path, &target_dir).unwrap();
        s.spawn(move |_s| {
            convert(&source_path, &target_path).unwrap();
        });
    }
});

But it still does not beat the approach using Rust's std::thread for 113 files.

46.72s user 8.30s system 367% cpu 14.955 total

Update (10 Apr)

After @maxy's comment:

// rayon solution
paths.into_par_iter().for_each(|source_path| {
    let target_path = create_target_file_path(&source_path, &target_dir);

    match target_path {
        Ok(target_path) => {
            info!(
                "Processing {}",
                target_path.to_str().unwrap_or("Unable to convert")
            );
            let res = convert(&source_path, &target_path);
            if let Err(e) = res {
                error!("{}", e);
            }
        }
        Err(e) => error!("{}", e),
    }
});

// std::thread solution
let mut handles = vec![];
for source_path in paths {
    let target_path = create_target_file_path(&source_path, &target_dir)?;
    handles.push(thread::spawn(move || {
        let _ = convert(&source_path, &target_path);
    }));
}

for handle in handles {
    let _ = handle.join();
}

Comparison on 57 files:

std::threads: 23.71s user 4.19s system 356% cpu 7.835 total
rayon:        23.36s user 4.08s system 324% cpu 8.464 total
  • Maybe try adding more files, say 100-500, and you will see a result. For the CPU it's a trivial task to process your CSV. The difference grows as you move toward complex processing / huge files / a larger number of files. – Arjun Apr 09 '22 at 11:34
  • @Arjun I tried to convert 113 files (4 GB total). I think that amount should be enough to see the difference. Rust threads: `11.64s user 2.20s system 333% cpu 4.156 total`, rayon: `47.31s user 8.31s system 96% cpu 57.465 total`. – fade2black Apr 09 '22 at 11:58
  • @Arjun In the Rust threads case, user + system is way greater than total, meaning that multiple cores are used. But in the second case I don't see any sign of that. I guess I'm using rayon the wrong way, but I don't know the correct way. – fade2black Apr 09 '22 at 12:01
  • Let me check out your code locally and give it some runs. – Arjun Apr 09 '22 at 12:35
  • You just need to use a parallel iterator on `get_file_paths()` and rayon will do the rest. – MeetTitan Apr 09 '22 at 15:42
  • @MeetTitan thank you for the suggestion. Yeah, you are right, I considered that approach. But I'd like to implement it manually so that I can control the pools. – fade2black Apr 09 '22 at 19:18

1 Answer


The docs for rayon's install are not super clear, but the signature:

pub fn install<OP, R>(&self, op: OP) -> R where
    R: Send,
    OP: FnOnce() -> R + Send, 

says that it returns type R, the same type R that your closure returns. So obviously install() has to wait for the result.

This only makes sense if the closure spawns additional tasks, for example by using .par_iter() inside the closure. I suggest using rayon's parallel iterators directly (instead of your for loop) over the list of files. You don't even need to create your own thread pool; the default pool is usually fine.
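
A rough sketch of that, reusing your convert and create_target_file_path helpers (error handling is simplified here; failed paths are just skipped):

use rayon::prelude::*;

// Parallel iterator over the file paths; rayon distributes the items across
// its global thread pool, so each file is converted on a worker thread.
paths.into_par_iter().for_each(|source_path| {
    if let Ok(target_path) = create_target_file_path(&source_path, &target_dir) {
        let _ = convert(&source_path, &target_path);
    }
});

If you do want a dedicated pool, wrapping this whole call in pool.install(|| ...) makes the parallel iterator run on that pool instead of the global one.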

If you insist on doing it manually, you'll have to use spawn() instead of install(). And you'll probably have to move your loop into a closure passed to scope().
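
A minimal sketch of that manual variant, assuming the same helpers as above:

pool.scope(move |s| {
    for source_path in paths {
        let target_path = create_target_file_path(&source_path, &target_dir)
            .expect("failed to build target path");
        // Each task runs on the pool; scope() blocks until all spawned tasks finish.
        s.spawn(move |_| {
            let _ = convert(&source_path, &target_path);
        });
    }
});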

  • Got improvements, but still behind the simple approach using `std::thread`. Please see the updates in the OP. – fade2black Apr 09 '22 at 19:10
  • I'd still try and compare with `paths.into_par_iter().for_each(|source_path| {...})` just because it is so easy to do. Do it inside `pool.install(|| { ... })` to use your pool. – maxy Apr 09 '22 at 20:55
  • You can see the comparison in the OP update. Almost the same. – fade2black Apr 09 '22 at 23:16
  • My guess: rayon's threads wait longer for file I/O because they request the next file only when a thread becomes idle. (Rayon is optimized for CPU-bound tasks.) Your std::threads version requests 57 files in parallel, so the OS can already schedule the I/O for the other files in the background. – maxy Apr 10 '22 at 08:54