I have 7 CSV files (55 MB each) in a local source folder that I want to convert to JSON and store in a local target folder. My OS is macOS (quad-core Intel i5). It is a simple Rust program, run from the console as:
./target/release/convert <source-folder> <target-folder>
My multithreaded approach using Rust threads is as follows:
fn main() -> Result<()> {
    let source_dir = PathBuf::from(get_first_arg()?);
    let target_dir = PathBuf::from(get_second_arg()?);
    let paths = get_file_paths(&source_dir)?;
    let mut handles = vec![];
    for source_path in paths {
        let target_path = create_target_file_path(&source_path, &target_dir)?;
        handles.push(thread::spawn(move || {
            let _ = convert(&source_path, &target_path);
        }));
    }
    for h in handles {
        let _ = h.join();
    }
    Ok(())
}
I run it with time to measure CPU utilisation, which gives:
2.93s user 0.55s system 316% cpu 1.098 total
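One thing worth noting about the std::thread version above: it discards both the result of convert and the result of join, so a failed conversion or a panicked thread goes unnoticed. A minimal sketch of surfacing those errors follows. The convert here is a hypothetical stand-in (the real one is not shown in this post), and convert_all is a name made up for illustration.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

// Hypothetical stand-in for the post's `convert`: it only checks the file
// extension. The real function would read the CSV and write the JSON.
fn convert(source: &Path, _target: &Path) -> io::Result<()> {
    if source.extension().map_or(false, |ext| ext == "csv") {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("not a CSV file: {}", source.display()),
        ))
    }
}

/// Spawns one thread per (source, target) pair and returns how many
/// conversions failed, counting panicked threads as failures.
fn convert_all(jobs: Vec<(PathBuf, PathBuf)>) -> usize {
    let handles: Vec<_> = jobs
        .into_iter()
        .map(|(source, target)| thread::spawn(move || convert(&source, &target)))
        .collect();

    let mut failures = 0;
    for handle in handles {
        match handle.join() {
            Ok(Ok(())) => {}
            Ok(Err(e)) => {
                eprintln!("conversion failed: {}", e);
                failures += 1;
            }
            Err(_) => {
                eprintln!("a worker thread panicked");
                failures += 1;
            }
        }
    }
    failures
}
```

The thread closure returns the io::Result directly, so join hands it back to the main thread instead of dropping it.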
Then I tried to implement the same task using the rayon (thread pool) crate:
fn main() -> Result<()> {
    let source_dir = PathBuf::from(get_first_arg()?);
    let target_dir = PathBuf::from(get_second_arg()?);
    let paths = get_file_paths(&source_dir)?;
    let pool = rayon::ThreadPoolBuilder::new().num_threads(15).build()?;
    for source_path in paths {
        let target_path = create_target_file_path(&source_path, &target_dir)?;
        pool.install(|| {
            let _ = convert(&source_path, &target_path);
        });
    }
    Ok(())
}
I run it with time to measure CPU utilisation, which gives:
2.97s user 0.53s system 98% cpu 3.561 total
I can't see any improvement when I use rayon. I am probably using rayon the wrong way. Does anyone have an idea what is wrong with it?
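As a reading aid for the two time lines above: the cpu column appears to be CPU time (user + system) divided by wall-clock time (total). That is an inference from the numbers, which all match this formula once the fraction is dropped, not something taken from the shell's documentation. A small sketch, where cpu_percent is a name made up for illustration:

```rust
/// CPU utilisation in percent: CPU time (user + system) divided by
/// wall-clock time. Assumes all three inputs are in seconds.
fn cpu_percent(user_s: f64, system_s: f64, total_s: f64) -> f64 {
    (user_s + system_s) / total_s * 100.0
}

/// Same value with the fractional part dropped, which is how the
/// figures in this post seem to be reported (an assumption).
fn cpu_percent_truncated(user_s: f64, system_s: f64, total_s: f64) -> u32 {
    cpu_percent(user_s, system_s, total_s) as u32
}
```

Read this way, 316% means a bit more than three cores busy on average for the std::thread run, while 98% means the rayon run kept roughly one core busy at a time. The user and system times are almost identical in both runs, so the same amount of work was done; only the overlap differs.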
Update (09 Apr)
After some time fighting the Rust borrow checker, I want to share a solution. Maybe it will help others, or someone can suggest a better approach:
pool.scope(move |s| {
    for source_path in paths {
        let target_path = create_target_file_path(&source_path, &target_dir).unwrap();
        s.spawn(move |_s| {
            convert(&source_path, &target_path).unwrap();
        });
    }
});
But this still does not beat the std::thread approach for 113 files:
46.72s user 8.30s system 367% cpu 14.955 total
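For comparison, the standard library has a similar scoped API, std::thread::scope (available since Rust 1.63). Like the rayon scope above, it joins every spawned thread before returning, which lets the threads borrow paths and target_dir instead of taking ownership. A minimal sketch, with a hypothetical stand-in for convert and a made-up function name convert_scoped:

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical stand-in for the post's `convert`: succeeds only for
// paths ending in `.csv`. The real conversion is not shown here.
fn convert(source: &Path, _target_dir: &Path) -> Result<(), String> {
    if source.extension().map_or(false, |ext| ext == "csv") {
        Ok(())
    } else {
        Err(format!("not a CSV file: {}", source.display()))
    }
}

/// Converts every path on its own scoped thread and returns the number
/// of successful conversions.
fn convert_scoped(paths: &[PathBuf], target_dir: &Path) -> usize {
    let converted = AtomicUsize::new(0);
    thread::scope(|s| {
        for source_path in paths {
            // Shadow with a reference so the `move` closure captures
            // only references, not the counter itself.
            let converted = &converted;
            s.spawn(move || {
                if convert(source_path, target_dir).is_ok() {
                    converted.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    }); // every spawned thread has been joined at this point
    converted.load(Ordering::Relaxed)
}
```

This still starts one thread per file, so it is closer to the plain std::thread version than to a pool; the difference is only that nothing has to be moved or cloned into the threads.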
Update (10 Apr)
After @maxy's comment:
// rayon solution
paths.into_par_iter().for_each(|source_path| {
    let target_path = create_target_file_path(&source_path, &target_dir);
    match target_path {
        Ok(target_path) => {
            info!(
                "Processing {}",
                target_path.to_str().unwrap_or("Unable to convert")
            );
            let res = convert(&source_path, &target_path);
            if let Err(e) = res {
                error!("{}", e);
            }
        }
        Err(e) => error!("{}", e),
    }
});
// std::thread solution
let mut handles = vec![];
for source_path in paths {
    let target_path = create_target_file_path(&source_path, &target_dir)?;
    handles.push(thread::spawn(move || {
        let _ = convert(&source_path, &target_path);
    }));
}
for handle in handles {
    let _ = handle.join();
}
Comparison on 57 files:
std::threads: 23.71s user 4.19s system 356% cpu 7.835 total
rayon: 23.36s user 4.08s system 324% cpu 8.464 total
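Both runs use nearly the same CPU time and land within about 10% of each other in wall-clock time, which is what one might expect if the four cores are the limit rather than the threading library. To make the difference between the two approaches concrete, here is a std-only sketch of the pool idea: a fixed number of workers pulling paths from a shared queue, instead of one thread per file. It is not how rayon is implemented (rayon uses work stealing), and run_bounded is a name made up for illustration. It needs Rust 1.63+ for std::thread::scope.

```rust
use std::collections::VecDeque;
use std::path::PathBuf;
use std::sync::Mutex;
use std::thread;

/// Runs `job` on every path using at most `workers` threads and returns
/// the number of jobs that reported an error.
fn run_bounded<F>(paths: Vec<PathBuf>, workers: usize, job: F) -> usize
where
    F: Fn(&PathBuf) -> Result<(), String> + Sync,
{
    let queue = Mutex::new(VecDeque::from(paths));
    let failures = Mutex::new(0usize);

    thread::scope(|s| {
        // Always start at least one worker, even if `workers` is 0.
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // The lock guard is a temporary, so the queue is unlocked
                // again before the job runs.
                let next = queue.lock().unwrap().pop_front();
                match next {
                    Some(path) => {
                        if job(&path).is_err() {
                            *failures.lock().unwrap() += 1;
                        }
                    }
                    None => break, // queue drained, worker exits
                }
            });
        }
    });

    failures.into_inner().unwrap()
}
```

A natural choice for workers would be std::thread::available_parallelism(), so that 57 or 113 files never mean 57 or 113 threads competing for four cores. Whether that helps for this workload would have to be measured.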