I'm trying to process a huge file (~15 GB to ~60 GB, containing 560 million to 2 billion records) in parallel.
Each record looks something like the following:
<id> <amount>
123, 6000
123, 4593
111, 1
111, 100
111, -50
111, 10000
A file can contain thousands of clients, each with activity recorded as a series of transactions.
Processing this file sequentially works fine.
The work can be safely parallelized by having the same thread/task process all of a given client's data.
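For reference, a record line in this format can be split into an id/amount pair with plain stdlib parsing (a sketch; the function name is illustrative, not from the actual code):

```rust
fn parse_record(line: &str) -> Option<(u16, i64)> {
    // Split "123, 6000" on the comma and trim whitespace around each field.
    let (id, amount) = line.split_once(',')?;
    Some((id.trim().parse().ok()?, amount.trim().parse().ok()?))
}

fn main() {
    assert_eq!(parse_record("123, 6000"), Some((123, 6000)));
    assert_eq!(parse_record("111, -50"), Some((111, -50)));
    assert_eq!(parse_record("garbage"), None);
}
```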
To make use of the other available cores, I tried processing it in parallel by creating logical groups, each handled by the same tokio task. For now I'm sticking to spawning a single task per available core, and each transaction is routed to the same task by looking at its client id.
This approach is much slower than the sequential one.
Here is a snippet of the approach:
use std::collections::HashMap;
use std::error::Error;
use tokio::sync::mpsc::{self, Sender};

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let max_threads_supported = num_cpus::get();
    let mut account_state: HashMap<u16, Client> = HashMap::new();
    let (result_sender, mut result_receiver) =
        mpsc::channel::<HashMap<u16, Client>>(max_threads_supported);
    // repository of senders, one per shard
    let mut sender_repository: HashMap<u16, Sender<Transaction>> = HashMap::new();
    for task_counter in 0..max_threads_supported {
        let result_sender_clone = result_sender.clone();
        // create a separate mpsc channel for each processor
        let (sender, mut receiver) = mpsc::channel::<Transaction>(10_000);
        sender_repository.insert(task_counter as u16, sender);
        tokio::spawn(async move {
            let mut exec_engine = Engine::initialize();
            while let Some(tx) = receiver.recv().await {
                if let Err(_err) = exec_engine.execute_transaction(tx) {
                    // ignore failed transactions for now
                }
            }
            // channel closed: report this shard's final account state
            result_sender_clone
                .send(exec_engine.get_account_state_owned())
                .await
        });
    }
    drop(result_sender);
    tokio::spawn(async move {
        // reads transactions sequentially from the file via a buffered reader
        for result in reader.deserialize::<Transaction>() {
            if let Ok(tx) = result {
                // route by client id so one client always hits the same task
                let shard = tx.get_client_id() % max_threads_supported as u16;
                if let Some(sender) = sender_repository.get(&shard) {
                    let _ = sender.send(tx).await;
                }
            }
        }
    });
    // accumulate results from all the processors
    while let Some(result) = result_receiver.recv().await {
        account_state.extend(result.into_iter());
    }
    // do whatever you like with the result
    Ok(())
}
But this is noticeably slower than the sequential approach. What am I doing wrong? By the way, I also tried a broadcast channel, but a lagging consumer can lose messages, so I moved to mpsc.
How can I optimize this for better performance?