I have a function (a method, to be precise) whose primary purpose is to zip multiple vectors, similar to Python's `zip` function: output vector `i` holds element `i` of every input vector.
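A minimal sequential sketch of that transformation, using std only (no rayon). `zip_rows` is a hypothetical helper name, not one of the methods below:

```rust
/// Sequential "zip": output column `c` holds element `c` of every input row.
/// Like Python's `zip`, it stops at the shortest row.
fn zip_rows<'a>(windows: &[Vec<&'a str>]) -> Vec<Vec<&'a str>> {
    // Number of output columns = length of the shortest input row.
    let num_cols = windows.iter().map(|row| row.len()).min().unwrap_or(0);
    (0..num_cols)
        // For each column index, take that element from every row.
        .map(|col| windows.iter().map(|row| row[col]).collect())
        .collect()
}
```

For example, the rows `["AC", "CG", "GT"]` and `["TT", "TA"]` become the columns `["AC", "TT"]` and `["CG", "TA"]`.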
Both dimensions of the `windows` variable (the outer vector and each inner vector) are extremely large, but since they only hold references (`&str`), I believed memory usage wouldn't be an issue. I was wrong: this code constantly runs into memory allocation failures and crashes due to lack of memory.
Any performance gains are wiped out because the process immediately starts paging to disk, which is bound by storage I/O.
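For scale, a back-of-the-envelope sketch of how much memory a `Vec<Vec<&str>>` of references needs. `estimate_table_bytes` is a hypothetical helper; it counts only the references and the inner `Vec` headers, and ignores allocator overhead, spare capacity, and the string data itself:

```rust
use std::mem::size_of;

/// Rough estimate of the heap bytes needed by a `Vec<Vec<&str>>` with
/// `outer` inner vectors holding `inner` references each.
fn estimate_table_bytes(outer: u64, inner: u64) -> u64 {
    // A `&str` is a fat pointer: data address plus length.
    let ref_bytes = size_of::<&str>() as u64;
    // Each inner `Vec` keeps its header (typically pointer, capacity and
    // length) inside the outer vector's buffer.
    let header_bytes = size_of::<Vec<&str>>() as u64;
    outer * inner * ref_bytes + outer * header_bytes
}
```

On a typical 64-bit target `size_of::<&str>()` is 16 bytes, so 1,000 vectors of 1,000,000 references each come to roughly 16 GB before any string data is counted. These figures are illustrative, not measurements from my data.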
I have tried a few variations of the code, but all of them seem to run into the same issue.
As I understand it, the only large allocation happens when `windows` is created (in a different method). Once that allocation is done, `create_zipped_kmers` should not make any additional large allocations, since it only deals with references to the original data. Is this understanding correct?
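The step that every variation below repeats for each column can be isolated in a small std-only sketch. `gather_column` and `buffer_bytes` are hypothetical helpers, not part of my code; they show that `collect()` places copies of the `&str` references into a newly allocated buffer, while the underlying string bytes stay shared with `windows`:

```rust
/// Mirrors the inner step of each variation: gather element `col` from
/// every row into a new `Vec` via `collect()`.
fn gather_column<'a>(windows: &[Vec<&'a str>], col: usize) -> Vec<&'a str> {
    // Each `&str` (address + length) is copied; the string bytes are not.
    windows.iter().map(|row| row[col]).collect()
}

/// Bytes reserved by a `Vec<&str>`'s own heap buffer, based on its
/// capacity (allocator bookkeeping not included).
fn buffer_bytes(column: &Vec<&str>) -> usize {
    column.capacity() * std::mem::size_of::<&str>()
}
```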
Am I doing something wrong in the code, or is there a gap in my knowledge? How can I reduce the memory usage while maintaining the performance?
Variation 1
```rust
// Method inside an `impl<'a> ...` block (not shown); needs `use rayon::prelude::*;`.
fn create_zipped_kmers(&'a self, windows: &'a Vec<Vec<&'a str>>) -> Vec<Vec<&'a str>> {
    if windows.is_empty() {
        panic!("Sequence k-mer vector cannot be empty");
    }
    // Stop at the shortest row, like Python's zip.
    let num_cols = windows.iter().map(|v| v.len()).min().unwrap_or(0);
    let num_rows = windows.len();
    let mut zipped = Vec::with_capacity(num_cols);
    // Columns are built one at a time; each column's rows are gathered in parallel.
    for col_index in 0..num_cols {
        let column: Vec<&str> = (0..num_rows)
            .into_par_iter()
            .map(|row_index| windows[row_index][col_index])
            .collect();
        zipped.push(column);
    }
    zipped
}
```
Variation 2
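```rust
// Method inside an `impl<'a> ...` block (not shown); needs `use rayon::prelude::*;`.
fn create_zipped_kmers(&'a self, windows: &'a Vec<Vec<&'a str>>) -> Vec<Vec<&'a str>> {
    if windows.is_empty() {
        panic!("Sequence k-mer vector cannot be empty");
    }
    // Stop at the shortest row, like Python's zip.
    let num_cols = windows.iter().map(|v| v.len()).min().unwrap_or(0);
    let num_rows = windows.len();
    // Columns are built in parallel; each column gathers its rows sequentially.
    (0..num_cols)
        .into_par_iter()
        .map(|col_index| {
            (0..num_rows)
                .map(|row_index| windows[row_index][col_index])
                .collect::<Vec<&str>>()
        })
        .collect()
}
```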
Variation 3
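```rust
// Method inside an `impl<'a> ...` block (not shown); needs `use rayon::prelude::*;`.
fn create_zipped_kmers(&'a self, windows: &'a Vec<Vec<&'a str>>) -> Vec<Vec<&'a str>> {
    if windows.is_empty() {
        panic!("Sequence k-mer vector cannot be empty");
    }
    // Use the shortest row, as in the other variations, so `next()` never runs out.
    let num_seqs = windows.iter().map(|v| v.len()).min().unwrap_or(0);
    // One slice iterator per row.
    let mut iters: Vec<_> = windows.par_iter().map(|n| n.iter()).collect();
    // Each output column advances every row iterator by one step (in parallel).
    (0..num_seqs)
        .map(|_| {
            iters
                .par_iter_mut()
                .map(|n| *n.next().unwrap())
                .collect()
        })
        .collect()
}
```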
Thanks.