I have a simple Rust function that decodes a varint-encoded integer.
struct Reader {
    pub pos: usize,
    pub corrupt: bool,
}
impl Reader {
    fn read_var(&mut self, b: &[u8]) -> u64 {
        let mut i = 0u64;
        let mut j = 0;
        loop {
            if j > 9 {
                self.corrupt = true;
                return 0;
            }
            let v = self.read_u8(b);
            i |= (u64::from(v & 0x7F)) << (j * 7);
            if (v >> 7) == 0 {
                return i;
            } else {
                j += 1;
            }
        }
    }
    fn read_u8(&mut self, b: &[u8]) -> u8 {
        if self.pos < b.len() {
            let v = b[self.pos];
            self.pos += 1;
            v
        } else {
            self.corrupt = true;
            0
        }
    }
}
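For context, here is a minimal sketch of how the function gets called. The `decode` helper and the sample byte strings are made up for illustration and are not part of my real code; the `Reader` definition is repeated only so that the snippet compiles on its own.

```rust
struct Reader {
    pub pos: usize,
    pub corrupt: bool,
}

impl Reader {
    fn read_var(&mut self, b: &[u8]) -> u64 {
        let mut i = 0u64;
        let mut j = 0;
        loop {
            // A u64 varint occupies at most 10 bytes.
            if j > 9 {
                self.corrupt = true;
                return 0;
            }
            let v = self.read_u8(b);
            // The low 7 bits carry payload, least significant group first.
            i |= (u64::from(v & 0x7F)) << (j * 7);
            // A clear high bit marks the last byte of the value.
            if (v >> 7) == 0 {
                return i;
            } else {
                j += 1;
            }
        }
    }

    fn read_u8(&mut self, b: &[u8]) -> u8 {
        if self.pos < b.len() {
            let v = b[self.pos];
            self.pos += 1;
            v
        } else {
            // Reading past the end flags the reader and yields 0.
            self.corrupt = true;
            0
        }
    }
}

// Hypothetical helper (illustration only): decode one varint from the
// start of `bytes` and report (value, bytes consumed, corrupt flag).
fn decode(bytes: &[u8]) -> (u64, usize, bool) {
    let mut r = Reader { pos: 0, corrupt: false };
    let value = r.read_var(bytes);
    (value, r.pos, r.corrupt)
}

fn main() {
    // 300 = 0b1_0010_1100 is encoded as the two bytes 0xAC 0x02.
    println!("{:?}", decode(&[0xAC, 0x02]));
    // A truncated input (continuation bit set, then no more data).
    println!("{:?}", decode(&[0x80]));
}
```

Each thread in my test owns its own `Reader` and its own input slice, so nothing is shared between threads at this level.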
I have two versions of the machine code generated by different compilers: a non-SIMD version and a SIMD version.
The non-SIMD version is relatively easy to understand. It inlines read_u8
and unrolls the loop.
I am not familiar with SIMD instructions, but the SIMD version seems to have a similar structure.
One weird thing: when I run the SIMD version on multiple threads concurrently on a multi-core machine (a separate Reader object per thread), processing throughput drops significantly, yet CPU utilization is higher than with the single-threaded version. The throughput of the non-SIMD version scales linearly with the concurrency level.
Does anyone know how this could happen?
Some related questions:
- The code does not look like it could benefit from SIMD. Why is SIMD code generated?
- Is it possible to disable SIMD generation for a single function?