I use a ProtoBuf RepeatedField<uint32_t> to store my 64-bit uint vector. Now I want to compute the element-wise sum or product of two RepeatedFields, and I want to use SIMD to speed it up.
Can I access the RepeatedField's memory directly, just like a vector's, to do this? Or is there an efficient way to copy the data from a RepeatedField into a vector, so that I can then use SIMD on the vectors?
My code reports a heap-buffer-overflow at the first iteration of _mm256_loadu_si256:
Parameter papply(Parameter& a, Parameter& b) {
    Parameter ret;
    std::vector<uint32_t> tmp(8);
    for (int i = 0; i < a.datas_size(); i += 8) {
        __m256i ma = _mm256_loadu_si256((__m256i*) a.mutable_datas()->Mutable(i));
        __m256i mb = _mm256_loadu_si256((__m256i*) b.mutable_datas()->Mutable(i));
        _mm256_add_epi32(ma, mb);
        _mm256_storeu_si256((__m256i*) &tmp[0], ma);
        ret.mutable_datas()->Add(tmp.begin(), tmp.end());
    }
}
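For reference, here is a minimal sketch of the loop shape I think is needed, written against raw pointers so it does not depend on the generated Parameter class. The names add_u32, add_u32_avx2 and add_u32_scalar are hypothetical helpers of mine, and std::vector stands in for the RepeatedField. It assumes both inputs have the same length, uses GCC/Clang-specific extensions (the target attribute and __builtin_cpu_supports), and differs from my code above in three ways: a 256-bit load is only issued while at least 8 elements remain, the leftover elements are handled by a scalar loop, and the result of _mm256_add_epi32 is actually stored.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar path: used as a fallback and for the tail that cannot fill 8 lanes.
static void add_u32_scalar(const uint32_t* a, const uint32_t* b,
                           uint32_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// AVX2 path. The target attribute (GCC/Clang only) allows AVX2 intrinsics
// in this function even when the file is compiled without -mavx2.
__attribute__((target("avx2")))
static void add_u32_avx2(const uint32_t* a, const uint32_t* b,
                         uint32_t* out, std::size_t n) {
    std::size_t i = 0;
    // Only load 8 x 32-bit lanes while 8 full elements remain, so the
    // 32-byte load never reads past the end of the buffers.
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        __m256i vs = _mm256_add_epi32(va, vb);  // keep the returned sum
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), vs);
    }
    // Remaining 0..7 elements.
    add_u32_scalar(a + i, b + i, out + i, n - i);
}

// Element-wise sum with 32-bit wraparound; picks the AVX2 path at run time.
std::vector<uint32_t> add_u32(const std::vector<uint32_t>& a,
                              const std::vector<uint32_t>& b) {
    assert(a.size() == b.size());  // assumption: inputs have equal length
    std::vector<uint32_t> out(a.size());
    if (__builtin_cpu_supports("avx2")) {
        add_u32_avx2(a.data(), b.data(), out.data(), out.size());
    } else {
        add_u32_scalar(a.data(), b.data(), out.data(), out.size());
    }
    return out;
}
```

As far as I can tell from the RepeatedField header, it exposes its contiguous storage through data() and mutable_data() and can be sized up front with Resize(new_size, value). If that is right, the same kernel could presumably be called with a.datas().data(), b.datas().data() and ret.mutable_datas()->mutable_data() after resizing ret's field, which would avoid copying into a vector at all. I have not verified this against my protobuf version.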