I would like to know whether MonetDB uses SIMD (Single Instruction, Multiple Data) and, if not, how I could implement it for filtering or aggregation.
2 Answers
I don't know about MonetDB, but in most cases it will -- simply because modern compilers are able to detect specific constructs that might be vectorized.
However, your question is very targeted at the MonetDB community. Read the MonetDB source code (it's available online on the MonetDB homepage), find out where the things you want to optimize reside, and talk to the people who maintain that piece of software.
Generally, from your relatively naive way of asking, though, I'd have slight doubts that you're able to contribute a lot of optimization to such a mature project.

-
Thanks, that means I'd need to compile for the specific platform/processor architecture, right? – GBrian Mar 31 '15 at 20:52
Having some hands-on experience with SIMD instructions, I can tell you: it's harder than you'd think. Aggregation is feasible for a compiler to auto-vectorize, but I doubt GCC does it (it would need to keep one aggregate per SIMD lane and reduce those at the end). I could imagine ICC does that.
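To make the "one aggregate per SIMD lane" idea concrete, here is a minimal sketch in portable C. It only emulates the lanes with a small array, so it shows the shape of the computation rather than real vector code; the function name `sum_per_lane` and the lane count of 4 are assumptions chosen for illustration, not anything taken from MonetDB.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* assumed width: four 64-bit lanes, as in a 256-bit register */

/* Sum n values while keeping one partial sum per (emulated) SIMD lane. */
int64_t sum_per_lane(const int64_t *values, size_t n)
{
    int64_t lane_sum[LANES] = {0, 0, 0, 0};
    size_t i = 0;

    /* Main loop: each outer iteration corresponds to one vector add. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++) {
            lane_sum[lane] += values[i + lane];
        }
    }

    /* Horizontal reduction: combine the per-lane aggregates at the end. */
    int64_t total = 0;
    for (size_t lane = 0; lane < LANES; lane++) {
        total += lane_sum[lane];
    }

    /* Scalar tail for the elements that did not fill a whole vector. */
    for (; i < n; i++) {
        total += values[i];
    }
    return total;
}
```

Whether a given compiler turns the inner loop into actual vector instructions depends on the compiler, its version, and the optimization flags, so it is worth checking the generated assembly.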
Filtering is VERY HARD (even by hand). The problem is that applying a predicate is not enough: you also have to compact the result (i.e., throw out the values that don't qualify), and AVX2 does not yet have a SIMD instruction for that. There are a lot of really smart database people thinking about how to do it by hand, i.e., using intrinsics. Unfortunately, the resulting code won't make it into the MonetDB codebase because it causes portability problems.
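For reference, this is roughly what "apply the predicate and compact the result" looks like in scalar code. It is a sketch under assumptions: the function name `filter_greater_than` and the "greater than a threshold" predicate are made up for illustration and are not MonetDB code.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy every value greater than `threshold` into `out`, densely packed.
 * `out` must have room for n elements. Returns the number of values kept. */
size_t filter_greater_than(const int64_t *in, size_t n,
                           int64_t threshold, int64_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        out[count] = in[i];           /* speculative store, may be overwritten */
        count += (in[i] > threshold); /* advance only if the value qualifies */
    }
    return count;
}
```

The output position of each value depends on how many earlier values qualified. That data-dependent write position is the part that is awkward to express with fixed-width vector stores.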
On a more general note: Could you briefly explain what problem you're trying to solve? This seems a bit like a (vaguely defined) class assignment of pure academic value.

-
Thanks @Holger, I'm trying to improve a price search engine. One part is currently implemented using several tables and T-SQL, so I tried a columnar approach after denormalizing the data, and it worked like a charm. Later on I started to read about AMD BOLT, OpenCL, CUDA and SIMD, and I wondered whether something as amazing as MonetDB uses (let's say) AMD BOLT... well, I can't imagine! :) As Marcus Müller said, I'm far from being able to improve MonetDB, but I'm on my way to learning a bit more. What would you recommend to start playing around with SIMD? – GBrian Apr 01 '15 at 06:38
-
I think the main problem is maintenance. MonetDB is really run by a handful of guys (most of them part-time). With a team like that, you're very limited in what you can do and still have a reasonably stable system. However, things like SIMD are meant to squeeze the last bit of performance out of your CPU. Are you sure you have exhausted all other options and you need the last 30 percent? – Holger Apr 01 '15 at 14:32
-
Yes, totally agree. It doesn't make sense to put so much effort into this, but this question helped me a lot to get a better understanding. Thanks – GBrian Apr 02 '15 at 08:42
-
How hard is it? Can't you just use `pmovmskb`, a lookup table and `pshufb` to pack all the "predicate true" lanes in the low end of the register, append it to the output buffer using an unaligned write, then use `pmovmskb` and `popcnt` to get the count to increment the output pointer by? I'm sure I didn't just invent something radical, so what did I miss? – harold Apr 02 '15 at 10:46
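A rough sketch of the approach described in this comment, for 32-bit integers and a "greater than" predicate, might look like the following. Several things here are assumptions rather than part of the comment: the name `simd_filter_gt`, the use of `_mm_movemask_ps` (one mask bit per 32-bit lane) in place of `pmovmskb`, the GCC/Clang builtin `__builtin_popcount` standing in for `popcnt`, and the scalar fallback that is used when the compiler is not targeting SSSE3.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSSE3__)
#include <tmmintrin.h> /* SSSE3: _mm_shuffle_epi8 (pshufb) */
#endif

/* Keep values greater than `threshold`, densely packed into `out`.
 * `out` must have room for n elements. Returns the number of values kept. */
size_t simd_filter_gt(const int32_t *in, size_t n,
                      int32_t threshold, int32_t *out)
{
    size_t count = 0;
    size_t i = 0;
#if defined(__SSSE3__)
    /* Lookup table: for each 4-bit lane mask, the pshufb control bytes that
     * move the selected lanes to the low end of the register. Built on every
     * call to keep the sketch short; real code would use a constant table. */
    uint8_t table[16][16];
    for (int mask = 0; mask < 16; mask++) {
        int k = 0;
        memset(table[mask], 0x80, 16); /* 0x80 makes pshufb write a zero byte */
        for (int lane = 0; lane < 4; lane++) {
            if (mask & (1 << lane)) {
                for (int b = 0; b < 4; b++) {
                    table[mask][4 * k + b] = (uint8_t)(4 * lane + b);
                }
                k++;
            }
        }
    }

    const __m128i limit = _mm_set1_epi32(threshold);
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + i));
        __m128i hit = _mm_cmpgt_epi32(v, limit);           /* apply predicate */
        int mask = _mm_movemask_ps(_mm_castsi128_ps(hit)); /* 1 bit per lane */
        __m128i ctrl = _mm_loadu_si128((const __m128i *)table[mask]);
        __m128i packed = _mm_shuffle_epi8(v, ctrl);        /* compact lanes */
        /* Unaligned store of all four lanes; the unused upper lanes are
         * overwritten by later stores or lie beyond the returned count. */
        _mm_storeu_si128((__m128i *)(out + count), packed);
        count += (size_t)__builtin_popcount((unsigned)mask);
    }
#endif
    /* Scalar tail (and the whole loop when SSSE3 is not enabled). */
    for (; i < n; i++) {
        out[count] = in[i];
        count += (in[i] > threshold);
    }
    return count;
}
```

The full-width store never runs past `out[n - 1]`, because `count` is at most `i` when the store happens. The sketch only demonstrates that the approach produces the right result; it says nothing about whether it is faster than the scalar loop.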
-
The problem is that the SIMD instructions have higher costs: if you add up the costs on, say, Nehalem for this solution (using http://www.agner.org/optimize/instruction_tables.pdf as a reference), you end up with 7 cycles, including applying the predicate, for four 64-bit integers. Using scalar instructions (CMP, MOV, ADD) you end up with 4 cycles for the same work. Wider SIMD or smaller values may change the balance. Also, AVX-512 has a compact instruction that does the same thing, but I don't know the cost of that. – Holger Apr 02 '15 at 13:15