I have been researching the string instructions supported by the AVX and AVX2 ISAs, but I cannot find any 256-bit string comparison instructions like the ones in SSE4.2. If such instructions exist, where can I find them? Otherwise, why do AVX/AVX2 not support 256-bit string instructions?

I also found that AVX2 does not provide `mullo` for `unsigned short` (a 16-bit unsigned integer), and I don't know the reason, because it is supported in SSE4.2.

ADMS
One question per question please. – Paul R Apr 20 '16 at 09:38
1 Answer
256-bit string compare instructions - no, there are none in AVX/AVX2 (or AVX-512 for that matter) - just the 128-bit instructions in SSE4.2.

`mullo` for unsigned short - this is not needed, since the low 16 bits of the product are the same as with a signed short `mullo`. It's only the `mulhi` instruction that needs to exist in signed and unsigned variants.
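
To make the `mullo` point concrete, here is a minimal scalar sketch of what a single 16-bit lane computes. The helper names (`mullo_u16`, `mullo_s16`, `mulhi_u16`, `mulhi_s16`, `as_s16`, `check_mullo_sign_agnostic`) are made up for illustration and are not real intrinsics; the corresponding AVX2 intrinsics are `_mm256_mullo_epi16`, `_mm256_mulhi_epi16` and `_mm256_mulhi_epu16`.

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits of the 32-bit product, operands treated as unsigned. */
static uint16_t mullo_u16(uint16_t a, uint16_t b) {
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}

/* Low 16 bits of the 32-bit product, operands treated as signed. */
static uint16_t mullo_s16(int16_t a, int16_t b) {
    return (uint16_t)((int32_t)a * (int32_t)b);
}

/* High 16 bits of the 32-bit product, operands treated as unsigned. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* High 16 bits of the 32-bit product, operands treated as signed. */
static uint16_t mulhi_s16(int16_t a, int16_t b) {
    return (uint16_t)((uint32_t)((int32_t)a * (int32_t)b) >> 16);
}

/* Reinterpret a 16-bit pattern as a two's complement signed value. */
static int16_t as_s16(uint16_t x) {
    return (int16_t)(x >= 0x8000u ? (int32_t)x - 0x10000 : (int32_t)x);
}

/* Sweep a sample of bit patterns (every 257th value, including 0 and
   0xFFFF) and check that the low half never depends on signedness. */
static void check_mullo_sign_agnostic(void) {
    for (uint32_t a = 0; a <= 0xFFFFu; a += 257) {
        for (uint32_t b = 0; b <= 0xFFFFu; b += 257) {
            assert(mullo_u16((uint16_t)a, (uint16_t)b) ==
                   mullo_s16(as_s16((uint16_t)a), as_s16((uint16_t)b)));
        }
    }
}
```

For example, with the bit pattern 0xFFFF (65535 unsigned, -1 signed) multiplied by 2, both `mullo` variants give 0xFFFE, while the unsigned high half is 0x0001 and the signed high half is 0xFFFF. That difference in the high half is why only `mulhi` comes in two flavours.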

Paul R
What is the reason that they did not add string instructions to Intel AVX? Is it because they are not useful, or because adding special-purpose instructions to a general-purpose processor is not acceptable? – ADMS Apr 20 '16 at 10:17
@ADMS: well I don't work for Intel, but I would hazard a guess that there is probably not much point - this type of operation tends to be I/O bound anyway (since it's unlikely that it would be part of a sequence of other SIMD instructions) so there would be little to gain from implementing wider versions of the existing 128-bit string instructions. – Paul R Apr 20 '16 at 10:43
Wouldn't they still be useful though? Especially the 16-bit RANGES mode, which suffers from not having that many ranges, but ANY could also be nice to have with a larger set. OTOH that would no doubt be even slower than it already is – harold Apr 21 '16 at 17:38
@harold My guess is that those instructions are really complicated. Even the pseudo-code is ugly. I don't know how much it matters, but some of them also have lane-crossing dependencies which could complicate things. Based on a CPU-design class I took some years ago, routing congestion is a big deal. And that's probably why port 5 is the only port that has any real cross-lane routing. On Sandy/Ivy, everything that crossed 128 bits went into port 5. On Haswell and later, it got stricter. Everything that crosses 64 bits goes into port 5. (excluding the load/store ports) – Mysticial Nov 15 '16 at 00:16