Questions tagged [mmx]

MMX is a single-instruction, multiple-data (SIMD) instruction set designed by Intel and introduced in 1997 with its P5-based Pentium line of microprocessors, designated "Pentium with MMX Technology."

MMX is a trademark used to refer to an extension to the Intel Architecture instruction set. Officially, Intel states that the initials have no meaning. The extension adds 57 opcodes, a 64-bit quadword data type, and eight 64-bit registers, which are addressed by the names mm0 through mm7.

To avoid compatibility problems with the context-switch mechanisms of existing operating systems, these registers are aliases for the existing x87 FPU stack registers. Unlike the FP stack, the MMn registers are directly addressable.

The MMX instruction set is built around packed data types: instead of holding a single 64-bit integer, a register can hold two 32-bit integers, four 16-bit integers, or eight 8-bit integers, and one instruction operates on all of them at once. This is the basis of the unofficial expansions "MultiMedia eXtension" and "Matrix Math eXtension."
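As an illustration of packed arithmetic, the following C sketch adds four 16-bit integers with a single _mm_add_pi16 call (the intrinsic for the paddw instruction). The function name add4_u16 is made up for this example. The code assumes GCC or Clang, which predefine __MMX__ when MMX is enabled; on other targets it falls back to a plain loop that computes the same result.

```c
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: adds four 16-bit lanes of a and b.
   Each lane wraps around independently, as paddw does. */
void add4_u16(const uint16_t a[4], const uint16_t b[4], uint16_t out[4])
{
#if defined(__MMX__)
    __m64 va, vb, vsum;
    memcpy(&va, a, sizeof va);       /* 64 bits = four 16-bit lanes */
    memcpy(&vb, b, sizeof vb);
    vsum = _mm_add_pi16(va, vb);     /* one packed add, four results */
    memcpy(out, &vsum, sizeof vsum);
    _mm_empty();                     /* EMMS: release the registers for x87 use */
#else
    /* Assumed fallback for targets without MMX: same result, one lane at a time. */
    for (int i = 0; i < 4; ++i)
        out[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

Calling add4_u16 on {1, 2, 3, 65535} and {10, 20, 30, 1} yields {11, 22, 33, 0}: the last lane wraps around on its own, with no carry into its neighbour.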

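Because the MMX registers are mapped onto the existing FPU registers, it is somewhat difficult to work with floating-point and SIMD data in the same application: code must execute the EMMS instruction after using MMX and before using x87 floating point again.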
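The interaction with x87 code can be sketched as follows. This example sums four signed 16-bit integers with MMX intrinsics and then divides in long double, which GCC and Clang typically compute on the x87 unit on x86; the _mm_empty() call (the EMMS instruction) sits between the two. The function name mean4_i16 is hypothetical, and the __MMX__ check with a scalar fallback is an assumption made so the sketch also builds on targets without MMX.

```c
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: mean of four signed 16-bit integers. */
long double mean4_i16(const int16_t v[4])
{
    int32_t sum;
#if defined(__MMX__)
    __m64 x, pairs;
    memcpy(&x, v, sizeof x);
    /* pmaddwd with all-ones: two 32-bit lanes holding v0+v1 and v2+v3 */
    pairs = _mm_madd_pi16(x, _mm_set1_pi16(1));
    /* add the low 32-bit lane and the high 32-bit lane */
    sum = _mm_cvtsi64_si32(pairs)
        + _mm_cvtsi64_si32(_mm_srli_si64(pairs, 32));
    _mm_empty();   /* EMMS before any x87 floating-point work */
#else
    /* Assumed fallback for targets without MMX. */
    sum = v[0] + v[1] + v[2] + v[3];
#endif
    return (long double)sum / 4.0L;
}
```

Omitting the _mm_empty() call leaves the x87 register tags marked as in use, so later long double arithmetic may produce NaN results.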

MMX provides only integer operations.

107 questions
1 vote, 1 answer

clang: MMX intrinsics break long double

I have the following piece of code that crashes on assert(!isnan(x)) when compiled with clang. If I compile using -DWITH_MMX=0, it runs fine. I observe the same behaviour on Compiler Explorer and locally on macOS. I don't understand why the…
user2962393 • 1,083 • 9 • 12
1 vote, 1 answer

Invalid instruction operand when using punpcklwd with MMWORD PTR 64-bit memory operand

Currently working on some old assembly code, and MASM errors out with this line. punpcklwd MM3, MMWORD PTR [8+EBP+ECX*2] Gives me: error A2070: invalid instruction operands But, this should be valid, right? The disassembled code from a compiled…
1 vote, 1 answer

Stuck at summing two arrays using MMX instructions using NASM

I was given the following task: Given two arrays with 16 elements: NIZA RESW 16 and NIZB RESW 16 store in the third array (NIZC RESW 16) the following values: NIZC[i]=NIZA[i]+NIZB[i] using MMX instructions and compiling it with NASM This is what…
1 vote, 1 answer

Mult plus shift left ops using MMX assembler instructions

I am looking for doing shl(mult(var1,var2),1) operation, where mult multiplies var1 and var2 (both are 16-bit signed integers) and shl shifts left arithmetically the multiplication result. Result must be saturated, i.e., int32 max or int32 min if…
LooPer • 1,459 • 2 • 15 • 24
1 vote, 1 answer

why does GDB not tab-complete mmx register name(mm0-mm7)

I use gdb info registers to see all the registers, but I don't see MMX registers. My CPU is a Xeon Platinum 8163, a modern Xeon CPU that supports SSE and MMX. So I think it's a gdb problem (if I am right). Why does gdb not support showing mmx…
Zhaoyang • 49 • 8
1 vote, 2 answers

Uint8 to mm0 register

I've been playing with the example from this presentation (slide 41). It performs alpha blending as far as I'm concerned. MOVQ mm0, alpha//4 16-b zero-padding α MOVD mm1, A //move 4 pixels of image A MOVD mm2, B //move 4 pixels of image B PXOR mm3…
user13385400
1 vote, 2 answers

Assembly code for optimized bitshifting of a vector

I'm trying to write a routine that will logically bitshift by n positions to the right all elements of a vector in the most efficient way possible for the following vector types: BYTE->BYTE, WORD->WORD, DWORD->DWORD and WORD->BYTE (assuming that…
Arnaud • 11 • 3
1 vote, 2 answers

How do i use MMX mulH and mulL for two 64 bit integers to get one 128 bit integer

Hello, I'm working on yet another arbitrary precision integer library. I wanted to implement multiplication, but I got stuck when _m_pmulhw just didn't work. There is very little documentation on MMX instructions. When I test it out,…
Jesse Taube • 402 • 3 • 11
1 vote, 1 answer

MMX Register Speed vs Stack for Unsigned Integer Storage

I am contemplating an implementation of SHA3 in pure assembly. SHA3 has an internal state of 17 64 bit unsigned integers, but because of the transformations it uses, the best case could be achieved if I had 44 such integers available in the…
WDS • 966 • 1 • 9 • 17
1 vote, 2 answers

How did the legacy 3DNow! instruction set store results to memory or integer registers?

Just for fun I'm reviewing legacy (deprecated) instructions from the 3DNow! set introduced by AMD, and I'm trying to understand how they were used. All instructions seem to be encoded following this pattern: instruction destination_MMn_register_operand,…
MikeF • 1,021 • 9 • 29
1 vote, 2 answers

Accessing to mm1 register parts

Is it possible to access a single byte in an MMX register, like an array? I have this code: movq mm1,vector1 movq mm2,vector2 psubw mm1,mm2 I want to put mm1[1],mm1[2],mm1[3].... into C++ variables, like: int a,b=0; mov a,mm1[1] mov b,mm1[2] Thanks.
Pepeluis • 931 • 2 • 10 • 18
1 vote, 1 answer

AMD Geode Optimization References

I am working on doing some significant optimization of some machine vision code on an embedded AMD Geode LX. I am going as far as to rewrite the computationally intense portions in Assembly, making heavy use of the x86 MMX instructions. The basic…
Jack Morrison • 1,623 • 1 • 10 • 13
1 vote, 1 answer

warning C4799: function has no EMMS instruction

I'm trying to create a C# app that uses a DLL containing C++ code and inline assembly. In the function test_MMX I want to add two arrays of a specific length. extern "C" __declspec(dllexport) void __stdcall test_MMX(int *first_array,int…
1 vote, 0 answers

How does _mm_mul_ps() add two __m128?

I'm writing a program that takes two 4x4 matrices and multiplies them using intrinsics. What I understand so far: the MMX/SSE instruction sets allow you to accelerate computing. In particular, they use vectors of 4-byte elements. __m128 represents a 16 bytes…
chick3n0x07CC • 678 • 2 • 10 • 30
1 vote, 1 answer

How to efficiently convert from two __m128d to one __m128i in MSVC?

Is converting then shifting then bitwise-or'ing the only way to convert from two __m128d to a single __m128i? This is perfectly acceptable to Xcode in an x64 build: __m128d v2dHi = .... __m128d v2dLo = .... __m128i v4i =…
G Huxley • 1,130 • 14 • 19