Original 8086/8088 instructions | ||||
Instruction | Meaning | Notes | ||
AAA | ASCII adjust AL after addition | used with unpacked binary coded decimal | ||
AAD | ASCII adjust AX before division | buggy in the original instruction set, but "fixed" in the NEC V20, causing a number of incompatibilities | ||
AAM | ASCII adjust AX after multiplication | |||
AAS | ASCII adjust AL after subtraction | |||
ADC | Add with carry | |||
ADD | Add | |||
AND | Logical AND | |||
CALL | Call procedure | |||
CBW | Convert byte to word | |||
CLC | Clear carry flag | |||
CLD | Clear direction flag | |||
CLI | Clear interrupt flag | |||
CMC | Complement carry flag | |||
CMP | Compare operands | |||
CMPSB | Compare bytes in memory | |||
CMPSW | Compare words | |||
CWD | Convert word to doubleword | |||
DAA | Decimal adjust AL after addition | (used with packed binary coded decimal) | ||
DAS | Decimal adjust AL after subtraction | |||
DEC | Decrement by 1 | |||
DIV | Unsigned divide | |||
ESC | Used with floating-point unit | |||
HLT | Enter halt state | |||
IDIV | Signed divide | |||
IMUL | Signed multiply | |||
IN | Input from port | |||
INC | Increment by 1 | |||
INT | Call to interrupt | |||
INTO | Call to interrupt if overflow | |||
IRET | Return from interrupt | |||
Jxx | Jump if condition | (JA, JAE, JB, JBE, JC, JCXZ, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ) | ||
JMP | Jump | |||
LAHF | Load flags into AH register | |||
LDS | Load pointer using DS | |||
LEA | Load Effective Address | |||
LES | Load ES with pointer | |||
LOCK | Assert BUS LOCK# signal | (for multiprocessing) | ||
LODSB | Load byte | |||
LODSW | Load word | |||
LOOP/LOOPx | Loop control | (LOOPE, LOOPNE, LOOPNZ, LOOPZ) | ||
MOV | Move | |||
MOVSB | Move byte from string to string | |||
MOVSW | Move word from string to string | |||
MUL | Unsigned multiply | |||
NEG | Two's complement negation | |||
NOP | No operation | opcode 0x90, the same encoding as XCHG AX, AX (XCHG EAX, EAX in 32-bit code) | ||
NOT | Negate the operand, logical NOT | |||
OR | Logical OR | |||
OUT | Output to port | |||
POP | Pop data from stack | (the POP CS form works only on the 8086/8088) | ||
POPF | Pop data into flags register | |||
PUSH | Push data onto stack | |||
PUSHF | Push flags onto stack | |||
RCL | Rotate left (with carry) | |||
RCR | Rotate right (with carry) | |||
REPxx | Repeat CMPS/MOVS/SCAS/STOS | (REP, REPE, REPNE, REPNZ, REPZ) | ||
RET | Return from procedure | |||
RETN | Return from near procedure | |||
RETF | Return from far procedure | |||
ROL | Rotate left | |||
ROR | Rotate right | |||
SAHF | Store AH into flags | |||
SAL | Shift Arithmetically left (signed shift left) | |||
SAR | Shift Arithmetically right (signed shift right) | |||
SBB | Subtraction with borrow | |||
SCASB | Compare byte string | |||
SCASW | Compare word string | |||
SHL | Shift left (unsigned shift left) | |||
SHR | Shift right (unsigned shift right) | |||
STC | Set carry flag | |||
STD | Set direction flag | |||
STI | Set interrupt flag | |||
STOSB | Store byte in string | |||
STOSW | Store word in string | |||
SUB | Subtraction | |||
TEST | Logical compare (AND) | |||
WAIT | Wait until not busy | Waits until BUSY# pin is inactive (used with floating-point unit) | ||
XCHG | Exchange data | |||
XLAT | Table look-up translation | |||
XOR | Exclusive OR |
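The decimal-adjust rows above (DAA for packed binary coded decimal) are easier to follow with a worked example. The following is a minimal Python sketch of the documented DAA behaviour after an 8-bit ADD; the function names `add8`, `daa` and `packed_bcd_add` are illustrative only and are not part of any assembler or library.

```python
def add8(a, b):
    """Model ADD AL, b: return (result, carry flag, auxiliary carry flag)."""
    total = a + b
    carry = total > 0xFF
    aux = ((a & 0x0F) + (b & 0x0F)) > 0x0F  # carry out of the low nibble
    return total & 0xFF, carry, aux


def daa(al, cf, af):
    """Model DAA: correct AL so that it holds two valid packed BCD digits."""
    old_al, old_cf = al, cf
    # Low digit: fix it up if it exceeds 9 or the addition carried out of it.
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF
        af = True
    else:
        af = False
    # High digit: fix it up if the original byte exceeded 0x99 or ADD carried.
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af


def packed_bcd_add(a, b):
    """ADD followed by DAA on two packed BCD bytes; returns (AL, CF)."""
    al, cf, af = add8(a, b)
    al, cf, _ = daa(al, cf, af)
    return al, cf


print(hex(packed_bcd_add(0x38, 0x45)[0]))  # 38 + 45 in decimal
```

Here `packed_bcd_add(0x38, 0x45)` yields `0x83` with the carry clear, and `packed_bcd_add(0x99, 0x01)` yields `0x00` with the carry set, which a following ADC can propagate into the next pair of digits.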
Added with 80186/80188 | ||||
Instruction | Meaning | |||
BOUND | Check array index against bounds | |||
ENTER | Enter stack frame | |||
INS | Input from port to string | |||
LEAVE | Leave stack frame | |||
OUTS | Output string to port | |||
POPA | Pop all general purpose registers from stack | |||
PUSHA | Push all general purpose registers onto stack |
Added with 80286 | ||||
Instruction | Meaning | |||
ARPL | Adjust RPL field of selector | |||
CLTS | Clear task-switched flag in register CR0 | |||
LAR | Load access rights byte | |||
LGDT | Load global descriptor table | |||
LIDT | Load interrupt descriptor table | |||
LLDT | Load local descriptor table | |||
LMSW | Load machine status word | |||
LOADALL | Load all CPU registers, including internal ones such as GDTR | |||
LSL | Load segment limit | |||
LTR | Load task register | |||
SGDT | Store global descriptor table | |||
SIDT | Store interrupt descriptor table | |||
SLDT | Store local descriptor table | |||
SMSW | Store machine status word | |||
STR | Store task register | |||
VERR | Verify a segment for reading | |||
VERW | Verify a segment for writing |
Added with 80386 | ||||
Instruction | Meaning | Notes | ||
BSF | Bit scan forward | |||
BSR | Bit scan reverse | |||
BT | Bit test | |||
BTC | Bit test and complement | |||
BTR | Bit test and reset | |||
BTS | Bit test and set | |||
CDQ | Convert double-word to quad-word | Sign-extends EAX into EDX, forming the quad-word EDX:EAX. Since (I)DIV uses EDX:EAX as its dividend, CDQ must be executed after setting EAX and before IDIV unless EDX is initialized manually (as in 64/32 division); for unsigned DIV, EDX is zeroed instead | ||
CMPSD | Compare string double-word | Compares ES:[(E)DI] with DS:[(E)SI] | ||
CWDE | Convert word to double-word | Unlike CWD, CWDE sign-extends AX to EAX instead of AX to DX:AX | ||
INSB, INSW, INSD | Input from port to string with explicit size | same as INS | ||
IRETx | Interrupt return; D suffix means 32-bit return, F suffix means do not generate epilogue code (i.e. LEAVE instruction) | Use IRETD rather than IRET in 32-bit situations | ||
JCXZ, JECXZ | Jump if register (E)CX is zero | |||
LFS, LGS | Load far pointer | |||
LSS | Load stack segment | |||
LODSW, LODSD | Load string | can be prefixed with REP | ||
LOOPW, LOOPD | Loop | Counter register is (E)CX | ||
LOOPEW, LOOPED | Loop while equal | |||
LOOPZW, LOOPZD | Loop while zero | |||
LOOPNEW, LOOPNED | Loop while not equal | |||
LOOPNZW, LOOPNZD | Loop while not zero | |||
MOVSW, MOVSD | Move data from string to string | |||
MOVSX | Move with sign-extend | |||
MOVZX | Move with zero-extend | |||
POPAD | Pop all double-word (32-bit) registers from stack | Does not pop register ESP off of stack | ||
POPFD | Pop data into EFLAGS register | |||
PUSHAD | Push all double-word (32-bit) registers onto stack | |||
PUSHFD | Push EFLAGS register onto stack | |||
SCASD | Scan string data double-word | |||
SETA, SETAE, SETB, SETBE, SETC, SETE, SETG, SETGE, SETL, SETLE, SETNA, SETNAE, SETNB, SETNBE, SETNC, SETNE, SETNG, SETNGE, SETNL, SETNLE, SETNO, SETNP, SETNS, SETNZ, SETO, SETP, SETPE, SETPO, SETS, SETZ | Set byte to one on condition | |||
SHLD | Shift left double-word | |||
SHRD | Shift right double-word | |||
STOSx | Store string |
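The CDQ note above describes why EDX has to be prepared before a signed division. The following Python sketch models that interaction under simplifying assumptions: registers are plain 32-bit integers, and the helper names `cdq`, `idiv32` and `to_signed32` are hypothetical, chosen only for this illustration.

```python
def to_signed32(x):
    """Interpret a 32-bit register value as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def cdq(eax):
    """Model CDQ: return the EDX value that sign-extends EAX."""
    return 0xFFFFFFFF if eax & 0x80000000 else 0x00000000


def idiv32(edx, eax, divisor):
    """Model IDIV r/m32: signed EDX:EAX / divisor -> (EAX quotient, EDX remainder)."""
    dividend = ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)
    if dividend & (1 << 63):
        dividend -= 1 << 64
    d = to_signed32(divisor)
    if d == 0:
        raise ZeroDivisionError("divide error (#DE)")
    # x86 division truncates toward zero, unlike Python's floor division.
    q = abs(dividend) // abs(d)
    if (dividend < 0) != (d < 0):
        q = -q
    r = dividend - q * d
    if not -(1 << 31) <= q < (1 << 31):
        raise OverflowError("quotient does not fit in EAX (#DE)")
    return q & 0xFFFFFFFF, r & 0xFFFFFFFF


eax = -7 & 0xFFFFFFFF
quotient, remainder = idiv32(cdq(eax), eax, 2)
print(to_signed32(quotient), to_signed32(remainder))  # -3 -1
```

If EDX is left at zero instead of being filled by CDQ, the same call treats the dividend as the large positive value 0x00000000FFFFFFF9 and returns 0x7FFFFFFC rather than -3.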
Added with 80486 | ||||
Instruction | Meaning | Notes | ||
BSWAP | Byte Swap | Only works for 32 bit registers | ||
CMPXCHG | CoMPare and eXCHanGe | |||
INVD | Invalidate Internal Caches | |||
INVLPG | Invalidate TLB Entry | |||
WBINVD | Write Back and Invalidate Cache | |||
XADD | Exchange and Add |
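The 80486 additions are terse in table form, so a behavioural sketch may help. The Python below models BSWAP, CMPXCHG and XADD on 32-bit values, using a dictionary as a stand-in for memory; the function names and the dictionary representation are assumptions made for this example, and the sketch ignores the LOCK prefix and all flags other than ZF.

```python
def bswap32(x):
    """Model BSWAP r32: reverse the byte order of a 32-bit value."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def cmpxchg(mem, addr, eax, src):
    """Model CMPXCHG [addr], src.

    If EAX equals the memory operand, src is stored and ZF is set;
    otherwise the memory operand is loaded into EAX and ZF is cleared.
    Returns (new EAX, ZF).
    """
    if mem[addr] == eax:
        mem[addr] = src
        return eax, True
    return mem[addr], False


def xadd(mem, addr, src):
    """Model XADD [addr], src: store the sum, return the old memory value."""
    old = mem[addr]
    mem[addr] = (old + src) & 0xFFFFFFFF
    return old


mem = {0: 5}
print(hex(bswap32(0x12345678)))  # 0x78563412
print(cmpxchg(mem, 0, 5, 9))     # (5, True), mem[0] becomes 9
print(xadd(mem, 0, 3))           # 9, mem[0] becomes 12
```

On real hardware CMPXCHG and XADD are typically combined with the LOCK prefix to build atomic compare-and-swap and fetch-and-add operations; this sketch only shows the data movement.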
Added with Pentium | ||||
Instruction | Meaning | Notes | ||
CPUID | CPU IDentification | *See note below | ||
CMPXCHG8B | CoMPare and eXCHanGe 8 bytes | |||
RDMSR | ReaD from Model-Specific Register | |||
RDTSC | ReaD Time Stamp Counter | |||
WRMSR | WRite to Model-Specific Register | |||
RSM | Resume operation of interrupted program | SMM [System Management Mode] |
Added with Pentium MMX | ||||
Instruction | Meaning | Notes | ||
RDPMC | Read the PMC [Performance Monitoring Counter] | Reads the counter specified in the ECX register into registers EDX:EAX |
Added with Pentium Pro
Conditional MOV: CMOVA, CMOVAE, CMOVB, CMOVBE, CMOVC, CMOVE, CMOVG, CMOVGE, CMOVL, CMOVLE, CMOVNA, CMOVNAE, CMOVNB, CMOVNBE, CMOVNC, CMOVNE, CMOVNG, CMOVNGE, CMOVNL, CMOVNLE, CMOVNO, CMOVNP, CMOVNS, CMOVNZ, CMOVO, CMOVP, CMOVPE, CMOVPO, CMOVS, CMOVZ
SYSENTER (SYStem call ENTER), SYSEXIT (SYStem call EXIT), RDPMC*, UD2
* RDPMC was introduced in the Pentium Pro processor and the Pentium processor with MMX technology
Added with AMD K6-2
SYSCALL, SYSRET (functionally equivalent to SYSENTER and SYSEXIT)
Added with SSE
MASKMOVQ, MOVNTPS, MOVNTQ, PREFETCHT0, PREFETCHT1, PREFETCHT2, PREFETCHNTA, SFENCE (for Cacheability and Memory Ordering)
Added with SSE2
CLFLUSH, LFENCE, MASKMOVDQU, MFENCE, MOVNTDQ, MOVNTI, MOVNTPD, PAUSE (for Cacheability)
Added with SSE3
LDDQU (for Video Encoding)
MONITOR, MWAIT (for thread synchronization; only on processors supporting Hyper-threading and some dual-core processors like Core 2, Phenom and others)
Added with Intel VT
VMPTRLD, VMPTRST, VMCLEAR, VMREAD, VMWRITE, VMCALL, VMLAUNCH, VMRESUME, VMXOFF, VMXON
Added with AMD-V
CLGI, SKINIT, STGI, VMLOAD, VMMCALL, VMRUN, VMSAVE (SVM instructions of AMD-V)
Added with x86-64
CMPXCHG16B (CoMPaRe and eXCHanGe 16 bytes), RDTSCP (ReaD Time Stamp Counter and Processor ID)
Added with SSE4a
LZCNT, POPCNT (POPulation CouNT) - advanced bit manipulation
x87 floating-point instructions
Original 8087 instructions
F2XM1, FABS, FADD, FADDP, FBLD, FBSTP, FCHS, FCLEX, FCOM, FCOMP, FCOMPP, FDECSTP, FDISI, FDIV, FDIVP, FDIVR, FDIVRP, FENI, FFREE, FIADD, FICOM, FICOMP, FIDIV, FIDIVR, FILD, FIMUL, FINCSTP, FINIT, FIST, FISTP, FISUB, FISUBR, FLD, FLD1, FLDCW, FLDENV, FLDENVW, FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI, FLDZ, FMUL, FMULP, FNCLEX, FNDISI, FNENI, FNINIT, FNOP, FNSAVE, FNSAVEW, FNSTCW, FNSTENV, FNSTENVW, FNSTSW, FPATAN, FPREM, FPTAN, FRNDINT, FRSTOR, FRSTORW, FSAVE, FSAVEW, FSCALE, FSQRT, FST, FSTCW, FSTENV, FSTENVW, FSTP, FSTSW, FSUB, FSUBP, FSUBR, FSUBRP, FTST, FWAIT, FXAM, FXCH, FXTRACT, FYL2X, FYL2XP1
Added in specific processors
Added with 80287
FSETPM
Added with 80387
FCOS, FLDENVD, FNSAVED, FNSTENVD, FPREM1, FRSTORD, FSAVED, FSIN, FSINCOS, FSTENVD, FUCOM, FUCOMP, FUCOMPP
Added with Pentium Pro
FCMOV variants: FCMOVB, FCMOVBE, FCMOVE, FCMOVNB, FCMOVNBE, FCMOVNE, FCMOVNU, FCMOVU
FCOMI variants: FCOMI, FCOMIP, FUCOMI, FUCOMIP
Added with SSE
FXRSTOR*, FXSAVE*
* Also supported on later Pentium IIs, although they do not contain SSE support
Added with SSE3
FISTTP (x87 to integer conversion)
Undocumented instructions
FFREEP performs FFREE ST(i) and pop stack
SIMD instructions
MMX instructions (added with Pentium MMX)
EMMS, MOVD, MOVQ, PACKSSDW, PACKSSWB, PACKUSWB, PADDB, PADDD, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW, PAND, PANDN, PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW, PMADDWD, PMULHW, PMULLW, POR, PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLQ, PSRLW, PSUBB, PSUBD, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW, PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD, PXOR
MMX+ instructions
added with Athlon
Same as the SSE SIMD Integer Instructions that operate on MMX registers.
EMMX instructions
added with 6x86MX from Cyrix, deprecated now
PAVEB, PADDSIW, PMAGW, PDISTIB, PSUBSIW, PMVZB, PMULHRW, PMVNZB, PMVLZB, PMVGEZB, PMULHRIW, PMACHRIW
3DNow! instructions
added with K6-2
FEMMS, PAVGUSB, PF2ID, PFACC, PFADD, PFCMPEQ, PFCMPGE, PFCMPGT, PFMAX, PFMIN, PFMUL, PFRCP, PFRCPIT1, PFRCPIT2, PFRSQIT1, PFRSQRT, PFSUB, PFSUBR, PI2FD, PMULHRW, PREFETCH, PREFETCHW
3DNow!+ instructions
added with Athlon
PF2IW, PFNACC, PFPNACC, PI2FW, PSWAPD
added with Geode GX
PFRSQRTV, PFRCPV
SSE instructions
added with Pentium III; see also the integer instructions added with Pentium III
SSE SIMD Floating-Point Instructions
ADDPS, ADDSS, CMPPS, CMPSS, COMISS, CVTPI2PS, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2PI, CVTTSS2SI, DIVPS, DIVSS, LDMXCSR, MAXPS, MAXSS, MINPS, MINSS, MOVAPS, MOVHLPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVNTPS, MOVSS, MOVUPS, MULPS, MULSS, RCPPS, RCPSS, RSQRTPS, RSQRTSS, SHUFPS, SQRTPS, SQRTSS, STMXCSR, SUBPS, SUBSS, UCOMISS, UNPCKHPS, UNPCKLPS
SSE SIMD Integer Instructions
ANDNPS, ANDPS, ORPS, PAVGB, PAVGW, PEXTRW, PINSRW, PMAXSW, PMAXUB, PMINSW, PMINUB, PMOVMSKB, PMULHUW, PSADBW, PSHUFW, XORPS
Instruction | Opcode | Meaning |
MOVUPS xmm1, xmm2/m128 | 0F 10 /r | Move Unaligned Packed Single-Precision Floating-Point Values |
MOVSS xmm1, xmm2/m32 | F3 0F 10 /r | Move Scalar Single-Precision Floating-Point Values |
MOVUPS xmm2/m128, xmm1 | 0F 11 /r | Move Unaligned Packed Single-Precision Floating-Point Values |
MOVSS xmm2/m32, xmm1 | F3 0F 11 /r | Move Scalar Single-Precision Floating-Point Values |
MOVLPS xmm, m64 | 0F 12 /r | Move Low Packed Single-Precision Floating-Point Values |
MOVHLPS xmm1, xmm2 | 0F 12 /r | Move Packed Single-Precision Floating-Point Values High to Low |
MOVLPS m64, xmm | 0F 13 /r | Move Low Packed Single-Precision Floating-Point Values |
UNPCKLPS xmm1, xmm2/m128 | 0F 14 /r | Unpack and Interleave Low Packed Single-Precision Floating-Point Values |
UNPCKHPS xmm1, xmm2/m128 | 0F 15 /r | Unpack and Interleave High Packed Single-Precision Floating-Point Values |
MOVHPS xmm, m64 | 0F 16 /r | Move High Packed Single-Precision Floating-Point Values |
MOVLHPS xmm1, xmm2 | 0F 16 /r | Move Packed Single-Precision Floating-Point Values Low to High |
MOVHPS m64, xmm | 0F 17 /r | Move High Packed Single-Precision Floating-Point Values |
PREFETCHNTA | 0F 18 /0 | Prefetch Data Into Caches (non-temporal data with respect to all cache levels) |
PREFETCHT0 | 0F 18 /1 | Prefetch Data Into Caches (temporal data) |
PREFETCHT1 | 0F 18 /2 | Prefetch Data Into Caches (temporal data with respect to first level cache) |
PREFETCHT2 | 0F 18 /3 | Prefetch Data Into Caches (temporal data with respect to second level cache) |
NOP | 0F 1F /0 | No Operation |
MOVAPS xmm1, xmm2/m128 | 0F 28 /r | Move Aligned Packed Single-Precision Floating-Point Values |
MOVAPS xmm2/m128, xmm1 | 0F 29 /r | Move Aligned Packed Single-Precision Floating-Point Values |
CVTPI2PS xmm, mm/m64 | 0F 2A /r | Convert Packed Dword Integers to Packed Single-Precision FP Values |
CVTSI2SS xmm, r/m32 | F3 0F 2A /r | Convert Dword Integer to Scalar Single-Precision FP Value |
MOVNTPS m128, xmm | 0F 2B /r | Store Packed Single-Precision Floating-Point Values Using Non-Temporal Hint |
CVTTPS2PI mm, xmm/m64 | 0F 2C /r | Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers |
CVTTSS2SI r32, xmm/m32 | F3 0F 2C /r | Convert with Truncation Scalar Single-Precision FP Value to Dword Integer |
CVTPS2PI mm, xmm/m64 | 0F 2D /r | Convert Packed Single-Precision FP Values to Packed Dword Integers |
CVTSS2SI r32, xmm/m32 | F3 0F 2D /r | Convert Scalar Single-Precision FP Value to Dword Integer |
UCOMISS xmm1, xmm2/m32 | 0F 2E /r | Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS |
COMISS xmm1, xmm2/m32 | 0F 2F /r | Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS |
SQRTPS xmm1, xmm2/m128 | 0F 51 /r | Compute Square Roots of Packed Single-Precision Floating-Point Values |
SQRTSS xmm1, xmm2/m32 | F3 0F 51 /r | Compute Square Root of Scalar Single-Precision Floating-Point Value |
RSQRTPS xmm1, xmm2/m128 | 0F 52 /r | Compute Reciprocal of Square Root of Packed Single-Precision Floating-Point Value |
RSQRTSS xmm1, xmm2/m32 | F3 0F 52 /r | Compute Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value |
RCPPS xmm1, xmm2/m128 | 0F 53 /r | Compute Reciprocal of Packed Single-Precision Floating-Point Values |
RCPSS xmm1, xmm2/m32 | F3 0F 53 /r | Compute Reciprocal of Scalar Single-Precision Floating-Point Values |
ANDPS xmm1, xmm2/m128 | 0F 54 /r | Bitwise Logical AND of Packed Single-Precision Floating-Point Values |
ANDNPS xmm1, xmm2/m128 | 0F 55 /r | Bitwise Logical AND NOT of Packed Single-Precision Floating-Point Values |
ORPS xmm1, xmm2/m128 | 0F 56 /r | Bitwise Logical OR of Single-Precision Floating-Point Values |
XORPS xmm1, xmm2/m128 | 0F 57 /r | Bitwise Logical XOR for Single-Precision Floating-Point Values |
ADDPS xmm1, xmm2/m128 | 0F 58 /r | Add Packed Single-Precision Floating-Point Values |
ADDSS xmm1, xmm2/m32 | F3 0F 58 /r | Add Scalar Single-Precision Floating-Point Values |
MULPS xmm1, xmm2/m128 | 0F 59 /r | Multiply Packed Single-Precision Floating-Point Values |
MULSS xmm1, xmm2/m32 | F3 0F 59 /r | Multiply Scalar Single-Precision Floating-Point Values |
SUBPS xmm1, xmm2/m128 | 0F 5C /r | Subtract Packed Single-Precision Floating-Point Values |
SUBSS xmm1, xmm2/m32 | F3 0F 5C /r | Subtract Scalar Single-Precision Floating-Point Values |
MINPS xmm1, xmm2/m128 | 0F 5D /r | Return Minimum Packed Single-Precision Floating-Point Values |
MINSS xmm1, xmm2/m32 | F3 0F 5D /r | Return Minimum Scalar Single-Precision Floating-Point Values |
DIVPS xmm1, xmm2/m128 | 0F 5E /r | Divide Packed Single-Precision Floating-Point Values |
DIVSS xmm1, xmm2/m32 | F3 0F 5E /r | Divide Scalar Single-Precision Floating-Point Values |
MAXPS xmm1, xmm2/m128 | 0F 5F /r | Return Maximum Packed Single-Precision Floating-Point Values |
MAXSS xmm1, xmm2/m32 | F3 0F 5F /r | Return Maximum Scalar Single-Precision Floating-Point Values |
PSHUFW mm1, mm2/m64, imm8 | 0F 70 /r ib | Shuffle Packed Words |
LDMXCSR m32 | 0F AE /2 | Load MXCSR Register State |
STMXCSR m32 | 0F AE /3 | Store MXCSR Register State |
SFENCE | 0F AE /7 | Store Fence |
CMPPS xmm1, xmm2/m128, imm8 | 0F C2 /r ib | Compare Packed Single-Precision Floating-Point Values |
CMPSS xmm1, xmm2/m32, imm8 | F3 0F C2 /r ib | Compare Scalar Single-Precision Floating-Point Values |
PINSRW mm, r32/m16, imm8 | 0F C4 /r ib | Insert Word |
PEXTRW r32, mm, imm8 | 0F C5 /r ib | Extract Word |
SHUFPS xmm1, xmm2/m128, imm8 | 0F C6 /r ib | Shuffle Packed Single-Precision Floating-Point Values |
PMOVMSKB r32, mm | 0F D7 /r | Move Byte Mask |
PMINUB mm1, mm2/m64 | 0F DA /r | Minimum of Packed Unsigned Byte Integers |
PMAXUB mm1, mm2/m64 | 0F DE /r | Maximum of Packed Unsigned Byte Integers |
PAVGB mm1, mm2/m64 | 0F E0 /r | Average Packed Integers |
PAVGW mm1, mm2/m64 | 0F E3 /r | Average Packed Integers |
PMULHUW mm1, mm2/m64 | 0F E4 /r | Multiply Packed Unsigned Integers and Store High Result |
MOVNTQ m64, mm | 0F E7 /r | Store of Quadword Using Non-Temporal Hint |
PMINSW mm1, mm2/m64 | 0F EA /r | Minimum of Packed Signed Word Integers |
PMAXSW mm1, mm2/m64 | 0F EE /r | Maximum of Packed Signed Word Integers |
PSADBW mm1, mm2/m64 | 0F F6 /r | Compute Sum of Absolute Differences |
MASKMOVQ mm1, mm2 | 0F F7 /r | Store Selected Bytes of Quadword |
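The opcode column above uses the usual notation in which `/r` stands for a ModRM byte and a leading `F3` is a mandatory prefix. As a rough illustration, the Python sketch below assembles the register-to-register form of a few rows. It rests on several assumptions: only the `xmm1, xmm2/m128` (load) direction is handled, only registers xmm0 to xmm7 are accepted, no memory operands or REX prefixes are supported, and the names `SSE_OPCODES` and `encode_reg_reg` are invented for this example.

```python
# Register numbers as they appear in the ModRM reg and r/m fields.
XMM = {f"xmm{i}": i for i in range(8)}

# mnemonic -> (mandatory prefix, opcode bytes), copied from the table above.
SSE_OPCODES = {
    "movups": (b"", b"\x0f\x10"),
    "movss": (b"\xf3", b"\x0f\x10"),
    "movaps": (b"", b"\x0f\x28"),
    "addps": (b"", b"\x0f\x58"),
    "addss": (b"\xf3", b"\x0f\x58"),
    "mulps": (b"", b"\x0f\x59"),
    "mulss": (b"\xf3", b"\x0f\x59"),
}


def encode_reg_reg(mnemonic, dst, src):
    """Encode `mnemonic dst, src` where both operands are XMM registers."""
    prefix, opcode = SSE_OPCODES[mnemonic]
    # mod = 11 (register direct), reg = destination, r/m = source.
    modrm = 0xC0 | (XMM[dst] << 3) | XMM[src]
    return prefix + opcode + bytes([modrm])


print(encode_reg_reg("addps", "xmm1", "xmm2").hex(" "))  # 0f 58 ca
print(encode_reg_reg("addss", "xmm0", "xmm1").hex(" "))  # f3 0f 58 c1
```

The packed and scalar forms differ only in the `F3` prefix, which is why pairs such as ADDPS/ADDSS share an opcode in the table.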
SSE2 instructions
added with Pentium 4; see also the integer instructions added with Pentium 4
SSE2 SIMD Floating-Point Instructions
ADDPD, ADDSD, ANDNPD, ANDPD, CMPPD, CMPSD*, COMISD, CVTDQ2PD, CVTDQ2PS, CVTPD2DQ, CVTPD2PI, CVTPD2PS, CVTPI2PD, CVTPS2DQ, CVTPS2PD, CVTSD2SI, CVTSD2SS, CVTSI2SD, CVTSS2SD, CVTTPD2DQ, CVTTPD2PI, CVTTPS2DQ, CVTTSD2SI, DIVPD, DIVSD, MAXPD, MAXSD, MINPD, MINSD, MOVAPD, MOVHPD, MOVLPD, MOVMSKPD, MOVSD*, MOVUPD, MULPD, MULSD, ORPD, SHUFPD, SQRTPD, SQRTSD, SUBPD, SUBSD, UCOMISD, UNPCKHPD, UNPCKLPD, XORPD
* CMPSD and MOVSD have the same names as the string instruction mnemonics CMPSD (CMPS) and MOVSD (MOVS); however, the SSE2 instructions operate on scalar double-precision floating-point values, whereas the string instructions operate on doubleword strings.
SSE2 SIMD Integer Instructions
MOVDQ2Q, MOVDQA, MOVDQU, MOVQ2DQ, PADDQ, PSUBQ, PMULUDQ, PSHUFHW, PSHUFLW, PSHUFD, PSLLDQ, PSRLDQ, PUNPCKHQDQ, PUNPCKLQDQ
SSE3 instructions
added with Pentium 4 models supporting SSE3; see also the integer and floating-point instructions added with SSE3
SSE3 SIMD Floating-Point Instructions
ADDSUBPD, ADDSUBPS (for Complex Arithmetic)
HADDPD, HADDPS, HSUBPD, HSUBPS (for Graphics)
MOVDDUP, MOVSHDUP, MOVSLDUP (for Complex Arithmetic)
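The "for Complex Arithmetic" remarks above refer to a common pattern in which MOVSLDUP, MOVSHDUP and ADDSUBPS combine to multiply interleaved complex numbers. The Python sketch below models the lane behaviour of these instructions on lists of four values laid out as [re0, im0, re1, im1]. The helper names mirror the mnemonics but are plain Python functions written for this illustration, and `swap_pairs` stands in for a SHUFPS with a suitable immediate.

```python
def movsldup(x):
    """Duplicate the even-indexed (low) element of each pair."""
    return [x[0], x[0], x[2], x[2]]


def movshdup(x):
    """Duplicate the odd-indexed (high) element of each pair."""
    return [x[1], x[1], x[3], x[3]]


def swap_pairs(x):
    """Swap neighbours within each pair (a SHUFPS would do this in hardware)."""
    return [x[1], x[0], x[3], x[2]]


def mulps(a, b):
    """Lane-wise multiply."""
    return [p * q for p, q in zip(a, b)]


def addsubps(d, s):
    """Subtract in even lanes, add in odd lanes."""
    return [d[0] - s[0], d[1] + s[1], d[2] - s[2], d[3] + s[3]]


def haddps(d, s):
    """Horizontal add: sums of adjacent pairs of d, then of s."""
    return [d[0] + d[1], d[2] + d[3], s[0] + s[1], s[2] + s[3]]


def complex_mul(x, y):
    """Multiply two pairs of complex numbers stored as [re, im, re, im]."""
    t1 = mulps(movsldup(x), y)              # [a*c, a*d, ...]
    t2 = mulps(movshdup(x), swap_pairs(y))  # [b*d, b*c, ...]
    return addsubps(t1, t2)                 # [a*c - b*d, a*d + b*c, ...]


# (1 + 2i)(3 + 4i) and (0 + 1i)(0 + 1i)
print(complex_mul([1.0, 2.0, 0.0, 1.0], [3.0, 4.0, 0.0, 1.0]))
# [-5.0, 10.0, -1.0, 0.0]
```

Python floats are double precision, so results can differ from hardware single-precision lanes when values are not exactly representable.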
SSSE3 instructions
added with Xeon 5100 series and initial Core 2
PSIGNW, PSIGND, PSIGNB
PSHUFB
PMULHRSW, PMADDUBSW
PHSUBW, PHSUBSW, PHSUBD
PHADDW, PHADDSW, PHADDD
PALIGNR
PABSW, PABSD, PABSB
SSE4 instructions
SSE4.1 | ||||
Instruction | Description | |||
MPSADBW | Compute eight offset sums of absolute differences (i.e. |x0-y0|+|x1-y1|+|x2-y2|+|x3-y3|, |x0-y1|+|x1-y2|+|x2-y3|+|x3-y4|, ...); this operation is extremely important for modern HDTV codecs, and (see [4]) allows an 8x8 block difference to be computed in fewer than seven cycles. One bit of a three-bit immediate operand indicates whether y0 .. y10 or y4 .. y14 should be used from the destination operand, the other two whether x0..x3, x4..x7, x8..x11 or x12..x15 should be used from the source. | |||
PHMINPOSUW | Sets the bottom unsigned 16-bit word of the destination to the smallest unsigned 16-bit word in the source, and the next-from-bottom to the index of that word in the source. | |||
PMULDQ | Packed signed multiplication on two sets of 2 out of 4 packed integers, the 1st and 3rd per packed 4, giving 2 packed 64-bit results. | |||
PMULLD | Packed signed multiplication, 4 packed sets of 32-bit integers multiplied to give 4 packed 32-bit results. | |||
DPPS, DPPD | Dot product for AOS (Array of Structs) data. This takes an immediate operand consisting of four (or two for DPPD) bits to select which of the entries in the input to multiply and accumulate, and another four (or two for DPPD) to select whether to put 0 or the dot-product in the appropriate field of the output. | |||
BLENDPS, BLENDPD, BLENDVPS, BLENDVPD, PBLENDVB, PBLENDW | Conditional copying of elements in one location with another, based (for non-V form) on the bits in an immediate operand, and (for V form) on the bits in register XMM0. | |||
PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD | Packed minimum/maximum for different integer operand types | |||
ROUNDPS, ROUNDSS, ROUNDPD, ROUNDSD | Round values in a floating-point register to integers, using one of four rounding modes specified by an immediate operand | |||
INSERTPS, PINSRB, PINSRD/PINSRQ, EXTRACTPS, PEXTRB, PEXTRW, PEXTRD/PEXTRQ | The INSERTPS and PINSR instructions read 8, 16 or 32 bits from an x86 register or memory location and insert them into a field in the destination register given by an immediate operand; EXTRACTPS and PEXTR read a field from the source register and insert it into an x86 register or memory location. For example, PEXTRD eax, xmm0, 1; EXTRACTPS [addr+4*eax], xmm1, 1 stores the first field of xmm1 in the address given by the first field of xmm0. | |||
PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD, PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD, PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, PMOVZXDQ | Packed sign/zero extension to wider types | |||
PTEST | Similar to the TEST instruction in that it sets flags from an AND between its operands: ZF is set if (DEST AND SRC) is zero, and CF is set if ((NOT DEST) AND SRC) is zero. Equivalently, ZF is set if none of the bits selected by SRC are set in DEST, and CF is set if all of them are. | |||
PCMPEQQ | Quadword (64 bits) compare for equality | |||
PACKUSDW | Convert signed DWORDs into unsigned WORDs with saturation. | |||
MOVNTDQA | Efficient read from write-combining memory area into SSE register; this is useful for retrieving results from peripherals attached to the memory bus. |
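The PTEST row is the one SSE4.1 entry that sets integer flags, and its two conditions are easy to confuse. The Python sketch below models it on 128-bit integers, following the documented definition: ZF is set when DEST AND SRC is zero, and CF is set when (NOT DEST) AND SRC is zero. The function name `ptest` and the use of Python integers for XMM registers are assumptions of this example.

```python
MASK128 = (1 << 128) - 1


def ptest(dest, src):
    """Model PTEST dest, src on 128-bit values; returns (ZF, CF)."""
    zf = (dest & src) & MASK128 == 0   # no selected bit is set in dest
    cf = (~dest & src) & MASK128 == 0  # every selected bit is set in dest
    return zf, cf


print(ptest(0b1100, 0b0011))  # (True, False): none of the masked bits set
print(ptest(0b1111, 0b0011))  # (False, True): all of the masked bits set
print(ptest(0b0101, 0b0011))  # (False, False): some, but not all
```

A common use is `PTEST xmm0, xmm0` followed by a conditional jump on ZF to check whether a whole register is zero.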
SSE4.2 | ||||
Instruction | Description | |||
CRC32 | Accumulate CRC32C value using the polynomial 0x11EDC6F41 (or, without the high order bit, 0x1EDC6F41). | |||
PCMPESTRI | Packed Compare Explicit Length Strings, Return Index | |||
PCMPESTRM | Packed Compare Explicit Length Strings, Return Mask | |||
PCMPISTRI | Packed Compare Implicit Length Strings, Return Index | |||
PCMPISTRM | Packed Compare Implicit Length Strings, Return Mask | |||
PCMPGTQ | Compare Packed Signed 64-bit data For Greater Than | |||
POPCNT | Population count (count number of bits set to 1). POPCNT instruction may also be implemented in some processors that do not support the other SSE4 instructions and a separate bit can be tested to confirm POPCNT presence. |
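To make the CRC32 row concrete, the following Python sketch accumulates a CRC32C value one byte at a time, in the way software typically drives the byte form of the instruction. It assumes the conventional bit-reflected representation of the polynomial (0x82F63B78 is 0x1EDC6F41 with its bits reversed) and the customary initial value and final inversion, which are applied by software rather than by the instruction. The function names are illustrative only.

```python
POLY_REFLECTED = 0x82F63B78  # 0x1EDC6F41 with the bit order reversed


def crc32_u8(crc, byte):
    """Fold one byte into the accumulator, like one CRC32 r32, r/m8 step."""
    crc ^= byte
    for _ in range(8):
        # Shift out one bit; apply the polynomial if that bit was set.
        crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
    return crc


def crc32c(data):
    """CRC32C of a bytes object using the usual software conventions."""
    crc = 0xFFFFFFFF              # conventional initial value
    for b in data:
        crc = crc32_u8(crc, b)
    return crc ^ 0xFFFFFFFF       # conventional final inversion


print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard check value
```

The hardware instruction also has 16-, 32- and 64-bit source forms that process several bytes per step; the bit-at-a-time loop here is only meant to show the arithmetic.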
SSE4a | ||||
Instruction | Description | |||
LZCNT | Leading Zero Count - bit manipulation. LZCNT instruction may also be implemented in some processors that do not support the other SSE4 instructions and a separate bit can be tested to confirm LZCNT presence. | |||
POPCNT | Population count (count number of bits set to 1). POPCNT instruction may also be implemented in some processors that do not support the other SSE4 instructions and a separate bit can be tested to confirm POPCNT presence. | |||
EXTRQ/INSERTQ | Combined mask-shift instructions. | |||
MOVNTSD/MOVNTSS | Scalar streaming store instructions. |
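The LZCNT and POPCNT rows above can be summarised in a few lines. The Python sketch below models both on a fixed operand width; the `bits` parameter and the function names are assumptions of this example rather than anything defined by the instruction set.

```python
def popcnt(x, bits=32):
    """Model POPCNT: number of bits set to 1 in a bits-wide operand."""
    return bin(x & ((1 << bits) - 1)).count("1")


def lzcnt(x, bits=32):
    """Model LZCNT: number of leading zero bits; equals bits when x is zero."""
    x &= (1 << bits) - 1
    return bits - x.bit_length()


print(popcnt(0xF0F0))      # 8
print(lzcnt(1))            # 31
print(lzcnt(0))            # 32
print(lzcnt(0x80000000))   # 0
```

Unlike BSR, whose destination is undefined for a zero source, LZCNT returns the operand size in that case. The separate feature bits mentioned above are reported by CPUID: POPCNT in leaf 1, ECX bit 23, and LZCNT in leaf 0x80000001, ECX bit 5.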
3DNow!
3DNow! floating-point instructions | ||||
Instruction | Description | |||
PAVGUSB | Packed 8-bit unsigned integer averaging | |||
PI2FD | Packed 32-bit integer to floating-point conversion | |||
PF2ID | Packed floating-point to 32-bit integer conversion | |||
PFCMPGE | Packed floating-point comparison, greater or equal | |||
PFCMPGT | Packed floating-point comparison, greater | |||
PFCMPEQ | Packed floating-point comparison, equal | |||
PFACC | Packed floating-point accumulate | |||
PFADD | Packed floating-point addition | |||
PFSUB | Packed floating-point subtraction | |||
PFSUBR | Packed floating-point reverse subtraction | |||
PFMIN | Packed floating-point minimum | |||
PFMAX | Packed floating-point maximum | |||
PFMUL | Packed floating-point multiplication | |||
PFRCP | Packed floating-point reciprocal approximation | |||
PFRSQRT | Packed floating-point reciprocal square root approximation | |||
PFRCPIT1 | Packed floating-point reciprocal, first iteration step | |||
PFRSQIT1 | Packed floating-point reciprocal square root, first iteration step | |||
PFRCPIT2 | Packed floating-point reciprocal/reciprocal square root, second iteration step | |||
PMULHRW | Packed 16-bit integer multiply with rounding |
3DNow! performance-enhancement instructions | ||||
Instruction | Description | |||
FEMMS | Faster entry/exit of the MMX or floating-point state | |||
PREFETCH/PREFETCHW | Prefetch at least a 32-byte line into L1 data cache |
3DNow! extension DSP instructions | ||||
Instruction | Description | |||
PF2IW | Packed floating-point to integer word conversion with sign extend | |||
PI2FW | Packed integer word to floating-point conversion | |||
PFNACC | Packed floating-point negative accumulate | |||
PFPNACC | Packed floating-point mixed positive-negative accumulate | |||
PSWAPD | Packed swap doubleword |
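The three accumulate instructions (PFACC from the base set, PFNACC and PFPNACC from the extension) differ only in the sign used within each operand. As a hedged illustration, the Python sketch below models an MMX register as a (low, high) tuple of floats; the representation and function names are assumptions of this example, and Python's double-precision floats stand in for the single-precision lanes.

```python
def pfacc(dest, src):
    """Accumulate: low = sum of dest's halves, high = sum of src's halves."""
    return (dest[0] + dest[1], src[0] + src[1])


def pfnacc(dest, src):
    """Negative accumulate: low minus high within each operand."""
    return (dest[0] - dest[1], src[0] - src[1])


def pfpnacc(dest, src):
    """Mixed accumulate: difference of dest's halves, sum of src's halves."""
    return (dest[0] - dest[1], src[0] + src[1])


def pswapd(src):
    """Swap the two doublewords of the operand."""
    return (src[1], src[0])


a, b = (1.0, 2.0), (3.0, 4.0)
print(pfacc(a, b))    # (3.0, 7.0)
print(pfnacc(a, b))   # (-1.0, -1.0)
print(pfpnacc(a, b))  # (-1.0, 7.0)
print(pswapd(a))      # (2.0, 1.0)
```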
MMX extension instructions (Integer SSE) | ||||
Instruction | Description | |||
MASKMOVQ | Streaming (cache bypass) store using byte mask | |||
MOVNTQ | Streaming (cache bypass) store | |||
PAVGB | Packed average of unsigned byte | |||
PAVGW | Packed average of unsigned word | |||
PMAXSW | Packed maximum signed word | |||
PMAXUB | Packed maximum unsigned byte | |||
PMINSW | Packed minimum signed word | |||
PMINUB | Packed minimum unsigned byte | |||
PMULHUW | Packed multiply high unsigned word | |||
PSADBW | Packed sum of absolute byte differences | |||
PSHUFW | Packed shuffle word | |||
PEXTRW | Extract word into integer register | |||
PINSRW | Insert word from integer register | |||
PMOVMSKB | Move byte mask to integer register | |||
PREFETCHNTA | Prefetch using the NTA reference | |||
PREFETCHT0 | Prefetch using the T0 reference | |||
PREFETCHT1 | Prefetch using the T1 reference | |||
PREFETCHT2 | Prefetch using the T2 reference | |||
SFENCE | Store fence |
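Two of the integer rows above, PAVGB and PSADBW, are frequently used together in video code, and their rounding and reduction behaviour is worth spelling out. The Python sketch below models the 64-bit MMX forms on 8-byte `bytes` objects; the function names and the `bytes` representation are assumptions made for this example.

```python
def pavgb(a, b):
    """Packed average of unsigned bytes, rounding up: (x + y + 1) >> 1."""
    return bytes((x + y + 1) >> 1 for x, y in zip(a, b))


def psadbw(a, b):
    """Sum of absolute differences of the eight unsigned byte pairs."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = bytes([0, 1, 255, 10, 0, 0, 0, 0])
b = bytes([1, 2, 255, 11, 0, 0, 0, 0])
print(list(pavgb(a, b)))  # [1, 2, 255, 11, 0, 0, 0, 0]
print(psadbw(a, b))       # 3
```

The average is computed with an extra bit of intermediate precision, so 255 averaged with 255 stays 255 rather than wrapping. In hardware, PSADBW places its sum in the low 16 bits of the destination and clears the remaining bits.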
3DNow! Professional instructions unique to the Geode GX/LX | ||||
Instruction | Description | |||
PFRSQRTV | Reciprocal square root approximation for a pair of 32-bit floats | |||
PFRCPV | Reciprocal approximation for a pair of 32-bit floats |