x86 Instruction Set


Post by Neo » Thu Aug 20, 2009 9:57 pm

Original 8086/8088 instructions
Instruction | Meaning | Notes
AAA | ASCII adjust AL after addition | used with unpacked binary-coded decimal
AAD | ASCII adjust AX before division | buggy in the original instruction set, but "fixed" in the NEC V20, causing a number of incompatibilities
AAM | ASCII adjust AX after multiplication
AAS | ASCII adjust AL after subtraction
ADC | Add with carry
ADD | Add
AND | Logical AND
CALL | Call procedure
CBW | Convert byte to word
CLC | Clear carry flag
CLD | Clear direction flag
CLI | Clear interrupt flag
CMC | Complement carry flag
CMP | Compare operands
CMPSB | Compare bytes in memory
CMPSW | Compare words in memory
CWD | Convert word to doubleword
DAA | Decimal adjust AL after addition | used with packed binary-coded decimal
DAS | Decimal adjust AL after subtraction
DEC | Decrement by 1
DIV | Unsigned divide
ESC | Used with floating-point unit
HLT | Enter halt state
IDIV | Signed divide
IMUL | Signed multiply
IN | Input from port
INC | Increment by 1
INT | Call to interrupt
INTO | Call to interrupt if overflow
IRET | Return from interrupt
Jxx | Jump if condition | JA, JAE, JB, JBE, JC, JCXZ, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ
JMP | Jump
LAHF | Load flags into AH register
LDS | Load pointer using DS
LEA | Load effective address
LES | Load ES with pointer
LOCK | Assert BUS LOCK# signal | for multiprocessing
LODSB | Load string byte
LODSW | Load string word
LOOP/LOOPx | Loop control | LOOPE, LOOPNE, LOOPNZ, LOOPZ
MOV | Move
MOVSB | Move byte from string to string
MOVSW | Move word from string to string
MUL | Unsigned multiply
NEG | Two's complement negation
NOP | No operation | opcode 0x90, equivalent to XCHG AX, AX (XCHG EAX, EAX in 32-bit code)
NOT | Invert every bit of the operand (logical NOT, one's complement)
OR | Logical OR
OUT | Output to port
POP | Pop data from stack | POP CS (opcode 0x0F) works only on the 8086/8088; later CPUs use 0x0F as a prefix for newer instructions
POPF | Pop data into flags register
PUSH | Push data onto stack
PUSHF | Push flags onto stack
RCL | Rotate left (with carry)
RCR | Rotate right (with carry)
REPxx | Repeat CMPS/MOVS/SCAS/STOS | REP, REPE, REPNE, REPNZ, REPZ
RET | Return from procedure
RETN | Return from near procedure
RETF | Return from far procedure
ROL | Rotate left
ROR | Rotate right
SAHF | Store AH into flags
SAL | Shift arithmetically left (signed shift left)
SAR | Shift arithmetically right (signed shift right)
SBB | Subtract with borrow
SCASB | Compare byte string
SCASW | Compare word string
SHL | Shift left (unsigned shift left)
SHR | Shift right (unsigned shift right)
STC | Set carry flag
STD | Set direction flag
STI | Set interrupt flag
STOSB | Store byte in string
STOSW | Store word in string
SUB | Subtract
TEST | Logical compare (AND)
WAIT | Wait until not busy | waits until the BUSY# pin is inactive (used with floating-point unit)
XCHG | Exchange data
XLAT | Table look-up translation
XOR | Exclusive OR
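The AAA and DAA rows are meant to follow an ADD of two BCD digits or bytes. They repair AL using the carry (CF) and auxiliary-carry (AF) flags that the ADD left behind.

The Python sketch below models that behaviour. It assumes the pseudocode in Intel's Software Developer's Manual for current processors; the original 8086 AAA may differ in corner cases. add8, aaa and daa are hypothetical helper names, not part of any library.

```python
def add8(a: int, b: int) -> tuple[int, bool, bool]:
    """Model an 8-bit ADD and return (result, CF, AF)."""
    total = a + b
    carry = total > 0xFF
    aux = ((a & 0x0F) + (b & 0x0F)) > 0x0F  # carry out of bit 3
    return total & 0xFF, carry, aux


def aaa(ax: int, af: bool) -> tuple[int, bool]:
    """ASCII adjust AL after addition (unpacked BCD). Returns (AX, CF)."""
    al = ax & 0xFF
    if (al & 0x0F) > 9 or af:
        ax = (ax + 0x106) & 0xFFFF  # AL += 6 and AH += 1 in one step
        carry = True
    else:
        carry = False
    ax &= 0xFF0F  # AL <- AL AND 0Fh, AH unchanged
    return ax, carry


def daa(al: int, cf: bool, af: bool) -> tuple[int, bool]:
    """Decimal adjust AL after addition (packed BCD). Returns (AL, CF)."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF  # fix the low decimal digit
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF  # fix the high decimal digit
        cf = True
    else:
        cf = False
    return al, cf
```

For example, adding the ASCII digits '9' (0x39) and '5' (0x35) leaves 0x6E in AL. AAA turns that into AH = 1, AL = 4, which is the unpacked BCD form of 14. Likewise, DAA after adding packed 0x58 and 0x67 yields AL = 0x25 with CF set, meaning 125.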
Added with 80186/80188
Instruction | Meaning
BOUND | Check array index against bounds
ENTER | Enter stack frame
INS | Input from port to string
LEAVE | Leave stack frame
OUTS | Output string to port
POPA | Pop all general-purpose registers from stack
PUSHA | Push all general-purpose registers onto stack
Added with 80286
Instruction | Meaning
ARPL | Adjust RPL field of selector
CLTS | Clear task-switched flag in register CR0
LAR | Load access rights byte
LGDT | Load global descriptor table
LIDT | Load interrupt descriptor table
LLDT | Load local descriptor table
LMSW | Load machine status word
LOADALL | Load all CPU registers, including internal ones such as GDT
LSL | Load segment limit
LTR | Load task register
SGDT | Store global descriptor table
SIDT | Store interrupt descriptor table
SLDT | Store local descriptor table
SMSW | Store machine status word
STR | Store task register
VERR | Verify a segment for reading
VERW | Verify a segment for writing
Added with 80386
Instruction | Meaning | Notes
BSF | Bit scan forward
BSR | Bit scan reverse
BT | Bit test
BTC | Bit test and complement
BTR | Bit test and reset
BTS | Bit test and set
CDQ | Convert double-word to quad-word | Sign-extends EAX into EDX, forming the quad-word EDX:EAX. Because IDIV uses EDX:EAX as its dividend, execute CDQ after loading EAX and before IDIV, unless EDX already holds the upper half of a genuine 64-bit dividend (as in 64/32 division); for unsigned DIV, clear EDX instead
CMPSD | Compare string double-word | compares DS:[(E)SI] with ES:[(E)DI]
CWDE | Convert word to double-word | unlike CWD, CWDE sign-extends AX to EAX instead of AX to DX:AX
INSB, INSW, INSD | Input from port to string with explicit size | same as INS
IRETx | Interrupt return; the D suffix means 32-bit return, the F suffix means do not generate epilogue code (i.e. the LEAVE instruction) | use IRETD rather than IRET in 32-bit situations
JCXZ, JECXZ | Jump if register (E)CX is zero
LFS, LGS | Load far pointer
LSS | Load stack segment
LODSW, LODSD | Load string | can be prefixed with REP
LOOPW, LOOPD | Loop | counter register is (E)CX
LOOPEW, LOOPED | Loop while equal
LOOPZW, LOOPZD | Loop while zero
LOOPNEW, LOOPNED | Loop while not equal
LOOPNZW, LOOPNZD | Loop while not zero
MOVSW, MOVSD | Move data from string to string
MOVSX | Move with sign-extend
MOVZX | Move with zero-extend
POPAD | Pop all double-word (32-bit) registers from stack | the saved ESP value is skipped rather than loaded into ESP
POPFD | Pop data into EFLAGS register
PUSHAD | Push all double-word (32-bit) registers onto stack
PUSHFD | Push EFLAGS register onto stack
SCASD | Scan string data double-word
SETA, SETAE, SETB, SETBE, SETC, SETE, SETG, SETGE, SETL, SETLE, SETNA, SETNAE, SETNB, SETNBE, SETNC, SETNE, SETNG, SETNGE, SETNL, SETNLE, SETNO, SETNP, SETNS, SETNZ, SETO, SETP, SETPE, SETPO, SETS, SETZ | Set byte to one if the condition holds, zero otherwise
SHLD | Shift left double-word | double-precision shift; the vacated bits are filled from a second operand
SHRD | Shift right double-word | double-precision shift; the vacated bits are filled from a second operand
STOSx | Store string
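The CDQ note is easiest to see with a negative dividend. The sketch below is a Python model of CDQ followed by a 32-bit IDIV, including IDIV's truncation toward zero. The function names are illustrative assumptions. The model also shows the wrong quotient you get if EDX is simply left at zero.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit register value as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def cdq(eax: int) -> int:
    """Return the EDX value CDQ produces: all ones if EAX is negative, else zero."""
    return MASK32 if eax & 0x80000000 else 0


def idiv32(edx: int, eax: int, divisor: int) -> tuple[int, int]:
    """Model IDIV r/m32: divide EDX:EAX by divisor, return (EAX, EDX).

    The quotient is truncated toward zero and the remainder takes the sign
    of the dividend. The #DE fault is modelled with Python exceptions.
    """
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    if dividend & (1 << 63):
        dividend -= 1 << 64  # treat EDX:EAX as a signed 64-bit value
    d = to_signed32(divisor)
    if d == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    quotient = abs(dividend) // abs(d)
    if (dividend < 0) != (d < 0):
        quotient = -quotient  # truncate toward zero, unlike Python's //
    remainder = dividend - quotient * d
    if not -(1 << 31) <= quotient < (1 << 31):
        raise OverflowError("#DE: quotient does not fit in EAX")
    return quotient & MASK32, remainder & MASK32
```

With EAX = -7, CDQ sets EDX to 0xFFFFFFFF, and dividing by 2 gives quotient -3 and remainder -1. If EDX is left at 0 instead, the same IDIV sees the positive dividend 0xFFFFFFF9 and returns 2147483644.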
Added with 80486
Instruction | Meaning | Notes
BSWAP | Byte Swap | only works for 32-bit registers
CMPXCHG | CoMPare and eXCHanGe
INVD | Invalidate Internal Caches
INVLPG | Invalidate TLB Entry
WBINVD | Write Back and Invalidate Cache
XADD | Exchange and Add
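BSWAP and CMPXCHG, added with the 80486, have simple semantics that a short Python model can capture. This is a single-threaded sketch with hypothetical helper names. On real hardware, CMPXCHG only becomes an atomic compare-and-swap when it carries the LOCK prefix.

```python
def bswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit value, as BSWAP r32 does."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def cmpxchg(eax: int, dest: int, src: int) -> tuple[int, int, bool]:
    """Model CMPXCHG dest, src with 32-bit operands.

    Returns (new_eax, new_dest, ZF). If EAX equals dest, src is stored into
    dest and ZF is set; otherwise dest is loaded into EAX and ZF is cleared.
    """
    if eax == dest:
        return eax, src, True
    return dest, dest, False
```

A lock-free update loop built on this reads the old value into EAX, computes the new value, and retries the CMPXCHG until ZF comes back set.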
Added with Pentium
Instruction | Meaning | Notes
CPUID | CPU IDentification | *see note below
CMPXCHG8B | CoMPare and eXCHanGe 8 bytes
RDMSR | ReaD from Model-Specific Register
RDTSC | ReaD Time Stamp Counter
WRMSR | WRite to Model-Specific Register
RSM | Resume operation of interrupted program | returns from SMM (System Management Mode)
*The CPUID instruction was fully introduced with the Pentium processor. It was also added to later 80486 processors.

Added with Pentium MMX
Instruction | Meaning | Notes
RDPMC | Read the PMC (Performance Monitoring Counter) | reads the counter selected by ECX into EDX:EAX

Added with Pentium Pro
Conditional MOV: CMOVA, CMOVAE, CMOVB, CMOVBE, CMOVC, CMOVE, CMOVG, CMOVGE, CMOVL, CMOVLE, CMOVNA, CMOVNAE, CMOVNB, CMOVNBE, CMOVNC, CMOVNE, CMOVNG, CMOVNGE, CMOVNL, CMOVNLE, CMOVNO, CMOVNP, CMOVNS, CMOVNZ, CMOVO, CMOVP, CMOVPE, CMOVPO, CMOVS, CMOVZ, SYSENTER (SYStem call ENTER), SYSEXIT (SYStem call EXIT), RDPMC*, UD2
*RDPMC was introduced in both the Pentium Pro processor and the Pentium processor with MMX technology.

Added with AMD K6-2
SYSCALL, SYSRET (functionally equivalent to SYSENTER and SYSEXIT)

Added with SSE
MASKMOVQ, MOVNTPS, MOVNTQ, PREFETCHT0, PREFETCHT1, PREFETCHT2, PREFETCHNTA, SFENCE (for cacheability and memory ordering)

Added with SSE2
CLFLUSH, LFENCE, MASKMOVDQU, MFENCE, MOVNTDQ, MOVNTI, MOVNTPD (for cacheability and memory ordering), PAUSE (for spin-wait loops)

Added with SSE3
LDDQU (for Video Encoding)
MONITOR, MWAIT (for thread synchronization; only on processors supporting Hyper-threading and some dual-core processors like Core 2, Phenom and others)

Added with Intel VT
VMPTRLD, VMPTRST, VMCLEAR, VMREAD, VMWRITE, VMCALL, VMLAUNCH, VMRESUME, VMXOFF, VMXON

Added with AMD-V
CLGI, SKINIT, STGI, VMLOAD, VMMCALL, VMRUN, VMSAVE (SVM instructions of AMD-V)

Added with x86-64
CMPXCHG16B (CoMPare and eXCHanGe 16 bytes), RDTSCP (ReaD Time Stamp Counter and Processor ID)

Added with SSE4a
LZCNT, POPCNT (POPulation CouNT) - advanced bit manipulation

x87 floating-point instructions

Original 8087 instructions

F2XM1, FABS, FADD, FADDP, FBLD, FBSTP, FCHS, FCLEX, FCOM, FCOMP, FCOMPP, FDECSTP, FDISI, FDIV, FDIVP, FDIVR, FDIVRP, FENI, FFREE, FIADD, FICOM, FICOMP, FIDIV, FIDIVR, FILD, FIMUL, FINCSTP, FINIT, FIST, FISTP, FISUB, FISUBR, FLD, FLD1, FLDCW, FLDENV, FLDENVW, FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI, FLDZ, FMUL, FMULP, FNCLEX, FNDISI, FNENI, FNINIT, FNOP, FNSAVE, FNSAVEW, FNSTCW, FNSTENV, FNSTENVW, FNSTSW, FPATAN, FPREM, FPTAN, FRNDINT, FRSTOR, FRSTORW, FSAVE, FSAVEW, FSCALE, FSQRT, FST, FSTCW, FSTENV, FSTENVW, FSTP, FSTSW, FSUB, FSUBP, FSUBR, FSUBRP, FTST, FWAIT, FXAM, FXCH, FXTRACT, FYL2X, FYL2XP1

Added in specific processors

Added with 80287
FSETPM

Added with 80387
FCOS, FLDENVD, FNSAVED, FNSTENVD, FPREM1, FRSTORD, FSAVED, FSIN, FSINCOS, FSTENVD, FUCOM, FUCOMP, FUCOMPP

Added with Pentium Pro
FCMOV variants: FCMOVB, FCMOVBE, FCMOVE, FCMOVNB, FCMOVNBE, FCMOVNE, FCMOVNU, FCMOVU
FCOMI variants: FCOMI, FCOMIP, FUCOMI, FUCOMIP

Added with SSE
FXRSTOR*, FXSAVE*
*Also supported on later Pentium II processors, even though they do not support SSE.

Added with SSE3
FISTTP (x87 to integer conversion with truncation)

Undocumented instructions
FFREEP performs FFREE ST(i) and pops the stack


SIMD instructions

MMX instructions (added with Pentium MMX)
EMMS, MOVD, MOVQ, PACKSSDW, PACKSSWB, PACKUSWB, PADDB, PADDD, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW, PAND, PANDN, PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW, PMADDWD, PMULHW, PMULLW, POR, PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLQ, PSRLW, PSUBB, PSUBD, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW, PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD, PXOR

MMX+ instructions

added with Athlon

The same as the SSE SIMD integer instructions that operate on MMX registers.

EMMX instructions

added with the Cyrix 6x86MX, now deprecated

PAVEB, PADDSIW, PMAGW, PDISTIB, PSUBSIW, PMVZB, PMULHRW, PMVNZB, PMVLZB, PMVGEZB, PMULHRIW, PMACHRIW

3DNow! instructions

added with K6-2

FEMMS, PAVGUSB, PF2ID, PFACC, PFADD, PFCMPEQ, PFCMPGE, PFCMPGT, PFMAX, PFMIN, PFMUL, PFRCP, PFRCPIT1, PFRCPIT2, PFRSQIT1, PFRSQRT, PFSUB, PFSUBR, PI2FD, PMULHRW, PREFETCH, PREFETCHW

3DNow!+ instructions

added with Athlon

PF2IW, PFNACC, PFPNACC, PI2FW, PSWAPD

added with Geode GX

PFRSQRTV, PFRCPV

SSE instructions

added with Pentium III; also see integer instructions added with Pentium III

SSE SIMD Floating-Point Instructions

ADDPS, ADDSS, CMPPS, CMPSS, COMISS, CVTPI2PS, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2PI, CVTTSS2SI, DIVPS, DIVSS, LDMXCSR, MAXPS, MAXSS, MINPS, MINSS, MOVAPS, MOVHLPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVNTPS, MOVSS, MOVUPS, MULPS, MULSS, RCPPS, RCPSS, RSQRTPS, RSQRTSS, SHUFPS, SQRTPS, SQRTSS, STMXCSR, SUBPS, SUBSS, UCOMISS, UNPCKHPS, UNPCKLPS

SSE SIMD Integer Instructions

ANDNPS, ANDPS, ORPS, PAVGB, PAVGW, PEXTRW, PINSRW, PMAXSW, PMAXUB, PMINSW, PMINUB, PMOVMSKB, PMULHUW, PSADBW, PSHUFW, XORPS

Instruction | Opcode | Meaning
MOVUPS xmm1, xmm2/m128 | 0F 10 /r | Move Unaligned Packed Single-Precision Floating-Point Values
MOVSS xmm1, xmm2/m32 | F3 0F 10 /r | Move Scalar Single-Precision Floating-Point Values
MOVUPS xmm2/m128, xmm1 | 0F 11 /r | Move Unaligned Packed Single-Precision Floating-Point Values
MOVSS xmm2/m32, xmm1 | F3 0F 11 /r | Move Scalar Single-Precision Floating-Point Values
MOVLPS xmm, m64 | 0F 12 /r | Move Low Packed Single-Precision Floating-Point Values
MOVHLPS xmm1, xmm2 | 0F 12 /r | Move Packed Single-Precision Floating-Point Values High to Low
MOVLPS m64, xmm | 0F 13 /r | Move Low Packed Single-Precision Floating-Point Values
UNPCKLPS xmm1, xmm2/m128 | 0F 14 /r | Unpack and Interleave Low Packed Single-Precision Floating-Point Values
UNPCKHPS xmm1, xmm2/m128 | 0F 15 /r | Unpack and Interleave High Packed Single-Precision Floating-Point Values
MOVHPS xmm, m64 | 0F 16 /r | Move High Packed Single-Precision Floating-Point Values
MOVLHPS xmm1, xmm2 | 0F 16 /r | Move Packed Single-Precision Floating-Point Values Low to High
MOVHPS m64, xmm | 0F 17 /r | Move High Packed Single-Precision Floating-Point Values
PREFETCHNTA | 0F 18 /0 | Prefetch Data Into Caches (non-temporal data with respect to all cache levels)
PREFETCHT0 | 0F 18 /1 | Prefetch Data Into Caches (temporal data)
PREFETCHT1 | 0F 18 /2 | Prefetch Data Into Caches (temporal data with respect to first level cache)
PREFETCHT2 | 0F 18 /3 | Prefetch Data Into Caches (temporal data with respect to second level cache)
NOP | 0F 1F /0 | No Operation
MOVAPS xmm1, xmm2/m128 | 0F 28 /r | Move Aligned Packed Single-Precision Floating-Point Values
MOVAPS xmm2/m128, xmm1 | 0F 29 /r | Move Aligned Packed Single-Precision Floating-Point Values
CVTPI2PS xmm, mm/m64 | 0F 2A /r | Convert Packed Dword Integers to Packed Single-Precision FP Values
CVTSI2SS xmm, r/m32 | F3 0F 2A /r | Convert Dword Integer to Scalar Single-Precision FP Value
MOVNTPS m128, xmm | 0F 2B /r | Store Packed Single-Precision Floating-Point Values Using Non-Temporal Hint
CVTTPS2PI mm, xmm/m64 | 0F 2C /r | Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers
CVTTSS2SI r32, xmm/m32 | F3 0F 2C /r | Convert with Truncation Scalar Single-Precision FP Value to Dword Integer
CVTPS2PI mm, xmm/m64 | 0F 2D /r | Convert Packed Single-Precision FP Values to Packed Dword Integers
CVTSS2SI r32, xmm/m32 | F3 0F 2D /r | Convert Scalar Single-Precision FP Value to Dword Integer
UCOMISS xmm1, xmm2/m32 | 0F 2E /r | Unordered Compare Scalar Single-Precision Floating-Point Values and Set EFLAGS
COMISS xmm1, xmm2/m32 | 0F 2F /r | Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS
SQRTPS xmm1, xmm2/m128 | 0F 51 /r | Compute Square Roots of Packed Single-Precision Floating-Point Values
SQRTSS xmm1, xmm2/m32 | F3 0F 51 /r | Compute Square Root of Scalar Single-Precision Floating-Point Value
RSQRTPS xmm1, xmm2/m128 | 0F 52 /r | Compute Reciprocals of Square Roots of Packed Single-Precision Floating-Point Values
RSQRTSS xmm1, xmm2/m32 | F3 0F 52 /r | Compute Reciprocal of Square Root of Scalar Single-Precision Floating-Point Value
RCPPS xmm1, xmm2/m128 | 0F 53 /r | Compute Reciprocals of Packed Single-Precision Floating-Point Values
RCPSS xmm1, xmm2/m32 | F3 0F 53 /r | Compute Reciprocal of Scalar Single-Precision Floating-Point Value
ANDPS xmm1, xmm2/m128 | 0F 54 /r | Bitwise Logical AND of Packed Single-Precision Floating-Point Values
ANDNPS xmm1, xmm2/m128 | 0F 55 /r | Bitwise Logical AND NOT of Packed Single-Precision Floating-Point Values
ORPS xmm1, xmm2/m128 | 0F 56 /r | Bitwise Logical OR of Packed Single-Precision Floating-Point Values
XORPS xmm1, xmm2/m128 | 0F 57 /r | Bitwise Logical XOR of Packed Single-Precision Floating-Point Values
ADDPS xmm1, xmm2/m128 | 0F 58 /r | Add Packed Single-Precision Floating-Point Values
ADDSS xmm1, xmm2/m32 | F3 0F 58 /r | Add Scalar Single-Precision Floating-Point Values
MULPS xmm1, xmm2/m128 | 0F 59 /r | Multiply Packed Single-Precision Floating-Point Values
MULSS xmm1, xmm2/m32 | F3 0F 59 /r | Multiply Scalar Single-Precision Floating-Point Values
SUBPS xmm1, xmm2/m128 | 0F 5C /r | Subtract Packed Single-Precision Floating-Point Values
SUBSS xmm1, xmm2/m32 | F3 0F 5C /r | Subtract Scalar Single-Precision Floating-Point Values
MINPS xmm1, xmm2/m128 | 0F 5D /r | Return Minimum Packed Single-Precision Floating-Point Values
MINSS xmm1, xmm2/m32 | F3 0F 5D /r | Return Minimum Scalar Single-Precision Floating-Point Values
DIVPS xmm1, xmm2/m128 | 0F 5E /r | Divide Packed Single-Precision Floating-Point Values
DIVSS xmm1, xmm2/m32 | F3 0F 5E /r | Divide Scalar Single-Precision Floating-Point Values
MAXPS xmm1, xmm2/m128 | 0F 5F /r | Return Maximum Packed Single-Precision Floating-Point Values
MAXSS xmm1, xmm2/m32 | F3 0F 5F /r | Return Maximum Scalar Single-Precision Floating-Point Values
PSHUFW mm1, mm2/m64, imm8 | 0F 70 /r ib | Shuffle Packed Words
LDMXCSR m32 | 0F AE /2 | Load MXCSR Register State
STMXCSR m32 | 0F AE /3 | Store MXCSR Register State
SFENCE | 0F AE F8 | Store Fence
CMPPS xmm1, xmm2/m128, imm8 | 0F C2 /r ib | Compare Packed Single-Precision Floating-Point Values
CMPSS xmm1, xmm2/m32, imm8 | F3 0F C2 /r ib | Compare Scalar Single-Precision Floating-Point Values
PINSRW mm, r32/m16, imm8 | 0F C4 /r ib | Insert Word
PEXTRW r32, mm, imm8 | 0F C5 /r ib | Extract Word
SHUFPS xmm1, xmm2/m128, imm8 | 0F C6 /r ib | Shuffle Packed Single-Precision Floating-Point Values
PMOVMSKB r32, mm | 0F D7 /r | Move Byte Mask
PMINUB mm1, mm2/m64 | 0F DA /r | Minimum of Packed Unsigned Byte Integers
PMAXUB mm1, mm2/m64 | 0F DE /r | Maximum of Packed Unsigned Byte Integers
PAVGB mm1, mm2/m64 | 0F E0 /r | Average Packed Integers
PAVGW mm1, mm2/m64 | 0F E3 /r | Average Packed Integers
PMULHUW mm1, mm2/m64 | 0F E4 /r | Multiply Packed Unsigned Integers and Store High Result
MOVNTQ m64, mm | 0F E7 /r | Store of Quadword Using Non-Temporal Hint
PMINSW mm1, mm2/m64 | 0F EA /r | Minimum of Packed Signed Word Integers
PMAXSW mm1, mm2/m64 | 0F EE /r | Maximum of Packed Signed Word Integers
PSADBW mm1, mm2/m64 | 0F F6 /r | Compute Sum of Absolute Differences
MASKMOVQ mm1, mm2 | 0F F7 /r | Store Selected Bytes of Quadword

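The opcode column uses three pieces of notation:
- /r means the reg field of the ModRM byte names a register operand.
- /digit (such as /1) means the reg field holds that digit as an opcode extension.
- ib means an 8-bit immediate follows.

The hypothetical Python helper below packs the ModRM fields to show how a few rows encode. It assumes 32-bit addressing and register numbers 0 to 7 (xmm0 to xmm7, with EAX as 0).

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModRM fields: mod (2 bits), reg (3 bits), rm (3 bits)."""
    if not (0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8):
        raise ValueError("ModRM field out of range")
    return (mod << 6) | (reg << 3) | rm


def encode(prefix: bytes, opcode: bytes, mod: int, reg: int, rm: int) -> str:
    """Return prefix + opcode + ModRM as a hex string such as '0F 18 08'."""
    raw = prefix + opcode + bytes([modrm(mod, reg, rm)])
    return " ".join(f"{b:02X}" for b in raw)


# MOVUPS xmm1, xmm2 (0F 10 /r): mod=11 (register form), reg=xmm1, rm=xmm2
movups = encode(b"", b"\x0f\x10", 3, 1, 2)
# PREFETCHT0 [eax] (0F 18 /1): mod=00, reg=1 is the opcode extension, rm=EAX
prefetch = encode(b"", b"\x0f\x18", 0, 1, 0)
```

For instance, MOVUPS xmm1, xmm2 encodes as 0F 10 CA, and adding the F3 prefix gives MOVSS xmm1, xmm2. SFENCE is 0F AE with a register-form ModRM of F8 (mod=11, reg=7, rm=0).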
SSE2 instructions

added with Pentium 4; also see integer instructions added with Pentium 4

SSE2 SIMD Floating-Point Instructions
ADDPD, ADDSD, ANDNPD, ANDPD, CMPPD, CMPSD*, COMISD, CVTDQ2PD, CVTDQ2PS, CVTPD2DQ, CVTPD2PI, CVTPD2PS, CVTPI2PD, CVTPS2DQ, CVTPS2PD, CVTSD2SI, CVTSD2SS, CVTSI2SD, CVTSS2SD, CVTTPD2DQ, CVTTPD2PI, CVTTPS2DQ, CVTTSD2SI, DIVPD, DIVSD, MAXPD, MAXSD, MINPD, MINSD, MOVAPD, MOVHPD, MOVLPD, MOVMSKPD, MOVSD*, MOVUPD, MULPD, MULSD, ORPD, SHUFPD, SQRTPD, SQRTSD, SUBPD, SUBSD, UCOMISD, UNPCKHPD, UNPCKLPD, XORPD
*CMPSD and MOVSD share their names with the string instruction mnemonics CMPSD (CMPS) and MOVSD (MOVS). The SSE2 forms operate on scalar double-precision floating-point values, whereas the string forms operate on doubleword strings.

SSE2 SIMD Integer Instructions
MOVDQ2Q, MOVDQA, MOVDQU, MOVQ2DQ, PADDQ, PSUBQ, PMULUDQ, PSHUFHW, PSHUFLW, PSHUFD, PSLLDQ, PSRLDQ, PUNPCKHQDQ, PUNPCKLQDQ

SSE3 instructions
added with Pentium 4 processors supporting SSE3; also see integer and floating-point instructions added with Pentium 4 SSE3

SSE3 SIMD Floating-Point Instructions
ADDSUBPD, ADDSUBPS (for Complex Arithmetic)
HADDPD, HADDPS, HSUBPD, HSUBPS (for Graphics)
MOVDDUP, MOVSHDUP, MOVSLDUP (for Complex Arithmetic)

SSSE3 instructions

added with Xeon 5100 series and initial Core 2
PSIGNW, PSIGND, PSIGNB
PSHUFB
PMULHRSW, PMADDUBSW
PHSUBW, PHSUBSW, PHSUBD
PHADDW, PHADDSW, PHADDD
PALIGNR
PABSW, PABSD, PABSB

SSE4 instructions
SSE4.1
Instruction | Description
MPSADBW | Compute eight offset sums of absolute differences (i.e. abs(x0-y0)+abs(x1-y1)+abs(x2-y2)+abs(x3-y3), abs(x0-y1)+abs(x1-y2)+abs(x2-y3)+abs(x3-y4), ...). This operation is important for modern HDTV codecs and (see [4]) allows an 8x8 block difference to be computed in fewer than seven cycles. One bit of a three-bit immediate operand indicates whether y0..y10 or y4..y14 should be used from the destination operand. The other two bits select whether x0..x3, x4..x7, x8..x11 or x12..x15 should be used from the source.
PHMINPOSUW | Sets the bottom unsigned 16-bit word of the destination to the smallest unsigned 16-bit word in the source, and the next word up to the index of that word in the source.
PMULDQ | Packed signed multiplication of two sets of 2 out of 4 packed 32-bit integers (the 1st and 3rd of each set of 4), giving 2 packed 64-bit results.
PMULLD | Packed signed multiplication: 4 pairs of packed 32-bit integers are multiplied, and the low 32 bits of each product form 4 packed 32-bit results.
DPPS, DPPD | Dot product for AOS (Array of Structs) data. The immediate operand holds four (two for DPPD) bits that select which input elements are multiplied and accumulated. Another four (two for DPPD) bits select whether each field of the output receives 0 or the dot product.
BLENDPS, BLENDPD, BLENDVPS, BLENDVPD, PBLENDVB, PBLENDW | Conditionally copy elements from one location to another. The non-V forms decide per element from the bits of an immediate operand; the V forms use the bits in register XMM0.
PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD | Packed minimum/maximum for different integer operand types
ROUNDPS, ROUNDSS, ROUNDPD, ROUNDSD | Round values in a floating-point register to integers, using one of four rounding modes specified by an immediate operand
INSERTPS, PINSRB, PINSRD/PINSRQ, EXTRACTPS, PEXTRB, PEXTRW, PEXTRD/PEXTRQ | The PINSR instructions read 8, 32 or 64 bits (PINSRB, PINSRD, PINSRQ) from an x86 register or memory location. INSERTPS reads 32 bits from an XMM register or memory. Both insert the value into a field of the destination register selected by an immediate operand. EXTRACTPS and PEXTR read a field from the source register and write it to an x86 register or memory location. For example, PEXTRD eax, xmm0, 1 followed by EXTRACTPS [addr+4*eax], xmm1, 1 stores dword element 1 of xmm1 at the address indexed by dword element 1 of xmm0.
PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD, PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD, PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, PMOVZXDQ | Packed sign/zero extension to wider types
PTEST | Like TEST, performs a bitwise AND of its operands without storing the result. It sets ZF if the AND of the two operands is all zeros (no bits in common). It sets CF if the AND of the source with the complement of the destination is all zeros, i.e. every bit set in the source is also set in the destination.
PCMPEQQ | Quadword (64-bit) compare for equality
PACKUSDW | Convert signed DWORDs into unsigned WORDs with saturation
MOVNTDQA | Efficient read from a write-combining memory area into an SSE register; useful for retrieving results from peripherals attached to the memory bus
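The PTEST and MPSADBW descriptions are compact, so here is a Python model of both. It is written from the descriptions above as an assumption and is not a bit-exact emulator. Operands are plain Python ints, or 16-byte bytes objects with byte 0 as the lowest element. ptest and mpsadbw are illustrative names.

```python
def ptest(dest: int, src: int, width: int = 128) -> tuple[bool, bool]:
    """Model PTEST on two operands given as ints; returns (ZF, CF)."""
    mask = (1 << width) - 1
    zf = (dest & src & mask) == 0   # no bit set in both operands
    cf = (~dest & src & mask) == 0  # every source bit is also set in dest
    return zf, cf


def mpsadbw(dest: bytes, src: bytes, imm8: int) -> list[int]:
    """Model MPSADBW xmm1, xmm2/m128, imm8 on 16-byte operands.

    Bit 2 of imm8 picks the destination window start (byte 0 or 4);
    bits 1:0 pick the 4-byte source block (bytes 0, 4, 8 or 12).
    Returns the eight 16-bit sums of absolute differences.
    """
    if len(dest) != 16 or len(src) != 16:
        raise ValueError("operands must be 16 bytes")
    d_off = ((imm8 >> 2) & 1) * 4
    s_off = (imm8 & 3) * 4
    block = src[s_off:s_off + 4]
    return [
        sum(abs(dest[d_off + i + j] - block[j]) for j in range(4))
        for i in range(8)
    ]
```

The sliding window reaches at most byte d_off + 7 + 3 of the destination, which is why only y0..y10 or y4..y14 are ever read.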
SSE4.2
Instruction | Description
CRC32 | Accumulate a CRC32C value using the polynomial 0x11EDC6F41 (or, without the high-order bit, 0x1EDC6F41)
PCMPESTRI | Packed Compare Explicit Length Strings, Return Index
PCMPESTRM | Packed Compare Explicit Length Strings, Return Mask
PCMPISTRI | Packed Compare Implicit Length Strings, Return Index
PCMPISTRM | Packed Compare Implicit Length Strings, Return Mask
PCMPGTQ | Compare Packed Signed 64-bit Data for Greater Than
POPCNT | Population count (count the number of bits set to 1). POPCNT may also be implemented on some processors that do not support the other SSE4 instructions; a separate CPUID bit can be tested to confirm POPCNT presence.
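The CRC32 instruction folds data into a CRC32C accumulator using the bit-reflected form of the polynomial above, 0x82F63B78. Below is a bit-by-bit Python sketch. It assumes the common software convention of starting from 0xFFFFFFFF and inverting the result at the end; the instruction itself applies neither step. The function names are hypothetical.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # 0x1EDC6F41 with its bit order reversed


def crc32c_update(crc: int, data: bytes) -> int:
    """Fold data into a running CRC32C value, one bit at a time.

    This is assumed to match what a chain of CRC32 instructions computes:
    no initial or final inversion is applied here.
    """
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED if crc & 1 else crc >> 1
    return crc & 0xFFFFFFFF


def crc32c(data: bytes) -> int:
    """Conventional CRC32C: start from all ones, invert at the end."""
    return crc32c_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF
```

The incremental form is what a loop of CRC32 instructions over a buffer computes. The widely used check value for the ASCII string "123456789" is 0xE3069283.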
SSE4a
Instruction | Description
LZCNT | Leading zero count (bit manipulation). LZCNT may also be implemented on some processors that do not support the other SSE4 instructions; a separate CPUID bit can be tested to confirm LZCNT presence.
POPCNT | Population count (count the number of bits set to 1). POPCNT may also be implemented on some processors that do not support the other SSE4 instructions; a separate CPUID bit can be tested to confirm POPCNT presence.
EXTRQ/INSERTQ | Combined mask-shift instructions
MOVNTSD/MOVNTSS | Scalar streaming store instructions
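POPCNT and LZCNT each have their own CPUID feature bit:
- POPCNT: CPUID leaf 01H, ECX bit 23.
- LZCNT: CPUID leaf 80000001H, ECX bit 5.

As a sketch, the Python below models both instructions and checks those bits given register values you have already obtained. The helper names are hypothetical, and no CPUID call is made.

bsr32 is included because LZCNT is encoded as a REP-prefixed BSR (F3 0F BD). A processor without LZCNT silently executes BSR instead and returns a different answer.

```python
def popcnt32(value: int) -> int:
    """Count the bits set to 1 in a 32-bit value, as POPCNT r32 does."""
    return bin(value & 0xFFFFFFFF).count("1")


def lzcnt32(value: int) -> int:
    """Count leading zero bits of a 32-bit value; 32 when the input is 0."""
    return 32 - (value & 0xFFFFFFFF).bit_length()


def bsr32(value: int) -> int:
    """Index of the highest set bit, as BSR returns (undefined for 0)."""
    value &= 0xFFFFFFFF
    if value == 0:
        raise ValueError("BSR leaves the destination undefined for a zero source")
    return value.bit_length() - 1


def has_popcnt(cpuid_1_ecx: int) -> bool:
    """POPCNT support: CPUID leaf 01H, ECX bit 23."""
    return bool((cpuid_1_ecx >> 23) & 1)


def has_lzcnt(cpuid_80000001_ecx: int) -> bool:
    """LZCNT support: CPUID leaf 80000001H, ECX bit 5."""
    return bool((cpuid_80000001_ecx >> 5) & 1)
```

For an input of 1, LZCNT gives 31 while BSR gives 0. Code that assumes LZCNT without checking the feature bit can therefore produce wrong results rather than an invalid-opcode fault.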

3DNow!
3DNow! floating-point instructions
Instruction | Description
PAVGUSB | Packed 8-bit unsigned integer averaging
PI2FD | Packed 32-bit integer to floating-point conversion
PF2ID | Packed floating-point to 32-bit integer conversion
PFCMPGE | Packed floating-point comparison, greater or equal
PFCMPGT | Packed floating-point comparison, greater
PFCMPEQ | Packed floating-point comparison, equal
PFACC | Packed floating-point accumulate
PFADD | Packed floating-point addition
PFSUB | Packed floating-point subtraction
PFSUBR | Packed floating-point reverse subtraction
PFMIN | Packed floating-point minimum
PFMAX | Packed floating-point maximum
PFMUL | Packed floating-point multiplication
PFRCP | Packed floating-point reciprocal approximation
PFRSQRT | Packed floating-point reciprocal square root approximation
PFRCPIT1 | Packed floating-point reciprocal, first iteration step
PFRSQIT1 | Packed floating-point reciprocal square root, first iteration step
PFRCPIT2 | Packed floating-point reciprocal/reciprocal square root, second iteration step
PMULHRW | Packed 16-bit integer multiply with rounding
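The PFRCP, PFRCPIT1 and PFRCPIT2 sequence refines a low-precision reciprocal estimate through iteration steps. The exact 3DNow! arithmetic is not reproduced here. As an assumption, the sketch below shows the generic Newton-Raphson step for 1/b that such refinement sequences are commonly described as using. coarse_reciprocal is a hypothetical stand-in for the hardware estimate.

```python
import math


def coarse_reciprocal(b: float, bits: int = 12) -> float:
    """Stand-in for a hardware estimate: 1/b rounded to `bits` mantissa bits."""
    mantissa, exponent = math.frexp(1.0 / b)
    scale = 1 << bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def newton_reciprocal_step(b: float, x: float) -> float:
    """One Newton-Raphson step toward 1/b: x' = x * (2 - b * x)."""
    return x * (2.0 - b * x)


b = 3.0
x0 = coarse_reciprocal(b)
x1 = newton_reciprocal_step(b, x0)
err0 = abs(1.0 - b * x0)  # relative error of the estimate
err1 = abs(1.0 - b * x1)  # roughly err0 squared after one step
```

If e = 1 - b*x, one step leaves an error of exactly e squared in exact arithmetic. This is why a single refinement of a roughly 12-bit estimate already gives about 24 correct bits, enough for single precision.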
3DNow! performance-enhancement instructions
Instruction | Description
FEMMS | Faster entry/exit of the MMX or floating-point state
PREFETCH/PREFETCHW | Prefetch at least a 32-byte line into L1 data cache
3DNow! extension DSP instructions
Instruction | Description
PF2IW | Packed floating-point to integer word conversion with sign extend
PI2FW | Packed integer word to floating-point conversion
PFNACC | Packed floating-point negative accumulate
PFPNACC | Packed floating-point mixed positive-negative accumulate
PSWAPD | Packed swap doubleword
MMX extension instructions (Integer SSE)
Instruction | Description
MASKMOVQ | Streaming (cache bypass) store using byte mask
MOVNTQ | Streaming (cache bypass) store
PAVGB | Packed average of unsigned byte
PAVGW | Packed average of unsigned word
PMAXSW | Packed maximum signed word
PMAXUB | Packed maximum unsigned byte
PMINSW | Packed minimum signed word
PMINUB | Packed minimum unsigned byte
PMULHUW | Packed multiply high unsigned word
PSADBW | Packed sum of absolute byte differences
PSHUFW | Packed shuffle word
PEXTRW | Extract word into integer register
PINSRW | Insert word from integer register
PMOVMSKB | Move byte mask to integer register
PREFETCHNTA | Prefetch using the NTA reference
PREFETCHT0 | Prefetch using the T0 reference
PREFETCHT1 | Prefetch using the T1 reference
PREFETCHT2 | Prefetch using the T2 reference
SFENCE | Store fence
3DNow! Professional instructions unique to the Geode GX/LX
Instruction | Description
PFRSQRTV | Reciprocal square root approximation for a pair of 32-bit floats
PFRCPV | Reciprocal approximation for a pair of 32-bit floats