Improve this page Quickly fork, edit online, and submit a pull request for this page. Requires a signed-in GitHub account. This works well for small changes. If you'd like to make larger changes you may want to consider using local clone. Page wiki View or edit the community-maintained wiki page associated with this page.

Inline Assembler

Some Assembly Required

D, being a systems programming language, provides an inline assembler. The inline assembler is standardized for D implementations across the same CPU family, for example, the Intel Pentium inline assembler for a Win32 D compiler will be syntax compatible with the inline assembler for Linux running on an Intel Pentium.

Implementations of D on different architectures, however, are free to innovate upon the memory model, function call/return conventions, argument passing conventions, etc.

This document describes the x86 and x86_64 implementations of the inline assembler. The inline assembler platform support that a compiler provides is indicated by the D_InlineAsm_X86 and D_InlineAsm_X86_64 version identifiers, respectively.

AsmInstruction:
    Identifier : AsmInstruction
    align IntegerExpression
    even
    naked
    db Operands
    ds Operands
    di Operands
    dl Operands
    df Operands
    dd Operands
    de Operands
    Opcode
    Opcode Operands

Operands:
    Operand
    Operand , Operands

Labels

Assembler instructions can be labeled just like other statements. They can be the target of goto statements. For example:

void *pc;
asm
{
  call L1          ;
 L1:               ;
  pop  EBX         ;
  mov  pc[EBP],EBX ; // pc now points to code at L1
}

align IntegerExpression

IntegerExpression:
    IntegerLiteral
    Identifier

Causes the assembler to emit NOP instructions to align the next assembler instruction on an IntegerExpression boundary. IntegerExpression must evaluate at compile time to an integer that is a power of 2.

Aligning the start of a loop body can sometimes have a dramatic effect on the execution speed.

even

Causes the assembler to emit NOP instructions to align the next assembler instruction on an even boundary.

naked

Causes the compiler to not generate the function prolog and epilog sequences. This means such is the responsibility of inline assembly programmer, and is normally used when the entire function is to be written in assembler.

db, ds, di, dl, df, dd, de

These pseudo ops are for inserting raw data directly into the code. db is for bytes, ds is for 16 bit words, di is for 32 bit words, dl is for 64 bit words, df is for 32 bit floats, dd is for 64 bit doubles, and de is for 80 bit extended reals. Each can have multiple operands. If an operand is a string literal, it is as if there were length operands, where length is the number of characters in the string. One character is used per operand. For example:
asm
{
  db 5,6,0x83;   // insert bytes 0x05, 0x06, and 0x83 into code
  ds 0x1234;     // insert bytes 0x34, 0x12
  di 0x1234;     // insert bytes 0x34, 0x12, 0x00, 0x00
  dl 0x1234;     // insert bytes 0x34, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
  df 1.234;      // insert float 1.234
  dd 1.234;      // insert double 1.234
  de 1.234;      // insert real 1.234
  db "abc";      // insert bytes 0x61, 0x62, and 0x63
  ds "abc";      // insert bytes 0x61, 0x00, 0x62, 0x00, 0x63, 0x00
}

Opcodes

A list of supported opcodes is at the end.

The following registers are supported. Register names are always in upper case.

Register:
    AL AH AX EAX
    BL BH BX EBX
    CL CH CX ECX
    DL DH DX EDX
    BP EBP
    SP ESP
    DI EDI
    SI ESI
    ES CS SS DS GS FS
    CR0 CR2 CR3 CR4
    DR0 DR1 DR2 DR3 DR6 DR7
    TR3 TR4 TR5 TR6 TR7
    ST
    ST(0) ST(1) ST(2) ST(3) ST(4) ST(5) ST(6) ST(7)
    MM0  MM1  MM2  MM3  MM4  MM5  MM6  MM7
    XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7

x86_64 adds these additional registers.

Register64:
    RAX  RBX  RCX  RDX
    BPL  RBP
    SPL  RSP
    DIL  RDI
    SIL  RSI
    R8B  R8W  R8D  R8
    R9B  R9W  R9D  R9
    R10B R10W R10D R10
    R11B R11W R11D R11
    R12B R12W R12D R12
    R13B R13W R13D R13
    R14B R14W R14D R14
    R15B R15W R15D R15
    XMM8 XMM9 XMM10 XMM11 XMM12 XMM13 XMM14 XMM15
    YMM0 YMM1 YMM2  YMM3  YMM4  YMM5  YMM6  YMM7
    YMM8 YMM9 YMM10 YMM11 YMM12 YMM13 YMM14 YMM15

Special Cases

lock, rep, repe, repne, repnz, repz
These prefix instructions do not appear in the same statement as the instructions they prefix; they appear in their own statement. For example:
asm {
  rep   ;
  movsb ;
}
pause
This opcode is not supported by the assembler, instead use
asm {
  rep  ;
  nop  ;
}
which produces the same result.
floating point ops
Use the two operand form of the instruction format;
fdiv ST(1);	// wrong
fmul ST;        // wrong
fdiv ST,ST(1);	// right
fmul ST,ST(0);	// right

Operands

Operand:
    AsmExp

AsmExp:
    AsmLogOrExp
    AsmLogOrExp ? AsmExp : AsmExp

AsmLogOrExp:
    AsmLogAndExp
    AsmLogAndExp || AsmLogAndExp

AsmLogAndExp:
    AsmOrExp
    AsmOrExp && AsmOrExp

AsmOrExp:
    AsmXorExp
    AsmXorExp | AsmXorExp

AsmXorExp:
    AsmAndExp
    AsmAndExp ^ AsmAndExp

AsmAndExp:
    AsmEqualExp
    AsmEqualExp & AsmEqualExp

AsmEqualExp:
    AsmRelExp
    AsmRelExp == AsmRelExp
    AsmRelExp != AsmRelExp

AsmRelExp:
    AsmShiftExp
    AsmShiftExp < AsmShiftExp
    AsmShiftExp <= AsmShiftExp
    AsmShiftExp > AsmShiftExp
    AsmShiftExp >= AsmShiftExp

AsmShiftExp:
    AsmAddExp
    AsmAddExp << AsmAddExp
    AsmAddExp >> AsmAddExp
    AsmAddExp >>> AsmAddExp

AsmAddExp:
    AsmMulExp
    AsmMulExp + AsmMulExp
    AsmMulExp - AsmMulExp

AsmMulExp:
    AsmBrExp
    AsmBrExp * AsmBrExp
    AsmBrExp / AsmBrExp
    AsmBrExp % AsmBrExp

AsmBrExp:
    AsmUnaExp
    AsmBrExp [ AsmExp ]

AsmUnaExp:
    AsmTypePrefix AsmExp
    offsetof AsmExp
    seg AsmExp
    + AsmUnaExp
    - AsmUnaExp
    ! AsmUnaExp
    ~ AsmUnaExp
    AsmPrimaryExp

AsmPrimaryExp:
    IntegerLiteral
    FloatLiteral
    __LOCAL_SIZE
    $
    Register
    Register64
    DotIdentifier

DotIdentifier:
    Identifier
    Identifier . DotIdentifier

The operand syntax more or less follows the Intel CPU documentation conventions. In particular, the convention is that for two operand instructions the source is the right operand and the destination is the left operand. The syntax differs from that of Intel's in order to be compatible with the D language tokenizer and to simplify parsing.

The seg means load the segment number that the symbol is in. This is not relevant for flat model code. Instead, do a move from the relevant segment register.

Operand Types

AsmTypePrefix:
    near ptr
    far ptr
    byte ptr
    short ptr
    int ptr
    word ptr
    dword ptr
    qword ptr
    float ptr
    double ptr
    real ptr

In cases where the operand size is ambiguous, as in:

add	[EAX],3		;

it can be disambiguated by using an AsmTypePrefix:

add  byte ptr [EAX],3 ;
add  int ptr [EAX],7  ;

far ptr is not relevant for flat model code.

Struct/Union/Class Member Offsets

To access members of an aggregate, given a pointer to the aggregate is in a register, use the .offsetof property of the qualified name of the member:

struct Foo { int a,b,c; }
int bar(Foo *f) {
  asm {
    mov EBX,f                   ;
    mov EAX,Foo.b.offsetof[EBX] ;
  }
}
void main() {
    Foo f = Foo(0, 2, 0);
    assert(bar(&f) == 2);
}

Alternatively, inside the scope of an aggregate, only the member name is needed:

struct Foo { // or class
  int a,b,c;
  int bar() {
    asm {
      mov EBX, this   ;
      mov EAX, b[EBX] ;
    }
  }
}
void main() {
    Foo f = Foo(0, 2, 0);
    assert(f.bar() == 2);
}

Stack Variables

Stack variables (variables local to a function and allocated on the stack) are accessed via the name of the variable indexed by EBP:

int foo(int x) {
  asm {
    mov EAX,x[EBP] ; // loads value of parameter x into EAX
    mov EAX,x      ; // does the same thing
  }
}

If the [EBP] is omitted, it is assumed for local variables. If naked is used, this no longer holds.

Special Symbols

$
Represents the program counter of the start of the next instruction. So,
jmp  $  ;
branches to the instruction following the jmp instruction. The $ can only appear as the target of a jmp or call instruction.
__LOCAL_SIZE
This gets replaced by the number of local bytes in the local stack frame. It is most handy when the naked is invoked and a custom stack frame is programmed.

Opcodes Supported

Opcodes
aaaaadaamaasadc
add addpd addps addsd addss
and andnpd andnps andpd andps
arpl bound bsf bsr bswap
bt btc btr bts call
cbw cdq clc cld clflush
cli clts cmc cmova cmovae
cmovb cmovbe cmovc cmove cmovg
cmovge cmovl cmovle cmovna cmovnae
cmovnb cmovnbe cmovnc cmovne cmovng
cmovnge cmovnl cmovnle cmovno cmovnp
cmovns cmovnz cmovo cmovp cmovpe
cmovpo cmovs cmovz cmp cmppd
cmpps cmps cmpsb cmpsd cmpss
cmpsw cmpxch8b cmpxchg comisd comiss
cpuid cvtdq2pd cvtdq2ps cvtpd2dq cvtpd2pi
cvtpd2ps cvtpi2pd cvtpi2ps cvtps2dq cvtps2pd
cvtps2pi cvtsd2si cvtsd2ss cvtsi2sd cvtsi2ss
cvtss2sd cvtss2si cvttpd2dq cvttpd2pi cvttps2dq
cvttps2pi cvttsd2si cvttss2si cwd cwde
da daa das db dd
de dec df di div
divpd divps divsd divss dl
dq ds dt dw emms
enter f2xm1 fabs fadd faddp
fbld fbstp fchs fclex fcmovb
fcmovbe fcmove fcmovnb fcmovnbe fcmovne
fcmovnu fcmovu fcom fcomi fcomip
fcomp fcompp fcos fdecstp fdisi
fdiv fdivp fdivr fdivrp feni
ffree fiadd ficom ficomp fidiv
fidivr fild fimul fincstp finit
fist fistp fisub fisubr fld
fld1 fldcw fldenv fldl2e fldl2t
fldlg2 fldln2 fldpi fldz fmul
fmulp fnclex fndisi fneni fninit
fnop fnsave fnstcw fnstenv fnstsw
fpatan fprem fprem1 fptan frndint
frstor fsave fscale fsetpm fsin
fsincos fsqrt fst fstcw fstenv
fstp fstsw fsub fsubp fsubr
fsubrp ftst fucom fucomi fucomip
fucomp fucompp fwait fxam fxch
fxrstor fxsave fxtract fyl2x fyl2xp1
hlt idiv imul in inc
ins insb insd insw int
into invd invlpg iret iretd
ja jae jb jbe jc
jcxz je jecxz jg jge
jl jle jmp jna jnae
jnb jnbe jnc jne jng
jnge jnl jnle jno jnp
jns jnz jo jp jpe
jpo js jz lahf lar
ldmxcsr lds lea leave les
lfence lfs lgdt lgs lidt
lldt lmsw lock lods lodsb
lodsd lodsw loop loope loopne
loopnz loopz lsl lss ltr
maskmovdqu maskmovq maxpd maxps maxsd
maxss mfence minpd minps minsd
minss mov movapd movaps movd
movdq2q movdqa movdqu movhlps movhpd
movhps movlhps movlpd movlps movmskpd
movmskps movntdq movnti movntpd movntps
movntq movq movq2dq movs movsb
movsd movss movsw movsx movupd
movups movzx mul mulpd mulps
mulsd mulss neg nop not
or orpd orps out outs
outsb outsd outsw packssdw packsswb
packuswb paddb paddd paddq paddsb
paddsw paddusb paddusw paddw pand
pandn pavgb pavgw pcmpeqb pcmpeqd
pcmpeqw pcmpgtb pcmpgtd pcmpgtw pextrw
pinsrw pmaddwd pmaxsw pmaxub pminsw
pminub pmovmskb pmulhuw pmulhw pmullw
pmuludq pop popa popad popf
popfd por prefetchnta prefetcht0 prefetcht1
prefetcht2 psadbw pshufd pshufhw pshuflw
pshufw pslld pslldq psllq psllw
psrad psraw psrld psrldq psrlq
psrlw psubb psubd psubq psubsb
psubsw psubusb psubusw psubw punpckhbw
punpckhdq punpckhqdq punpckhwd punpcklbw punpckldq
punpcklqdq punpcklwd push pusha pushad
pushf pushfd pxor rcl rcpps
rcpss rcr rdmsr rdpmc rdtsc
rep repe repne repnz repz
ret retf rol ror rsm
rsqrtps rsqrtss sahf sal sar
sbb scas scasb scasd scasw
seta setae setb setbe setc
sete setg setge setl setle
setna setnae setnb setnbe setnc
setne setng setnge setnl setnle
setno setnp setns setnz seto
setp setpe setpo sets setz
sfence sgdt shl shld shr
shrd shufpd shufps sidt sldt
smsw sqrtpd sqrtps sqrtsd sqrtss
stc std sti stmxcsr stos
stosb stosd stosw str sub
subpd subps subsd subss sysenter
sysexit test ucomisd ucomiss ud2
unpckhpd unpckhps unpcklpd unpcklps verr
verw wait wbinvd wrmsr xadd
xchg xlat xlatb xor xorpd
xorps

Pentium 4 (Prescott) Opcodes Supported

Pentium 4 Opcodes
addsubpdaddsubpsfisttphaddpdhaddps
hsubpdhsubpslddqumonitormovddup
movshdupmovsldupmwait

AMD Opcodes Supported

AMD Opcodes
pavgusbpf2idpfaccpfaddpfcmpeq
pfcmpgepfcmpgtpfmaxpfminpfmul
pfnaccpfpnaccpfrcppfrcpit1pfrcpit2
pfrsqit1pfrsqrtpfsubpfsubrpi2fd
pmulhrwpswapd

SIMD

SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AVX are supported.

Forums | Comments | Search | Downloads | Home