Assembly Expressions
The built-in assembler evaluates all expressions as 32-bit integer values. It doesn't support floating-point and string values, except string constants.
Expressions are built from expression elements and operators, and each expression has an associated expression class and expression type.
Differences between Delphi and Assembler Expressions
The most important difference between Delphi expressions and built-in assembler expressions is that assembler expressions must resolve to a constant value. In other words, they must resolve to a value that can be computed at compile time. For example, given the declarations:
const
X = 10;
Y = 20;
var
Z: Integer;
the following is a valid statement:
asm
MOV Z,X+Y
end;
Because both X and Y are constants, the expression X + Y is a convenient way of writing the constant 30, and the resulting instruction simply moves the value 30 into the variable Z. But if X and Y are variables:
var
X, Y: Integer;
the built-in assembler cannot compute the value of X + Y at compile time. In this case, to move the sum of X and Y into Z you would use:
asm
MOV EAX,X
ADD EAX,Y
MOV Z,EAX
end;
In a Delphi expression, a variable reference denotes the contents of the variable. But in an assembler expression, a variable reference denotes the address of the variable. In Delphi the expression X + 4 (where X is a variable) means the contents of X plus 4, while to the built-in assembler it means the contents of the word at the address four bytes higher than the address of X. So, even though you are allowed to write:
asm
MOV EAX,X+4
end;
this code doesn't load the value of X plus 4 into EAX; instead, it loads the value of a word stored four bytes beyond X. The correct way to add 4 to the contents of X is:
asm
MOV EAX,X
ADD EAX,4
end;
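The difference between the two forms can be modelled outside Delphi. The following Python sketch is only an illustration: it treats memory as a byte array, places X at an assumed address, and compares what MOV EAX,X+4 reads with what the two-instruction sequence computes. The address 16, the neighbouring value 777, and the helper load_dword are all hypothetical.

```python
import struct

# Hypothetical memory image: X lives at address 16, and an unrelated
# 32-bit value happens to be stored four bytes after it.
memory = bytearray(32)
X_ADDR = 16
struct.pack_into("<i", memory, X_ADDR, 100)       # X = 100
struct.pack_into("<i", memory, X_ADDR + 4, 777)   # neighbouring data

def load_dword(addr):
    """Read a little-endian 32-bit signed integer from the memory image."""
    return struct.unpack_from("<i", memory, addr)[0]

# MOV EAX,X+4: X+4 is folded into an address, and the load happens there.
eax_wrong = load_dword(X_ADDR + 4)

# MOV EAX,X / ADD EAX,4: load the contents of X, then add 4 at run time.
eax_right = load_dword(X_ADDR) + 4

print(eax_wrong, eax_right)   # 777 104
```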
Expression Elements
The elements of an expression are constants, registers, and symbols.
Numeric Constants
Numeric constants must be integers, and their values must be between -2,147,483,648 and 4,294,967,295.
By default, numeric constants use decimal notation, but the built-in assembler also supports binary, octal, and hexadecimal. Binary notation is selected by writing a B after the number, octal notation by writing an O after the number, and hexadecimal notation by writing an H after the number or a $ before the number.
Numeric constants must start with one of the digits 0 through 9 or the $ character. When you write a hexadecimal constant using the H suffix, an extra zero is required in front of the number if the first significant digit is one of the digits A through F. For example, 0BAD4H and $BAD4 are hexadecimal constants, but BAD4H is an identifier because it starts with a letter.
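The notation rules above can be summarized in a short Python sketch. It is a simplified model for illustration, not the assembler's actual lexer: the function name parse_asm_number is hypothetical, and returning None for text that would be read as an identifier is a choice made for this example.

```python
def parse_asm_number(text):
    """Return the value of a numeric constant, or None if the text
    starts with a letter and would be read as an identifier."""
    s = text.upper()
    if not s or not (s[0].isdigit() or s[0] == "$"):
        return None                    # e.g. BAD4H starts with a letter
    if s[0] == "$":
        value = int(s[1:], 16)         # $ prefix: hexadecimal
    elif s[-1] == "H":
        value = int(s[:-1], 16)        # H suffix: hexadecimal
    elif s[-1] == "O":
        value = int(s[:-1], 8)         # O suffix: octal
    elif s[-1] == "B":
        value = int(s[:-1], 2)         # B suffix: binary
    else:
        value = int(s, 10)             # default: decimal
    if value > 4294967295:
        raise ValueError("constant out of range: " + text)
    return value

print(parse_asm_number("0BAD4H"))   # 47828
print(parse_asm_number("$BAD4"))    # 47828
print(parse_asm_number("BAD4H"))    # None
```

The H suffix is tested before the B suffix so that a constant such as 0BH is read as hexadecimal 0B rather than as a malformed binary number.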
String Constants
String constants must be enclosed in single or double quotation marks. Two consecutive quotation marks of the same type as the enclosing quotation marks count as only one character. Here are some examples of string constants:
'Z'
'Delphi'
'Windows'
"That's all folks"
'"That''s all folks," he said.'
'100'
'"'
"'"
String constants of any length are allowed in DB directives, and cause allocation of a sequence of bytes containing the ASCII values of the characters in the string. In all other cases, a string constant can be no longer than four characters and denotes a numeric value which can participate in an expression. The numeric value of a string constant is calculated as:
Ord(Ch1) + Ord(Ch2) shl 8 + Ord(Ch3) shl 16 + Ord(Ch4) shl 24
where Ch1 is the rightmost (last) character and Ch4 is the leftmost (first) character. If the string is shorter than four characters, the leftmost characters are assumed to be zero. The following table shows string constants and their numeric values.
String examples and their values:
String | Value |
---|---|
'a' | 00000061H |
'ba' | 00006261H |
'cba' | 00636261H |
'dcba' | 64636261H |
'a ' | 00006120H |
'   a' | 20202061H |
'a' * 2 | 000000C2H |
'a'-'A' | 00000020H |
not 'a' | FFFFFF9EH |
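As a cross-check of the formula, here is a minimal Python sketch that computes the numeric value of a short string constant. The function name string_constant_value is hypothetical, and the sketch assumes single-byte (ASCII) characters.

```python
def string_constant_value(s):
    """Numeric value of a string constant of up to four characters.

    The last character supplies the low-order byte; missing leading
    characters count as zero. Assumes single-byte (ASCII) characters.
    """
    if len(s) > 4:
        raise ValueError("at most four characters outside DB directives")
    value = 0
    for ch in s:                       # the leftmost character ends up highest
        value = (value << 8) | ord(ch)
    return value

for s in ("a", "ba", "cba", "dcba", "a "):
    print(repr(s), format(string_constant_value(s), "08X") + "H")

# Expressions on these values are ordinary 32-bit integer arithmetic.
diff = string_constant_value("a") - string_constant_value("A")
inverted = ~string_constant_value("a") & 0xFFFFFFFF
print(format(diff, "08X") + "H")       # 00000020H
print(format(inverted, "08X") + "H")   # FFFFFF9EH
```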
Registers
The following reserved symbols denote CPU registers in the inline assembler:
CPU registers:
Category | Identifiers |
---|---|
8-bit CPU registers | AH, AL, BH, BL, CH, CL, DH, DL (general purpose registers) |
16-bit CPU registers | AX, BX, CX, DX (general purpose registers); DI, SI, SP, BP (index registers); CS, DS, SS, ES (segment registers); IP (instruction pointer) |
32-bit CPU registers | EAX, EBX, ECX, EDX (general purpose registers); EDI, ESI, ESP, EBP (index registers); FS, GS (segment registers); EIP (instruction pointer) |
FPU | ST(0), ..., ST(7) |
MMX FPU registers | mm0, ..., mm7 |
XMM registers | xmm0, ..., xmm7 (..., xmm15 on x64) |
Intel 64 registers | RAX, RBX, ... |
When an operand consists solely of a register name, it is called a register operand. All registers can be used as register operands, and some registers can be used in other contexts.
The base registers (BX and BP) and the index registers (SI and DI) can be written within square brackets to indicate indexing. Valid base/index register combinations are [BX], [BP], [SI], [DI], [BX+SI], [BX+DI], [BP+SI], and [BP+DI]. You can also index with all the 32-bit registers; for example, [EAX+ECX], [ESP], and [ESP+EAX+5].
The segment registers (ES, CS, SS, DS, FS, and GS) are supported, but segments are normally not useful in 32-bit applications.
The symbol ST denotes the topmost register on the 8087 floating-point register stack. Each of the eight floating-point registers can be referred to using ST(X), where X is a constant between 0 and 7 indicating the distance from the top of the register stack.
Symbols
The built-in assembler allows you to access almost all Delphi identifiers in assembly language expressions, including constants, types, variables, procedures, and functions. In addition, the built-in assembler implements the special symbol @Result, which corresponds to the Result variable within the body of a function. For example, the function:
function Sum(X, Y: Integer): Integer;
begin
Result := X + Y;
end;
could be written in assembly language as:
function Sum(X, Y: Integer): Integer; stdcall;
begin
asm
MOV EAX,X
ADD EAX,Y
MOV @Result,EAX
end;
end;
The following symbols cannot be used in asm statements:
- Standard procedures and functions (for example, Writeln and Chr).
- String, floating-point, and set constants (except when loading registers).
- Labels that aren't declared in the current block.
- The @Result symbol outside of functions.
The following table summarizes the kinds of symbol that can be used in asm statements.
Symbols recognized by the built-in assembler:
Symbol | Value | Class | Type |
---|---|---|---|
Label | Address of label | Memory reference | Size of type |
Constant | Value of constant | Immediate value | 0 |
Type | 0 | Memory reference | Size of type |
Field | Offset of field | Memory reference | Size of type |
Variable | Address of variable or address of a pointer to the variable | Memory reference | Size of type |
Procedure | Address of procedure | Memory reference | Size of type |
Function | Address of function | Memory reference | Size of type |
Unit | 0 | Immediate value | 0 |
@Result | Result variable offset | Memory reference | Size of type |
With optimizations disabled, local variables (variables declared in procedures and functions) are always allocated on the stack and accessed relative to EBP, and the value of a local variable symbol is its signed offset from EBP. The assembler automatically adds [EBP] in references to local variables. For example, given the declaration:
var Count: Integer;
within a function or procedure, the instruction:
MOV EAX,Count
assembles into MOV EAX,[EBP-4].
The built-in assembler treats var parameters as 32-bit pointers, and the size of a var parameter is always 4. The syntax for accessing a var parameter is different from that for accessing a value parameter. To access the contents of a var parameter, you must first load the 32-bit pointer and then access the location it points to. For example:
function Sum(var X, Y: Integer): Integer; stdcall;
begin
asm
MOV EAX,X
MOV EAX,[EAX]
MOV EDX,Y
ADD EAX,[EDX]
MOV @Result,EAX
end;
end;
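The two-step access can be traced with a small Python model of memory. This is only a sketch: the addresses, the slot positions of X and Y, and the helper names are assumptions chosen for the illustration, not the layout the compiler actually produces.

```python
import struct

memory = bytearray(64)

def store_dword(addr, value):
    struct.pack_into("<I", memory, addr, value)

def load_dword(addr):
    return struct.unpack_from("<I", memory, addr)[0]

# Hypothetical layout: the caller's two Integer variables live at
# addresses 32 and 36; the parameter slots X and Y hold those addresses.
store_dword(32, 10)
store_dword(36, 20)
X_SLOT, Y_SLOT = 8, 12
store_dword(X_SLOT, 32)
store_dword(Y_SLOT, 36)

eax = load_dword(X_SLOT)                     # MOV EAX,X      (the pointer)
eax = load_dword(eax)                        # MOV EAX,[EAX]  (the value)
edx = load_dword(Y_SLOT)                     # MOV EDX,Y
eax = (eax + load_dword(edx)) & 0xFFFFFFFF   # ADD EAX,[EDX]
print(eax)   # 30
```

Skipping the second load would leave the address 32 in EAX rather than the value 10, which is the mistake the paragraph above warns about.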
Identifiers can be qualified within asm statements. For example, given the declarations:
type
TPoint = record
X, Y: Integer;
end;
TRect = record
A, B: TPoint;
end;
var
P: TPoint;
R: TRect;
the following constructions can be used in an asm statement to access fields:
MOV EAX,P.X
MOV EDX,P.Y
MOV ECX,R.A.X
MOV EBX,R.B.Y
A type identifier can be used to construct variables on the fly. Each of the following instructions generates the same machine code, which loads the B.X field of the TRect record addressed by EDX (the contents of [EDX+8], given the declarations above) into EAX.
MOV EAX,(TRect PTR [EDX]).B.X
MOV EAX,TRect([EDX]).B.X
MOV EAX,TRect[EDX].B.X
MOV EAX,[EDX].TRect.B.X
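The displacement that the field selectors contribute can be checked with Python's ctypes module, which lays out structures the way a C compiler would. This sketch assumes that Integer is a 32-bit type and that the records contain no padding, which holds for these all-Integer records.

```python
import ctypes

class TPoint(ctypes.Structure):
    _fields_ = [("X", ctypes.c_int32), ("Y", ctypes.c_int32)]

class TRect(ctypes.Structure):
    _fields_ = [("A", TPoint), ("B", TPoint)]

# The '.' selector sums field offsets: the offset of B within TRect
# plus the offset of X within TPoint.
offset_b_x = TRect.B.offset + TPoint.X.offset
print(offset_b_x)        # 8, the displacement added to EDX
print(TPoint.X.size)     # 4, the size (type) of the resulting operand
```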
Expression Classes
The built-in assembler divides expressions into three classes: registers, memory references, and immediate values.
An expression that consists solely of a register name is a register expression. Examples of register expressions are AX, CL, DI, and ES. Used as operands, register expressions direct the assembler to generate instructions that operate on the CPU registers.
Expressions that denote memory locations are memory references. Delphi's labels, variables, typed constants, procedures, and functions belong to this category.
Expressions that aren't registers and aren't associated with memory locations are immediate values. This group includes Delphi's untyped constants and type identifiers.
Immediate values and memory references cause different code to be generated when used as operands. For example:
const
Start = 10;
var
Count: Integer;
// …
asm
MOV EAX,Start { MOV EAX,xxxx }
MOV EBX,Count { MOV EBX,[xxxx] }
MOV ECX,[Start] { MOV ECX,[xxxx] }
MOV EDX,OFFSET Count { MOV EDX,xxxx }
end;
Because Start is an immediate value, the first MOV is assembled into a move immediate instruction. The second MOV, however, is translated into a move memory instruction, as Count is a memory reference. In the third MOV, the brackets convert Start into a memory reference (in this case, the word at offset 10 in the data segment). In the fourth MOV, the OFFSET operator converts Count into an immediate value (the offset of Count in the data segment).
The brackets and OFFSET operator complement each other. The following asm statement produces identical machine code to the first two lines of the previous asm statement:
asm
MOV EAX,OFFSET [Start]
MOV EBX,[OFFSET Count]
end;
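One way to picture how brackets and OFFSET complement each other is to model an expression as a value paired with a class. The Python sketch below is a deliberately reduced model: the Expr type, the helper names, and the address assigned to Count are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    value: int   # constant value, or address for a memory reference
    cls: str     # "immediate" or "memory"

def brackets(e):
    """[e]: the result is always a memory reference."""
    return Expr(e.value, "memory")

def offset(e):
    """OFFSET e: the result is an immediate value."""
    return Expr(e.value, "immediate")

start = Expr(10, "immediate")     # untyped constant Start
count = Expr(0x1000, "memory")    # variable Count at an assumed address

# Each operator undoes the other, so these operands are unchanged:
print(offset(brackets(start)) == start)   # MOV EAX,OFFSET [Start]
print(brackets(offset(count)) == count)   # MOV EBX,[OFFSET Count]
```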
Memory references and immediate values are further classified as either relocatable or absolute. Relocation is the process by which the linker assigns absolute addresses to symbols. A relocatable expression denotes a value that requires relocation at link time, while an absolute expression denotes a value that requires no such relocation. Typically, expressions that refer to labels, variables, procedures, or functions are relocatable, since the final address of these symbols is unknown at compile time. Expressions that operate solely on constants are absolute.
The built-in assembler allows you to carry out any operation on an absolute value, but it restricts operations on relocatable values to addition and subtraction of constants.
Expression Types
Every built-in assembler expression has a type, or more correctly a size, because the assembler regards the type of an expression simply as the size of its memory location. For example, the type of an Integer variable is four, because it occupies 4 bytes. The built-in assembler performs type checking whenever possible, so in the instructions:
var
QuitFlag: Boolean;
OutBufPtr: Word;
// …
asm
MOV AL,QuitFlag
MOV BX,OutBufPtr
end;
the assembler checks that the size of QuitFlag is one (a byte), and that the size of OutBufPtr is two (a word). The instruction:
MOV DL,OutBufPtr
produces an error because DL is a byte-sized register and OutBufPtr is a word. The type of a memory reference can be changed through a typecast; these are correct ways of writing the previous instruction:
MOV DL,BYTE PTR OutBufPtr
MOV DL,Byte(OutBufPtr)
MOV DL,OutBufPtr.Byte
These MOV instructions all refer to the first (least significant) byte of the OutBufPtr variable.
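The first byte is the least significant one because x86 stores multi-byte values in little-endian order. The Python sketch below shows this for an assumed value of OutBufPtr; the value 1234H is chosen only for the illustration.

```python
import struct

out_buf_ptr = 0x1234                    # assumed contents of OutBufPtr
raw = struct.pack("<H", out_buf_ptr)    # little-endian: low byte first

dl = raw[0]                             # MOV DL,BYTE PTR OutBufPtr
print(hex(dl))                          # 0x34
```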
In some cases, a memory reference is untyped. One example is an immediate value (Buffer) enclosed in square brackets:
procedure Example(var Buffer);
asm
MOV AL, [Buffer]
MOV CX, [Buffer]
MOV EDX, [Buffer]
end;
The built-in assembler permits these instructions, because the expression [Buffer] has no type. [Buffer] means "the contents of the location indicated by Buffer," and the type can be determined from the first operand (byte for AL, word for CX, and double-word for EDX).
In cases where the type can't be determined from another operand, the built-in assembler requires an explicit typecast. For example:
INC BYTE PTR [ECX]
IMUL WORD PTR [EDX]
The following table summarizes the predefined type symbols that the built-in assembler provides in addition to any currently declared Delphi types.
Predefined type symbols:
Symbol | Type |
---|---|
BYTE | 1 |
WORD | 2 |
DWORD | 4 |
QWORD | 8 |
TBYTE | 10 |
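The size checking described in this section can be approximated with a few lines of Python. This is a simplified model under stated assumptions: the function check_mov and the small register table are hypothetical, a size of 0 stands for an untyped memory reference, and only MOV-style register/memory pairs are considered.

```python
TYPE_SIZES = {"BYTE": 1, "WORD": 2, "DWORD": 4, "QWORD": 8, "TBYTE": 10}
REGISTER_SIZES = {"AL": 1, "DL": 1, "BX": 2, "CX": 2, "EAX": 4, "EDX": 4}

def check_mov(register, mem_size, cast=None):
    """Return the operand size used, or raise if the sizes disagree.

    mem_size is the declared size of the memory operand (0 = untyped);
    cast is an optional predefined type symbol such as "BYTE".
    """
    size = TYPE_SIZES[cast] if cast else mem_size
    if size not in (0, REGISTER_SIZES[register]):
        raise TypeError("operand size mismatch")
    return REGISTER_SIZES[register]

print(check_mov("BX", 2))                 # MOV BX,OutBufPtr          -> 2
print(check_mov("DL", 2, cast="BYTE"))    # MOV DL,BYTE PTR OutBufPtr -> 1
print(check_mov("EDX", 0))                # MOV EDX,[Buffer]          -> 4
```

Calling check_mov("DL", 2) without a cast raises TypeError, mirroring the error reported for MOV DL,OutBufPtr.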
Expression Operators
The built-in assembler provides a variety of operators. Its precedence rules differ from those of the Delphi language; for example, in an asm statement, AND has lower precedence than the addition and subtraction operators. The following table lists the built-in assembler's expression operators in decreasing order of precedence.
Precedence of built-in assembler expression operators:
Operators | Remarks | Precedence |
---|---|---|
& | | highest |
(...), [...], ., HIGH, LOW | | |
+, - | unary + and - | |
: | | |
OFFSET, TYPE, PTR, *, /, MOD, SHL, SHR, +, - | binary + and - | |
NOT, AND, OR, XOR | | lowest |
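The practical effect of the different precedence can be seen by writing out both groupings explicitly. The Python sketch below uses parentheses to spell out how the expression 1 + 1 AND 1 is grouped under each set of rules; the operand values are chosen only for the illustration.

```python
# In an asm statement, AND binds more loosely than binary +:
asm_grouping = (1 + 1) & 1       # 1 + 1 AND 1  ->  2 AND 1

# In Delphi source, 'and' has the precedence of a multiplying operator:
delphi_grouping = 1 + (1 & 1)    # 1 + 1 and 1  ->  1 + 1

print(asm_grouping, delphi_grouping)   # 0 2
```

When an expression is copied between Delphi code and an asm statement, explicit parentheses avoid depending on either rule set.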
The following table defines the built-in assembler's expression operators:
Definitions of built-in assembler expression operators:
Operator | Description |
---|---|
& | Identifier override. The identifier immediately following the ampersand is treated as a user-defined symbol, even if the spelling is the same as a built-in assembler reserved symbol. |
(...) | Subexpression. Expressions within parentheses are evaluated completely prior to being treated as a single expression element. Another expression can precede the expression within the parentheses; the result in this case is the sum of the values of the two expressions, with the type of the first expression. |
[...] | Memory reference. The expression within brackets is evaluated completely prior to being treated as a single expression element. Another expression can precede the expression within the brackets; the result in this case is the sum of the values of the two expressions, with the type of the first expression. The result is always a memory reference. |
. | Structure member selector. The result is the sum of the expression before the period and the expression after the period, with the type of the expression after the period. Symbols belonging to the scope identified by the expression before the period can be accessed in the expression after the period. |
HIGH | Returns the high-order 8 bits of the word-sized expression following the operator. The expression must be an absolute immediate value. |
LOW | Returns the low-order 8 bits of the word-sized expression following the operator. The expression must be an absolute immediate value. |
+ | Unary plus. Returns the expression following the plus with no changes. The expression must be an absolute immediate value. |
- | Unary minus. Returns the negated value of the expression following the minus. The expression must be an absolute immediate value. |
+ | Addition. The expressions can be immediate values or memory references, but only one of the expressions can be a relocatable value. If one of the expressions is a relocatable value, the result is also a relocatable value. If either of the expressions is a memory reference, the result is also a memory reference. |
- | Subtraction. The first expression can have any class, but the second expression must be an absolute immediate value. The result has the same class as the first expression. |
: | Segment override. Instructs the assembler that the expression after the colon belongs to the segment given by the segment register name (CS, DS, SS, FS, GS, or ES) before the colon. The result is a memory reference with the value of the expression after the colon. When a segment override is used in an instruction operand, the instruction is prefixed with an appropriate segment-override prefix instruction to ensure that the indicated segment is selected. |
OFFSET | Returns the offset part (double word) of the expression following the operator. The result is an immediate value. |
TYPE | Returns the type (size in bytes) of the expression following the operator. The type of an immediate value is 0. |
PTR | Typecast operator. The result is a memory reference with the value of the expression following the operator and the type of the expression in front of the operator. |
* | Multiplication. Both expressions must be absolute immediate values, and the result is an absolute immediate value. |
/ | Integer division. Both expressions must be absolute immediate values, and the result is an absolute immediate value. |
MOD | Remainder after integer division. Both expressions must be absolute immediate values, and the result is an absolute immediate value. |
SHL | Logical shift left. Both expressions must be absolute immediate values, and the result is an absolute immediate value. |
SHR | Logical shift right. Both expressions must be absolute immediate values, and the result is an absolute immediate value. |
NOT | Bitwise negation. The expression must be an absolute immediate value, and the result is an absolute immediate value. |
AND | Bitwise AND. Both expressions must be absolute immediate values, and the result is an absolute immediate value. |
OR | Bitwise OR. Both expressions must be absolute immediate values, and the result is an absolute immediate value. |
XOR | Bitwise exclusive OR. Both expressions must be absolute immediate values, and the result is an absolute immediate value. |
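Several of these operators have direct integer equivalents. The following Python sketch models HIGH, LOW, SHL, SHR, and NOT on absolute immediate values; the function names are hypothetical, and masking every result to 32 bits is an assumption based on the statement that all expressions are evaluated as 32-bit integer values.

```python
MASK32 = 0xFFFFFFFF

def high(x):
    """HIGH: high-order 8 bits of a word-sized value."""
    return (x >> 8) & 0xFF

def low(x):
    """LOW: low-order 8 bits of a word-sized value."""
    return x & 0xFF

def shl(x, n):
    """SHL: logical shift left, kept to 32 bits."""
    return (x << n) & MASK32

def shr(x, n):
    """SHR: logical shift right of the 32-bit value."""
    return (x & MASK32) >> n

def bit_not(x):
    """NOT: bitwise negation of the 32-bit value."""
    return ~x & MASK32

print(format(high(0x1234), "02X"), format(low(0x1234), "02X"))   # 12 34
print(format(shl(1, 31), "08X"))                                 # 80000000
print(format(bit_not(0x61), "08X"))                              # FFFFFF9E
```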