Assembly Expressions

The built-in assembler evaluates all expressions as 32-bit integer values. It doesn't support floating-point and string values, except string constants.

Expressions are built from expression elements and operators, and each expression has an associated expression class and expression type.

Differences between Delphi and Assembler Expressions

The most important difference between Delphi expressions and built-in assembler expressions is that an assembler expression must resolve to a constant value, that is, a value that can be computed at compile time. For example, given the declarations:

const
  X = 10;
  Y = 20;
var
  Z: Integer;

the following is a valid statement:

asm
  MOV      Z,X+Y
end;

Because both X and Y are constants, the expression X + Y is a convenient way of writing the constant 30, and the resulting instruction simply moves the value 30 into the variable Z. But if X and Y are variables:

var
  X, Y: Integer;

the built-in assembler cannot compute the value of X + Y at compile time. In this case, to move the sum of X and Y into Z you would use:

asm
  MOV          EAX,X
  ADD          EAX,Y
  MOV          Z,EAX
end;

In a Delphi expression, a variable reference denotes the contents of the variable. But in an assembler expression, a variable reference denotes the address of the variable. In Delphi the expression X + 4 (where X is a variable) means the contents of X plus 4, while to the built-in assembler it means the contents of the word at the address four bytes higher than the address of X. So, even though you are allowed to write:

asm
  MOV          EAX,X+4
end;

this code doesn't load the value of X plus 4 into EAX; instead, it loads the value of a word stored four bytes beyond X. The correct way to add 4 to the contents of X is:

asm
  MOV          EAX,X
  ADD          EAX,4
end;

Expression Elements

The elements of an expression are constants, registers, and symbols.

Numeric Constants

Numeric constants must be integers, and their values must be between -2,147,483,648 and 4,294,967,295.

By default, numeric constants use decimal notation, but the built-in assembler also supports binary, octal, and hexadecimal. Binary notation is selected by writing a B after the number, octal notation by writing an O after the number, and hexadecimal notation by writing an H after the number or a $ before the number.

Numeric constants must start with one of the digits 0 through 9 or the $ character. When you write a hexadecimal constant using the H suffix, an extra zero is required in front of the number if the first significant digit is one of the digits A through F. For example, 0BAD4H and $BAD4 are hexadecimal constants, but BAD4H is an identifier because it starts with a letter.
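The notations described above can be compared side by side. The following is a minimal sketch, assuming a 32-bit Delphi compiler (where an Integer function result is returned in EAX); the program and function names are hypothetical and serve only as illustration. Each function loads the same value, 255, written in a different notation.

```pascal
program NumericNotations;
{$APPTYPE CONSOLE}

// Assumption: 32-bit Delphi compiler; an Integer result is returned in EAX.
// All names in this program are hypothetical and used only for illustration.

function FromDecimal: Integer;
asm
  MOV     EAX,255          // decimal (default notation)
end;

function FromBinary: Integer;
asm
  MOV     EAX,11111111B    // binary, B suffix
end;

function FromOctal: Integer;
asm
  MOV     EAX,377O         // octal, O suffix
end;

function FromHexSuffix: Integer;
asm
  MOV     EAX,0FFH         // hexadecimal, H suffix; FFH alone would be an identifier
end;

function FromHexPrefix: Integer;
asm
  MOV     EAX,$FF          // hexadecimal, $ prefix
end;

begin
  // All five functions are written to return the same value.
  Writeln(FromDecimal, ' ', FromBinary, ' ', FromOctal, ' ',
          FromHexSuffix, ' ', FromHexPrefix);
end.
```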

String Constants

String constants must be enclosed in single or double quotation marks. Two consecutive quotation marks of the same type as the enclosing quotation marks count as only one character. Here are some examples of string constants:

'Z'
'Delphi'
'Windows'
"That's all folks"
'"That''s all folks," he said.'
'100'
'"'
"'"

String constants of any length are allowed in DB directives, and cause allocation of a sequence of bytes containing the ASCII values of the characters in the string. In all other cases, a string constant can be no longer than four characters and denotes a numeric value which can participate in an expression. The numeric value of a string constant is calculated as:

Ord(Ch1) + Ord(Ch2) shl 8 + Ord(Ch3) shl 16 + Ord(Ch4) shl 24

where Ch1 is the rightmost (last) character and Ch4 is the leftmost (first) character. If the string is shorter than four characters, the leftmost characters are assumed to be zero. The following table shows string constants and their numeric values.

String examples and their values:

String       Value
'a'          00000061H
'ba'         00006261H
'cba'        00636261H
'dcba'       64636261H
'a '         00006120H
'   a'       20202061H
'a' * 2      000000C2H
'a'-'A'      00000020H
not 'a'      FFFFFF9EH
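The formula for the numeric value of a string constant can be modeled in plain Delphi. The following is a minimal sketch, not the assembler's own implementation; StringConstValue is a hypothetical helper that assumes at most four single-byte characters.

```pascal
program StringConstValues;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper, not part of any library: models the formula above.
// Assumes S holds at most four single-byte characters.
function StringConstValue(const S: AnsiString): Cardinal;
var
  I: Integer;
begin
  Result := 0;
  // Walk left to right; each step shifts the earlier characters one byte
  // higher, so the last character ends up in the low-order byte (Ch1).
  for I := 1 to Length(S) do
    Result := (Result shl 8) or Ord(S[I]);
end;

begin
  Writeln(IntToHex(StringConstValue('a'), 8));      // 00000061
  Writeln(IntToHex(StringConstValue('dcba'), 8));   // 64636261
  Writeln(IntToHex(StringConstValue('a '), 8));     // 00006120
end.
```

Because missing leading characters are treated as zero, a shorter string simply leaves the high-order bytes clear.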

Registers

The following reserved symbols denote CPU registers in the inline assembler:

CPU registers:

Category                Identifiers
8-bit CPU registers     AH, AL, BH, BL, CH, CL, DH, DL (general purpose registers)
16-bit CPU registers    AX, BX, CX, DX (general purpose registers); DI, SI, SP, BP (index registers); CS, DS, SS, ES (segment registers); IP (instruction pointer)
32-bit CPU registers    EAX, EBX, ECX, EDX (general purpose registers); EDI, ESI, ESP, EBP (index registers); FS, GS (segment registers); EIP
FPU                     ST(0), ..., ST(7)
MMX FPU registers       mm0, ..., mm7
XMM registers           xmm0, ..., xmm7 (..., xmm15 on x64)
Intel 64 registers      RAX, RBX, ...


[Figures: x64 CPU general purpose registers, x86 FPU data registers, and x64 SSE data registers]

When an operand consists solely of a register name, it is called a register operand. All registers can be used as register operands, and some registers can be used in other contexts.

The base registers (BX and BP) and the index registers (SI and DI) can be written within square brackets to indicate indexing. Valid base/index register combinations are [BX], [BP], [SI], [DI], [BX+SI], [BX+DI], [BP+SI], and [BP+DI]. You can also index with all the 32-bit registers; for example, [EAX+ECX], [ESP], and [ESP+EAX+5].
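Indexing with 32-bit registers can be illustrated with a small lookup routine. This is a minimal sketch, assuming a 32-bit Delphi compiler and the default register calling convention (first parameter in EAX, second in EDX); ByteAt and Data are hypothetical names.

```pascal
program IndexedAccess;
{$APPTYPE CONSOLE}

// Assumptions: 32-bit Delphi compiler and the default register calling
// convention, which passes Buf in EAX and Index in EDX.
// ByteAt is a hypothetical name used only for this illustration.
function ByteAt(Buf: Pointer; Index: Integer): Integer;
asm
  MOVZX   EAX,BYTE PTR [EAX+EDX]   // base register EAX plus index register EDX
end;

const
  Data: array[0..3] of Byte = (10, 20, 30, 40);

begin
  Writeln(ByteAt(@Data, 2));   // 30
end.
```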

The segment registers (ES, CS, SS, DS, FS, and GS) are supported, but segments are normally not useful in 32-bit applications.

The symbol ST denotes the topmost register on the 8087 floating-point register stack. Each of the eight floating-point registers can be referred to using ST(X), where X is a constant between 0 and 7 indicating the distance from the top of the register stack.

Symbols

The built-in assembler allows you to access almost all Delphi identifiers in assembly language expressions, including constants, types, variables, procedures, and functions. In addition, the built-in assembler implements the special symbol @Result, which corresponds to the Result variable within the body of a function. For example, the function:

function Sum(X, Y: Integer): Integer;
begin
  Result := X + Y;
end;

could be written in assembly language as:

function Sum(X, Y: Integer): Integer; stdcall;
begin
  asm
    MOV        EAX,X
    ADD        EAX,Y
    MOV        @Result,EAX
  end;
end;

The following symbols cannot be used in asm statements:

  • Standard procedures and functions (for example, Writeln and Chr).
  • String, floating-point, and set constants (except when loading registers).
  • Labels that aren't declared in the current block.
  • The @Result symbol outside of functions.

The following table summarizes the kinds of symbol that can be used in asm statements.

Symbols recognized by the built-in assembler:

Symbol       Value                                                          Class               Type
Label        Address of label                                               Memory reference    Size of type
Constant     Value of constant                                              Immediate value     0
Type         0                                                              Memory reference    Size of type
Field        Offset of field                                                Memory              Size of type
Variable     Address of variable or address of a pointer to the variable    Memory reference    Size of type
Procedure    Address of procedure                                           Memory reference    Size of type
Function     Address of function                                            Memory reference    Size of type
Unit         0                                                              Immediate value     0
@Result      Result variable offset                                         Memory reference    Size of type

With optimizations disabled, local variables (variables declared in procedures and functions) are always allocated on the stack and accessed relative to EBP, and the value of a local variable symbol is its signed offset from EBP. The assembler automatically adds [EBP] in references to local variables. For example, given the declaration:

var Count: Integer;

within a function or procedure, the instruction:

MOV    EAX,Count

assembles into MOV EAX,[EBP-4].
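A local variable can therefore be referenced by name without writing the EBP-relative address by hand. The following is a minimal sketch, assuming a 32-bit Delphi compiler; Twice and Temp are hypothetical names.

```pascal
program LocalVarDemo;
{$APPTYPE CONSOLE}

// Assumption: 32-bit Delphi compiler. Twice and Temp are hypothetical names
// used only for this illustration.
function Twice(Value: Integer): Integer;
var
  Temp: Integer;
begin
  Temp := Value;
  asm
    MOV     EAX,Temp       // the assembler supplies the EBP-relative address
    ADD     EAX,EAX        // double the value
    MOV     @Result,EAX    // store into the function result
  end;
end;

begin
  Writeln(Twice(21));   // 42
end.
```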

The built-in assembler treats var parameters as 32-bit pointers, and the size of a var parameter is always 4. The syntax for accessing a var parameter is different from that for accessing a value parameter. To access the contents of a var parameter, you must first load the 32-bit pointer and then access the location it points to. For example:

function Sum(var X, Y: Integer): Integer; stdcall;
begin
  asm
    MOV                EAX,X
    MOV                EAX,[EAX]
    MOV                EDX,Y
    ADD                EAX,[EDX]
    MOV                @Result,EAX
  end;
end;

Identifiers can be qualified within asm statements. For example, given the declarations:

type
  TPoint = record
    X, Y: Integer;
  end;
  TRect = record
    A, B: TPoint;
  end;
var
  P: TPoint;
  R: TRect;

the following constructions can be used in an asm statement to access fields:

MOV    EAX,P.X
MOV    EDX,P.Y
MOV    ECX,R.A.X
MOV    EBX,R.B.Y

A type identifier can be used to construct variables on the fly. Each of the following instructions generates the same machine code, which loads the B.X field of the TRect record at the address held in EDX (the double word at [EDX+8]) into EAX.

MOV    EAX,(TRect PTR [EDX]).B.X
MOV    EAX,TRect([EDX]).B.X
MOV    EAX,TRect[EDX].B.X
MOV    EAX,[EDX].TRect.B.X
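One of these forms can be tried in a complete program. This is a minimal sketch, assuming a 32-bit Delphi compiler and the default register calling convention, so that the address of the var parameter arrives in EAX rather than EDX; GetBX is a hypothetical name.

```pascal
program RecordFieldViaType;
{$APPTYPE CONSOLE}

type
  TPoint = record
    X, Y: Integer;
  end;
  TRect = record
    A, B: TPoint;
  end;

// Assumptions: 32-bit Delphi compiler and the register calling convention,
// so the address of the var parameter Rect is passed in EAX.
// GetBX is a hypothetical name used only for this illustration.
function GetBX(var Rect: TRect): Integer;
asm
  MOV     EAX,[EAX].TRect.B.X   // treat the address in EAX as a TRect
end;

var
  R: TRect;

begin
  R.A.X := 1;
  R.A.Y := 2;
  R.B.X := 3;
  R.B.Y := 4;
  Writeln(GetBX(R));   // 3
end.
```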

Expression Classes

The built-in assembler divides expressions into three classes: registers, memory references, and immediate values.

An expression that consists solely of a register name is a register expression. Examples of register expressions are AX, CL, DI, and ES. Used as operands, register expressions direct the assembler to generate instructions that operate on the CPU registers.

Expressions that denote memory locations are memory references. Delphi's labels, variables, typed constants, procedures, and functions belong to this category.

Expressions that aren't registers and aren't associated with memory locations are immediate values. This group includes Delphi's untyped constants and type identifiers.

Immediate values and memory references cause different code to be generated when used as operands. For example:

const
  Start = 10;
var
  Count: Integer;
// …
asm
  MOV  EAX,Start       { MOV EAX,xxxx }
  MOV  EBX,Count       { MOV EBX,[xxxx] }
  MOV  ECX,[Start]     { MOV ECX,[xxxx] }
  MOV  EDX,OFFSET Count    { MOV EDX,xxxx }
end;

Because Start is an immediate value, the first MOV is assembled into a move immediate instruction. The second MOV, however, is translated into a move memory instruction, as Count is a memory reference. In the third MOV, the brackets convert Start into a memory reference (in this case, the word at offset 10 in the data segment). In the fourth MOV, the OFFSET operator converts Count into an immediate value (the offset of Count in the data segment).

The brackets and OFFSET operator complement each other. The following asm statement produces identical machine code to the first two lines of the previous asm statement:

asm
  MOV      EAX,OFFSET [Start]
  MOV      EBX,[OFFSET Count]
end;
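The difference between a memory reference and an immediate value produced by OFFSET can be observed directly. The following is a minimal sketch, assuming a 32-bit Delphi compiler (results returned in EAX); CountValue and CountAddress are hypothetical names.

```pascal
program OffsetDemo;
{$APPTYPE CONSOLE}

var
  Count: Integer = 42;

// Assumption: 32-bit Delphi compiler; Integer and Pointer results are
// returned in EAX. Function names are hypothetical.

function CountValue: Integer;
asm
  MOV     EAX,Count           // memory reference: loads the contents of Count
end;

function CountAddress: Pointer;
asm
  MOV     EAX,OFFSET Count    // immediate value: loads the address of Count
end;

begin
  Writeln(CountValue);               // 42
  Writeln(CountAddress() = @Count);  // TRUE
end.
```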

Memory references and immediate values are further classified as either relocatable or absolute. Relocation is the process by which the linker assigns absolute addresses to symbols. A relocatable expression denotes a value that requires relocation at link time, while an absolute expression denotes a value that requires no such relocation. Typically, expressions that refer to labels, variables, procedures, or functions are relocatable, since the final address of these symbols is unknown at compile time. Expressions that operate solely on constants are absolute.

The built-in assembler allows you to carry out any operation on an absolute value, but it restricts operations on relocatable values to addition and subtraction of constants.

Expression Types

Every built-in assembler expression has a type, or more correctly a size, because the assembler regards the type of an expression simply as the size of its memory location. For example, the type of an Integer variable is four, because it occupies 4 bytes. The built-in assembler performs type checking whenever possible, so in the instructions:

var
  QuitFlag: Boolean;
  OutBufPtr: Word;
// …
asm
  MOV      AL,QuitFlag
  MOV      BX,OutBufPtr
end;

the assembler checks that the size of QuitFlag is one (a byte), and that the size of OutBufPtr is two (a word). The instruction:

MOV        DL,OutBufPtr

produces an error because DL is a byte-sized register and OutBufPtr is a word. The type of a memory reference can be changed through a typecast; these are correct ways of writing the previous instruction:

MOV        DL,BYTE PTR OutBufPtr
MOV        DL,Byte(OutBufPtr)
MOV        DL,OutBufPtr.Byte

These MOV instructions all refer to the first (least significant) byte of the OutBufPtr variable.
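The effect of such a typecast can be checked with a known value. This is a minimal sketch, assuming a 32-bit Delphi compiler (a Byte result is returned in AL); the function name and the initial value $1234 are illustrative assumptions.

```pascal
program ByteOfWord;
{$APPTYPE CONSOLE}

var
  OutBufPtr: Word = $1234;

// Assumption: 32-bit Delphi compiler; a Byte function result is returned in AL.
// LowByteOfOutBufPtr is a hypothetical name used only for this illustration.
function LowByteOfOutBufPtr: Byte;
asm
  MOV     AL,BYTE PTR OutBufPtr   // typecast the word-sized variable to a byte
end;

begin
  // x86 is little-endian, so the first byte of $1234 is $34.
  Writeln(LowByteOfOutBufPtr);    // 52
end.
```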

In some cases, a memory reference is untyped. One example is an immediate value (Buffer) enclosed in square brackets:

procedure Example(var Buffer);
asm
  MOV     AL,[Buffer]
  MOV     CX,[Buffer]
  MOV     EDX,[Buffer]
end;

The built-in assembler permits these instructions, because the expression [Buffer] has no type. [Buffer] means "the contents of the location indicated by Buffer," and the type can be determined from the first operand (byte for AL, word for CX, and double-word for EDX).

In cases where the type can't be determined from another operand, the built-in assembler requires an explicit typecast. For example:

INC     BYTE PTR [ECX]
IMUL    WORD PTR [EDX]

The following table summarizes the predefined type symbols that the built-in assembler provides in addition to any currently declared Delphi types.

Predefined type symbols:

Symbol    Type
BYTE      1
WORD      2
DWORD     4
QWORD     8
TBYTE     10

Expression Operators

The built-in assembler provides a variety of operators. Its precedence rules differ from those of the Delphi language; for example, in an asm statement, AND has lower precedence than the addition and subtraction operators. The following table lists the built-in assembler's expression operators in decreasing order of precedence.

Precedence of built-in assembler expression operators:

Operators                                       Remarks           Precedence
&                                                                 highest
(...), [...], ., HIGH, LOW
+, -                                            unary + and -
:
OFFSET, TYPE, PTR, *, /, MOD, SHL, SHR, +, -    binary + and -
NOT, AND, OR, XOR                                                 lowest
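The precedence difference for AND can be made concrete by writing the same expression in both languages. This is a minimal sketch, assuming a 32-bit Delphi compiler; the function names are hypothetical. According to the table, the assembler should group the expression as 4 AND (3 + 1), whereas Delphi groups it as (4 and 3) + 1.

```pascal
program PrecedenceDemo;
{$APPTYPE CONSOLE}

// Assumption: 32-bit Delphi compiler; an Integer result is returned in EAX.
// AsmValue and DelphiValue are hypothetical names used only for illustration.

function AsmValue: Integer;
asm
  MOV     EAX,4 AND 3 + 1   // per the table above: grouped as 4 AND (3 + 1)
end;

function DelphiValue: Integer;
begin
  Result := 4 and 3 + 1;    // Delphi: grouped as (4 and 3) + 1, which is 1
end;

begin
  Writeln('asm:    ', AsmValue);
  Writeln('Delphi: ', DelphiValue);
end.
```

When an expression is shared between Delphi code and asm statements, explicit parentheses avoid relying on either set of rules.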


The following table defines the built-in assembler's expression operators:

Definitions of built-in assembler expression operators:

Operator    Description
&           Identifier override. The identifier immediately following the ampersand is treated as a user-defined symbol, even if the spelling is the same as a built-in assembler reserved symbol.
(...)       Subexpression. Expressions within parentheses are evaluated completely prior to being treated as a single expression element. Another expression can precede the expression within the parentheses; the result in this case is the sum of the values of the two expressions, with the type of the first expression.
[...]       Memory reference. The expression within brackets is evaluated completely prior to being treated as a single expression element. Another expression can precede the expression within the brackets; the result in this case is the sum of the values of the two expressions, with the type of the first expression. The result is always a memory reference.
.           Structure member selector. The result is the sum of the expression before the period and the expression after the period, with the type of the expression after the period. Symbols belonging to the scope identified by the expression before the period can be accessed in the expression after the period.
HIGH        Returns the high-order 8 bits of the word-sized expression following the operator. The expression must be an absolute immediate value.
LOW         Returns the low-order 8 bits of the word-sized expression following the operator. The expression must be an absolute immediate value.
+           Unary plus. Returns the expression following the plus with no changes. The expression must be an absolute immediate value.
-           Unary minus. Returns the negated value of the expression following the minus. The expression must be an absolute immediate value.
+           Addition. The expressions can be immediate values or memory references, but only one of the expressions can be a relocatable value. If one of the expressions is a relocatable value, the result is also a relocatable value. If either of the expressions is a memory reference, the result is also a memory reference.
-           Subtraction. The first expression can have any class, but the second expression must be an absolute immediate value. The result has the same class as the first expression.
:           Segment override. Instructs the assembler that the expression after the colon belongs to the segment given by the segment register name (CS, DS, SS, FS, GS, or ES) before the colon. The result is a memory reference with the value of the expression after the colon. When a segment override is used in an instruction operand, the instruction is prefixed with an appropriate segment-override prefix instruction to ensure that the indicated segment is selected.
OFFSET      Returns the offset part (double word) of the expression following the operator. The result is an immediate value.
TYPE        Returns the type (size in bytes) of the expression following the operator. The type of an immediate value is 0.
PTR         Typecast operator. The result is a memory reference with the value of the expression following the operator and the type of the expression in front of the operator.
*           Multiplication. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
/           Integer division. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
MOD         Remainder after integer division. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
SHL         Logical shift left. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
SHR         Logical shift right. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
NOT         Bitwise negation. The expression must be an absolute immediate value, and the result is an absolute immediate value.
AND         Bitwise AND. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
OR          Bitwise OR. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
XOR         Bitwise exclusive OR. Both expressions must be absolute immediate values, and the result is an absolute immediate value.
