Fundamental Syntactic Elements (Delphi)

Fundamental syntactic elements, called tokens, combine to form expressions, declarations, and statements. A statement describes an algorithmic action that can be executed within a program. An expression is a syntactic unit that occurs within a statement and denotes a value. A declaration defines an identifier (such as the name of a function or variable) that can be used in expressions and statements, and, where appropriate, allocates memory for the identifier.

This topic introduces the Delphi language character set, and describes the syntax for declaring:

  • Identifiers
  • Numbers
  • Character strings
  • Labels
  • Source code comments

The Delphi Character Set

The Delphi language uses the Unicode character encoding for its character set, including alphabetic and alphanumeric Unicode characters and the underscore. Delphi is not case-sensitive. The space character and control characters (U+0000 through U+001F including U+000D, the return or end-of-line character) are blanks.

The RAD Studio compiler will accept a file encoded in UCS-2 or UCS-4 if the file contains a byte order mark. However, compilation may be slower for formats other than UTF-8. All characters in a UCS-4 encoded source file must be representable in UCS-2 without surrogate pairs. UCS-2 encodings with surrogate pairs (including GB18030) are accepted only if the codepage compiler option is specified.

Tokens

On the simplest level, a program is a sequence of tokens delimited by separators. A token is the smallest meaningful unit of text in a program. A separator is either a blank or a comment. Strictly speaking, it is not always necessary to place a separator between two tokens; for example, the code fragment:

 Size:=20;Price:=10;

is perfectly legal. Convention and readability, however, dictate that we write this on two lines, as:

  Size := 20;
  Price := 10;

Tokens are categorized as special symbols, identifiers, reserved words, directives, numerals, labels, and character strings. A separator can be part of a token only if the token is a character string. Adjacent identifiers, reserved words, numerals, and labels must have one or more separators between them.
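The separator rules above can be tried in a small console program. This is only an illustrative sketch: the program name is hypothetical and the variables Size and Price are borrowed from the fragment above.

```delphi
program TokenDemo; // hypothetical program name, for illustration only
{$APPTYPE CONSOLE}

var
  Size, Price: Integer;

begin
  // Special symbols such as := and ; delimit tokens by themselves,
  // so no blanks are needed here.
  Size:=20;Price:=10;

  // Reserved words and identifiers need a separator between them:
  // "ifSize" would be read as one identifier, not as "if" followed by "Size".
  if Size>Price then Writeln('Size is larger');

  // A comment is a separator too, so it may appear between tokens.
  Writeln(Size{a comment between two tokens}+Price); // prints 30
end.
```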

Special Symbols

Special symbols are non-alphanumeric characters, or pairs of such characters, that have fixed meanings. The following single characters are special symbols:

# $ & ' ( ) * + , - . / : ; < = > @ [ ] ^ { }

The following character pairs are also special symbols:

(* (. *) .) .. // := <= >= <>

The following table shows pairs of symbols used in Delphi that have similar meanings (the symbol pairs {} and (* *) are comment characters that are further described in Comments and Compiler Directives):

Special Symbols   Similar Special Symbols
[    ]            (.   .)
{    }            (*   *)


The left bracket [ is similar to the character pair of left parenthesis and period (..

The right bracket ] is similar to the character pair of period and right parenthesis .).

The left brace { is similar to the character pair of left parenthesis and asterisk (*.

The right brace } is similar to the character pair of asterisk and right parenthesis *).

Note: %, ?, \, !, " (double quotation marks), _ (underscore), | (pipe), and ~ (tilde) are not special symbols.

Identifiers

Identifiers denote constants, variables, fields, types, properties, procedures, functions, programs, units, libraries, and packages. An identifier can be any length, but only the first 255 characters are significant. An identifier must begin with an alphabetic character, a Unicode character, or an underscore (_) and cannot contain spaces (or Unicode characters considered as whitespace). Alphanumeric characters, Unicode characters, digits (including Unicode characters representing numerals), and underscores are allowed after the first character. Reserved words cannot be used as identifiers. Because the Delphi language is case-insensitive, an identifier like CalculateValue could be written in any of these ways:

 CalculateValue
 calculateValue
 calculatevalue
 CALCULATEVALUE

Since unit names correspond to file names, inconsistencies in case can sometimes affect compilation. For more information, see the section Unit References and the Uses Clause in Programs and Units (Delphi).
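As a minimal sketch of case-insensitivity, the following console program declares one variable and refers to it using each of the spellings listed above. The program name is hypothetical.

```delphi
program IdentifierDemo; // hypothetical program name
{$APPTYPE CONSOLE}

var
  CalculateValue: Integer; // declared once

begin
  // Every spelling below refers to the same variable.
  calculateValue := 1;
  calculatevalue := calculatevalue + 1;
  CALCULATEVALUE := CALCULATEVALUE * 10;
  Writeln(CalculateValue); // prints 20
end.
```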

Qualified Identifiers

When you use an identifier that has been declared in more than one place, it is sometimes necessary to qualify the identifier. The syntax for a qualified identifier is:

identifier1.identifier2

where identifier1 qualifies identifier2. For example, if two units each declare a variable called CurrentValue, you can specify that you want to access the CurrentValue in Unit2 by writing:

 Unit2.CurrentValue

Qualifiers can be iterated. For example:

 Form1.Button1.Click

calls the Click method in Button1 of Form1.

If you do not qualify an identifier, its interpretation is determined by the scope rules described in Blocks and Scope in Declarations and Statements (Delphi).
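The sketch below shows both uses of qualification in one console program. It assumes a Delphi version that supports unit scope names (System.SysUtils). The record types, the variable, and the local IntToStr function are hypothetical and exist only to create a name clash with the standard IntToStr.

```delphi
program QualifiedDemo; // hypothetical program name
{$APPTYPE CONSOLE}

uses
  System.SysUtils; // assumes unit scope names are available

type
  // Hypothetical nested records, used to show iterated qualifiers.
  TPoint2D = record
    X, Y: Integer;
  end;
  TShape = record
    Origin: TPoint2D;
  end;

var
  Shape: TShape;

// A local function that hides the IntToStr declared in System.SysUtils.
function IntToStr(Value: Integer): string;
begin
  // The unit qualifier selects the library routine explicitly.
  Result := '<' + System.SysUtils.IntToStr(Value) + '>';
end;

begin
  Shape.Origin.X := 7;                               // iterated qualifiers
  Writeln(IntToStr(Shape.Origin.X));                 // local version: <7>
  Writeln(System.SysUtils.IntToStr(Shape.Origin.X)); // unit version: 7
end.
```

Without the unit qualifier, the innermost declaration wins, which is why the unqualified call resolves to the local function.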

Extended Identifiers

You might encounter identifiers (e.g., types or methods in a class) having the same name as a Delphi language reserved word. For example, a class might have a method called begin. Delphi reserved words such as begin cannot be used for an identifier name.

If you fully qualify the identifier, then there is no problem. For example, if you want to use the Delphi reserved word type for an identifier name, you must use its fully qualified name:

 var TMyType.type
 // Using a fully qualified name avoids ambiguity with the Delphi language keyword.

As a shorter alternative, the ampersand (&) operator can be used to resolve ambiguities between identifiers and Delphi language reserved words. The & prevents a keyword from being parsed as a keyword (that is, a reserved word). If you encounter a method or type with the same name as a Delphi keyword, you can omit the namespace specification if you prefix the identifier name with an ampersand. But when you are declaring an identifier that has the same name as a keyword, you must use the &:

 type
  &Type = Integer;
  // Prefix with '&' is ok.
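A slightly larger sketch of the ampersand prefix follows. The record type TRange and its field names are hypothetical, chosen only because they collide with the reserved words begin, end, and type.

```delphi
program AmpersandDemo; // hypothetical program name
{$APPTYPE CONSOLE}

type
  &Type = Integer;          // declares an identifier named "Type"
  TRange = record
    &Begin, &End: &Type;    // fields named after reserved words
  end;

var
  Range: TRange;

begin
  Range.&Begin := 1;
  Range.&End := 10;
  Writeln(Range.&End - Range.&Begin); // prints 9
end.
```

The prefix is written at every use here for consistency, including the qualified ones.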

Reserved Words

The following reserved words cannot be redefined or used as identifiers.

Delphi Reserved Words:

and            end              interface      record           var
array          except           is             repeat           while
as             exports          label          resourcestring   with
asm            file             library [3]    set              xor
begin          finalization     mod            shl
case           finally          nil            shr
class          for              not            string
const          function         object         then
constructor    goto             of             threadvar
destructor     if               or             to
dispinterface  implementation   packed         try
div            in               procedure      type
do             inherited        program        unit
downto         initialization   property       until
else           inline           raise          uses

[3] refers to note 3 in the Directives section.

Note: In addition to the words in the preceding table, private, protected, public, published, and automated act as reserved words within class type declarations, but are otherwise treated as directives. The words at and on also have special meanings, and should be treated as reserved words. The keywords of object are used to define method pointers.

Directives

Delphi has more than one type of directive. One meaning for "directive" is a word that is sensitive in specific locations within source code. This type of directive has special meaning in the Delphi language, but, unlike a reserved word, appears only in contexts where user-defined identifiers cannot occur. Hence -- although it is inadvisable to do so -- you can define an identifier that looks exactly like a directive.

Directives:

absolute           export [12]       name            public          stdcall
abstract           external          near [1]        published       strict
assembler [12]     far [1]           nodefault       read            stored
automated          final             operator [10]   readonly        unsafe
cdecl              forward           out             reference [9]   varargs
contains [7]       helper [8]        overload        register        virtual
default            implements        override        reintroduce     winapi [6]
delayed            index             package [7]     requires [7]    write
deprecated [11]    inline [2]        pascal          resident [1]    writeonly
dispid             library [3, 11]   platform [11]   safecall
dynamic            local [4]         private         sealed [5]
experimental [11]  message           protected       static

Note:
  1. far, near, and resident are obsolete.
  2. inline is used directive-style at the end of a procedure or function declaration to mark the procedure or function for inlining, but became a reserved word for Turbo Pascal.
  3. library is also a keyword when used as the first token in project source code; it indicates a DLL target. Otherwise, it marks a symbol so that it produces a library warning when used.
  4. local was a Kylix directive and is ignored for Delphi for Win32.
  5. sealed is a class directive with odd syntax: 'class sealed'. A sealed class cannot be extended or derived (like final in C++).
  6. winapi defines the default platform calling convention. For example, on Win32 winapi is the same as stdcall.
  7. package, when used as the first token, indicates a package target and enables package syntax. requires and contains are directives only in package syntax.
  8. helper indicates "class helper for".
  9. reference indicates a reference to a function or procedure.
  10. operator indicates class operator.
  11. platform, deprecated, experimental, and library are hinting (or warning) directives. These directives produce warnings at compile time.
  12. The assembler and export directives have no meaning. They exist only for backward compatibility.
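Because directives are not reserved, a word from the Directives table can also be declared as an ordinary identifier. The following sketch uses name and index as variable names purely for illustration; as noted above, doing so in real code is inadvisable.

```delphi
program DirectiveNames; // hypothetical program name
{$APPTYPE CONSOLE}

var
  Name: string;   // "name" is a directive, yet legal as an identifier
  Index: Integer; // "index" is a directive as well

begin
  Name := 'Delphi';
  Index := 3;
  Writeln(Name, ' ', Index); // prints: Delphi 3
end.
```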


Types of Directives

Delphi has two types of directives, including the context-sensitive type of directive listed in the Directives table above.

A context-sensitive directive can be an identifier -- not typically a keyword -- that you place at the end of a declaration to modify the meaning of the declaration. For example:

  procedure P; forward;

Or:

  procedure M; virtual; override;

Or:

  property Foo: Integer read FFoo write FFoo default 42;


The last type of directive is the official compiler directive, which is a switch or option that affects the behavior of the compiler. A compiler directive is surrounded by braces ({}) and begins with a dollar-sign ($), like this:

  {$POINTERMATH ON}

  {$D+} // DEBUGINFO ON

Like the other types of directives, compiler directives are not keywords. For a list of the compiler directives, see the Delphi compiler directives list.
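The two kinds of directive can appear side by side, as in this minimal sketch. The procedure Greet and the program name are hypothetical. The text printed by the conditional block depends on the project's debug-information setting, so no output is claimed for it.

```delphi
program DirectiveKinds; // hypothetical program name
{$APPTYPE CONSOLE}      // compiler directive: build a console application

procedure Greet; forward; // context-sensitive directive at the end of a declaration

procedure Greet;
begin
  Writeln('Hello');
end;

begin
  Greet;
  // $IFOPT tests the state of a compiler switch at this point in the source.
  {$IFOPT D+}
  Writeln('DEBUGINFO is on');
  {$ELSE}
  Writeln('DEBUGINFO is off');
  {$ENDIF}
end.
```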

Numerals

Integer and real constants can be represented in decimal notation as sequences of digits without commas or spaces, and prefixed with the + or - operator to indicate sign. Values default to positive (so that, for example, 67258 is equivalent to +67258) and must be within the range of the largest predefined real or integer type.

Numerals with decimal points or exponents denote reals, while other numerals denote integers. When the character E or e occurs within a real, it means "times ten to the power of". For example, 7E2 means 7 * 10^2, and 12.25e+6 and 12.25e6 both mean 12.25 * 10^6.

The dollar-sign prefix indicates a hexadecimal numeral, for example, $8F. Hexadecimal numbers without a preceding - unary operator are taken to be positive values. During an assignment, if a hexadecimal value lies outside the range of the receiving type, an error is raised, except in the case of Integer (32-bit integer), where a warning is raised. In this case, values exceeding the positive range for Integer are taken to be negative numbers in a manner consistent with two's complement integer representation.

For more information about real and integer types, see About Data Types (Delphi). For information about the data types of numerals, see Declared Constants.
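The notations described above can be checked with a short console program. This is an illustrative sketch. The last line uses an explicit Integer typecast, which is an assumption made here to show the two's complement interpretation without relying on the warning behavior of a plain assignment.

```delphi
program NumeralDemo; // hypothetical program name
{$APPTYPE CONSOLE}

begin
  Writeln(67258 = +67258);       // TRUE: values default to positive
  Writeln(7E2:0:0);              // 700: 7 * 10^2, a real
  Writeln(12.25e+6 = 12.25e6);   // TRUE: the + in the exponent is optional
  Writeln($8F);                  // 143: hexadecimal numeral
  Writeln(-$10);                 // -16: unary minus applied to a hex numeral
  Writeln(Integer($FFFFFFFF));   // -1: same bit pattern read as a 32-bit Integer
end.
```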

Labels

You can use either an identifier or a non-negative integer number as a label. The Delphi compiler allows numeric labels from 0 to 4294967295 (uint32 range).

Labels are used in goto statements. For more information about goto statements and labels, see Goto Statements in Declarations and Statements (Delphi).
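A minimal sketch with one numeric label and one identifier label follows. The label names and the loop are hypothetical and serve only to show the declaration and use of both forms.

```delphi
program LabelDemo; // hypothetical program name
{$APPTYPE CONSOLE}

label
  10, Done; // a numeric label and an identifier label

var
  I: Integer;

begin
  I := 0;
10:
  Inc(I);
  if I < 3 then
    goto 10;  // jump back to the numeric label
  goto Done;  // skip the next statement
  Writeln('never printed');
Done:
  Writeln(I); // prints 3
end.
```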

Character Strings

A character string, also called a string literal or string constant, consists of a quoted string, a control string, or a combination of quoted and control strings. Separators can occur only within quoted strings.

A quoted string is a sequence of characters, from an ANSI or multibyte character set, written on one line and enclosed by apostrophes. A quoted string with nothing between the apostrophes is a null string. Two sequential apostrophes in a quoted string denote a single character, namely an apostrophe.

The string is represented internally as a Unicode string encoded as UTF-16. Characters in the Basic Multilingual Plane (BMP) take 2 bytes, and characters not in the BMP require 4 bytes.

For example:

   'Embarcadero'        { Embarcadero }
   'You''ll see'        { You'll see }
   'アプリケーションを Unicode 対応にする'
   ''''                 { ' }
   ''                   { null string }
   ' '                  { a space }

A control string is a sequence of one or more control characters, each of which consists of the # symbol followed by an unsigned integer constant from 0 to 65,535 (decimal) or from $0 to $FFFF (hexadecimal) in UTF-16 encoding, and denotes the character corresponding to a specified code value. Each integer is represented internally by 2 bytes in the string. This is useful for representing control characters and multibyte characters. The control string:

#89#111#117

is equivalent to the quoted string:

'You'

You can combine quoted strings with control strings to form larger character strings. For example, you could use:

'Line 1'#13#10'Line 2'

to put a carriage-return line-feed between 'Line 1' and 'Line 2'. However, you cannot concatenate two quoted strings in this way, since a pair of sequential apostrophes is interpreted as a single character. (To concatenate quoted strings, use the + operator or simply combine them into a single quoted string.)

A character string is compatible with any string type and with the PChar type. Since an AnsiString type may contain multibyte characters, a character string with one character, single or multibyte, is compatible with any character type. When extended syntax is enabled (with compiler directive {$X+}), a nonempty character string of length n is compatible with zero-based arrays and packed arrays of n characters. For more information, see About Data Types (Delphi).
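The quoted-string and control-string rules can be verified with a small console program. This is a sketch only. The constant CRLF and the variable S are hypothetical names introduced for the example.

```delphi
program StringLiteralDemo; // hypothetical program name
{$APPTYPE CONSOLE}

const
  CRLF = #13#10; // a control string used as a named constant

var
  S: string;

begin
  S := #89#111#117;   // control string for the codes of 'Y', 'o', 'u'
  Writeln(S = 'You'); // TRUE

  S := 'You''ll see'; // two apostrophes denote one apostrophe
  Writeln(S);         // You'll see
  Writeln(Length(S)); // 10

  S := '''';          // a string holding a single apostrophe
  Writeln(Length(S)); // 1

  // Quoted and control strings combined into one literal.
  Writeln('Line 1'#13#10'Line 2');
  // The same text built with the + operator.
  Writeln('Line 1' + CRLF + 'Line 2');
end.
```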

As of RAD Studio 12.0, string literals can be longer than 255 characters; in other words, string literals are no longer limited to the classic Pascal ShortString type. The language also adds support for multiline strings. For more information on multiline strings, see the String Types (Delphi) page, under Long and Multiline String Literals.

Comments and Compiler Directives

Comments are ignored by the compiler, except when they function as separators (delimiting adjacent tokens) or compiler directives.

There are several ways to construct comments:

 { Text between left and right braces is a comment. }
 (* Text between left-parenthesis-plus-asterisk and an
  asterisk-plus-right-parenthesis is also a comment *)
 // Text between double-slash and end of line is a comment.

Comments that are alike cannot be nested. For instance, {{}} will not work, but (*{}*) will. This latter form is useful for commenting out sections of code that also contain comments.

Here are some recommendations about how and when to use the three types of comment characters:

  • Use the double-slash (//) for commenting out temporary changes made during development. You can use the Code Editor's convenient CTRL+/ (slash) mechanism to quickly insert the double-slash comment character while you are working.
  • Use the parenthesis-star "(*...*)" both for development comments and for commenting out a block of code that contains other comments. This comment character permits multiple lines of source, including other types of comments, to be removed from consideration by the compiler.
  • Use the braces ({}) for in-source documentation that you intend to remain with the code.
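The three comment styles, and the nesting of unlike comments, might be combined as in this sketch. The program name is hypothetical.

```delphi
program CommentDemo; // hypothetical program name
{$APPTYPE CONSOLE}

begin
  { Brace comment: documentation meant to stay with the code. }
  Writeln('A'); // double-slash comment: a temporary note

  (*
  Writeln('B'); { a brace comment nested inside a parenthesis-star comment }
  // a double-slash comment can be nested here as well
  *)

  Writeln('C');
end.
```

Only A and C are printed, because the whole parenthesis-star block, including the comments inside it, is skipped by the compiler.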

A comment that contains a dollar sign ($) immediately after the opening { or (* is a compiler directive. For example,

 {$WARNINGS OFF}

tells the compiler not to generate warning messages.
