Internal Data Formats (Delphi)

From RAD Studio XE3

The following topics describe the internal formats of Delphi data types.

Integer Types

Integer values have the following internal representation in Delphi.

Platform-Independent Unsigned Integer Types

Values of platform-independent integer types occupy the same number of bits on any platform.

Values of unsigned integer types are always non-negative and, unlike signed integer types, have no sign bit. All bits of an unsigned integer type are occupied by the magnitude of the value and have no other meaning.


Byte, UInt8

Byte and UInt8 are 1-byte (8-bit) unsigned integer numbers. The magnitude occupies all 8 bits.

Integer Unsigned 8-bit


Word and UInt16

Word and UInt16 are 2-byte (16-bit) unsigned integer numbers.

Integer Unsigned 16-bit


Cardinal, LongWord, and UInt32

Cardinal, LongWord, and UInt32 are 4-byte (32-bit) unsigned integer numbers.

Integer Unsigned 32-bit


UInt64

UInt64 values are 8-byte (64-bit) unsigned integer numbers.

Integer Unsigned 64-bit


Platform-Independent Signed Integer Types

Values of signed integer types represent a number's sign with one leading sign bit, which is the most significant bit. The sign bit is 0 for a positive number and 1 for a negative number. In a positive signed integer number, the other bits are occupied by the magnitude. In a negative signed integer number, the other bits are occupied by the 2's complement representation of the value's magnitude (absolute value).

To obtain the 2's complement of a magnitude:

  1. Starting from the right, find the first '1'.
  2. Invert all of the bits to the left of that one.

For example:

                 Example 1   Example 2
Magnitude        0101010     1010101
2's Complement   1010110     0101011
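
The rule can be checked against the bit pattern the compiler actually stores. The following is a minimal sketch, not part of the original reference: it uses the equivalent "invert every bit, then add 1" formulation, and the program and variable names are illustrative only. The magnitude 42 is the 0101010 pattern from Example 1, widened to 8 bits.

```delphi
program TwosComplementDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

var
  Magnitude, Pattern: Byte;
  Value: ShortInt;
begin
  Magnitude := 42;                       // 0010 1010
  Pattern := Byte((not Magnitude) + 1);  // invert every bit, then add 1
  Value := -42;
  Writeln(IntToHex(Pattern, 2));         // pattern computed by hand
  Writeln(IntToHex(Byte(Value), 2));     // pattern stored for the ShortInt -42
  Assert(Pattern = Byte(Value));         // both are 1101 0110
end.
```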


ShortInt, Int8

ShortInt and Int8 are 1-byte (8-bit) signed integer numbers. The sign bit occupies the most significant bit (bit 7); the magnitude or its 2's complement occupies the other 7 bits.

Integer Signed Positive 8-bit
Integer Signed Negative 8-bit


SmallInt and Int16

SmallInt and Int16 are 2-byte (16-bit) signed integer numbers.

Integer Signed Positive 16-bit
Integer Signed Negative 16-bit


Integer, LongInt, and Int32

Integer, LongInt, and Int32 are 4-byte (32-bit) signed integer numbers.

Integer Signed Positive 32-bit
Integer Signed Negative 32-bit


Int64

Int64 values are 8-byte (64-bit) signed integer numbers.

Integer Signed Positive 64-bit
Integer Signed Negative 64-bit


Platform-Dependent Integer Types

The platform-dependent integer types are NativeUInt and NativeInt. Their size matches the bit size of the current target platform: they occupy 64 bits on 64-bit platforms and 32 bits on 32-bit platforms. When the bit size of the target platform is the same as that of the CPU, a platform-dependent integer exactly matches the size of the CPU registers. These types are often used when best performance is desired for a particular CPU type and operating system.

Unsigned Integer NativeUInt

NativeUInt is the platform-dependent unsigned integer type. The size and internal representation of NativeUInt depend on the current platform. On 32-bit platforms, NativeUInt is equivalent to the Cardinal type. On 64-bit platforms, NativeUInt is equivalent to the UInt64 type.


Signed Integer NativeInt

NativeInt is the platform-dependent signed integer type. The size and internal representation of NativeInt depend on the current platform. On 32-bit platforms, NativeInt is equivalent to the Integer type. On 64-bit platforms, NativeInt is equivalent to the Int64 type.
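
A short console program can confirm these sizes for the platform being compiled. This is only a sketch: the program name is made up, and the {$ELSE} branch assumes that any target other than CPUX64 is a 32-bit one.

```delphi
program NativeSizeDemo;
{$APPTYPE CONSOLE}
begin
  Writeln('SizeOf(NativeInt)  = ', SizeOf(NativeInt));
  Writeln('SizeOf(NativeUInt) = ', SizeOf(NativeUInt));
  Writeln('SizeOf(Pointer)    = ', SizeOf(Pointer));
  {$IFDEF CPUX64}
  Assert(SizeOf(NativeInt) = SizeOf(Int64));     // 64-bit target
  Assert(SizeOf(NativeUInt) = SizeOf(UInt64));
  {$ELSE}
  Assert(SizeOf(NativeInt) = SizeOf(Integer));   // assumed 32-bit target
  Assert(SizeOf(NativeUInt) = SizeOf(Cardinal));
  {$ENDIF}
end.
```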

Integer Subrange Types

When you use integer constants to define the minimum and maximum bounds of a subrange type, you define an integer subrange type. An integer subrange type represents a subset of the values in an integer type (called the base type). The base type is the smallest integer type that contains the specified range (contains both the minimum and maximum bounds).

The internal data format of an integer subrange type variable depends on its minimum and maximum bounds:

  • If both bounds are within the range -128..127 (Shortint), the variable is stored as a signed byte.
  • If both bounds are within the range 0..255 (Byte), the variable is stored as an unsigned byte.
  • If both bounds are within the range -32768..32767 (Smallint), the variable is stored as a signed word.
  • If both bounds are within the range 0..65535 (Word), the variable is stored as an unsigned word.
  • If both bounds are within the range -2147483648..2147483647 (Longint), the variable is stored as a signed double word.
  • If both bounds are within the range 0..4294967295 (Longword), the variable is stored as an unsigned double word.
  • Otherwise, the variable is stored as a signed quadruple word (Int64).
Note: A "word" occupies two bytes.
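
The storage rules listed above can be observed with SizeOf. The type names in this sketch are hypothetical and chosen only to illustrate one subrange per storage size.

```delphi
program SubrangeSizeDemo;
{$APPTYPE CONSOLE}
type
  TSignedByteRange = -128..127;        // fits Shortint
  TByteRange       = 0..255;           // fits Byte
  TSmallRange      = -32768..32767;    // fits Smallint
  TWordRange       = 0..65535;         // fits Word
  TLongRange       = -100000..100000;  // needs Longint
begin
  Assert(SizeOf(TSignedByteRange) = 1);
  Assert(SizeOf(TByteRange) = 1);
  Assert(SizeOf(TSmallRange) = 2);
  Assert(SizeOf(TWordRange) = 2);
  Assert(SizeOf(TLongRange) = 4);
  Writeln('All subrange sizes match the rules above.');
end.
```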

Character Types

On the Win32 platform:

  • Char and WideChar are stored as an unsigned word variable, normally using UTF-16 or Unicode encoding.
  • AnsiChar and subranges of a Char type are stored as an unsigned byte. In Delphi 2007 and earlier, Char was represented as an AnsiChar. The character type used with Short Strings is always AnsiChar and is stored in unsigned byte values.
  • The default long string type (string) is now UnicodeString, which is reference counted like an AnsiString, the former default long string type. Compatibility with older code may require the use of the AnsiString type.
  • WideString is composed of WideChars like UnicodeString, but is not reference counted.

Boolean Types

A Boolean type is stored as a Byte, a ByteBool is stored as a Byte, a WordBool type is stored as a Word, and a LongBool is stored as a Longint.

A Boolean can assume the values 0 (False) and 1 (True). ByteBool, WordBool, and LongBool types can assume the values 0 (False) or nonzero (True).
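
As an illustration of these sizes and values, the following sketch (the program and variable names are illustrative) checks the storage size of each Boolean type and shows that a LongBool holding a nonzero value other than 1 still tests as True.

```delphi
program BooleanSizeDemo;
{$APPTYPE CONSOLE}
var
  Flag: LongBool;
begin
  Assert(SizeOf(Boolean) = 1);
  Assert(SizeOf(ByteBool) = 1);
  Assert(SizeOf(WordBool) = 2);
  Assert(SizeOf(LongBool) = 4);
  Assert(Ord(False) = 0);
  Assert(Ord(True) = 1);
  Flag := LongBool(2);  // any nonzero value counts as True
  if Flag then
    Writeln('LongBool(2) is treated as True');
end.
```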

Enumerated Types

An enumerated type is stored as an unsigned byte if the enumeration has no more than 256 values and the type was declared in the {$Z1} state (the default). If an enumerated type has more than 256 values, or if the type was declared in the {$Z2} state, it is stored as an unsigned word. If an enumerated type is declared in the {$Z4} state, it is stored as an unsigned double-word.
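
The effect of the $Z directive can be sketched as follows. The enumeration names are invented for the example; each type is declared under a different $Z state and its size is then checked.

```delphi
program EnumSizeDemo;
{$APPTYPE CONSOLE}
type
  {$Z1}
  TByteEnum = (beRed, beGreen, beBlue);  // stored as an unsigned byte
  {$Z2}
  TWordEnum = (weRed, weGreen, weBlue);  // stored as an unsigned word
  {$Z4}
  TLongEnum = (leRed, leGreen, leBlue);  // stored as an unsigned double word
  {$Z1}                                  // restore the default state
begin
  Assert(SizeOf(TByteEnum) = 1);
  Assert(SizeOf(TWordEnum) = 2);
  Assert(SizeOf(TLongEnum) = 4);
  Writeln('Enumeration sizes follow the $Z state.');
end.
```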

Real Types

The real types store the binary representation of a sign (+ or -), an exponent, and a significand. A real value has the form

+/- significand * 2^exponent

where the significand has a single bit to the left of the binary point (that is, 0 <= significand < 2).

In the images that follow, the most significant bit is always on the left, and the least significant bit, on the right. The numbers at the top indicate the width (in bits) of each field, with the leftmost items stored at the highest addresses. For example, for a Real48 value, e is stored in the first byte, f in the following five bytes, and s in the most significant bit of the last byte.

The Real48 type

On Win32 platforms, a 6-byte (48-bit) Real48 number is divided into three fields.

Width (bits)   1    39   8
Field          s    f    e

If 0 < e <= 255, the value v of the number is given by:

v = (-1)^s * 2^(e-129) * (1.f)

If e = 0, then v = 0.

The Real48 type cannot store denormals, NaNs, and infinities. Denormals become zero when stored in a Real48, while NaNs and infinities produce an overflow error if an attempt is made to store them in a Real48.

The Single type

A 4-byte (32-bit) Single number is divided into three fields.

Width (bits)   1    8    23
Field          s    e    f

The value v of the number is given by:

  • If 0 < e < 255, then v = (-1)^s * 2^(e-127) * (1.f)
  • If e = 0 and f <> 0, then v = (-1)^s * 2^(-126) * (0.f)
  • If e = 0 and f = 0, then v = (-1)^s * 0
  • If e = 255 and f = 0, then v = (-1)^s * Inf
  • If e = 255 and f <> 0, then v is a NaN
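
To make the field layout concrete, the sketch below splits a Single into its s, e, and f fields. The variable names are illustrative. The value -6.5 equals -1.625 * 2^2, so the first rule applies with s = 1, e = 127 + 2 = 129, and f holding the fraction bits 101 followed by zeros.

```delphi
program SingleFieldsDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

var
  Value: Single;
  Bits, S, E, F: Cardinal;
begin
  Value := -6.5;                    // -1.625 * 2^2
  Move(Value, Bits, SizeOf(Bits));  // copy the raw 32 bits
  S := Bits shr 31;                 // 1-bit sign
  E := (Bits shr 23) and $FF;       // 8-bit biased exponent
  F := Bits and $7FFFFF;            // 23-bit fraction
  Writeln('s = ', S, ', e = ', E, ', f = $', IntToHex(F, 6));
  Assert(S = 1);
  Assert(E = 129);
  Assert(F = $500000);
end.
```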

The Double type

The Real type, in the current implementation, is equivalent to Double.

An 8-byte (64-bit) Double number is divided into three fields.

Width (bits)   1    11   52
Field          s    e    f

The value v of the number is given by:

  • If 0 < e < 2047, then v = (-1)^s * 2^(e-1023) * (1.f)
  • If e = 0 and f <> 0, then v = (-1)^s * 2^(-1022) * (0.f)
  • If e = 0 and f = 0, then v = (-1)^s * 0
  • If e = 2047 and f = 0, then v = (-1)^s * Inf
  • If e = 2047 and f <> 0, then v is a NaN

The Extended type

On 64-bit platforms, the Extended type is an alias for Double, which is only 8 bytes. (See the section The Double type above.) For more information, see Delphi Considerations for Cross-Platform Applications.

On 32-bit platforms, an Extended number is represented as 10 bytes (80 bits). An Extended number is divided into four fields.

Width (bits)   1    15   1    63
Field          s    e    i    f

The value v of the number is given by:

  • If 0 <= e < 32767, then v = (-1)^s * 2^(e-16383) * (i.f)
  • If e = 32767 and f = 0, then v = (-1)^s * Inf
  • If e = 32767 and f <> 0, then v is a NaN

The Comp type

An 8-byte (64-bit) Comp number is stored as a signed 64-bit integer.

The Currency type

An 8-byte (64-bit) Currency number is stored as a scaled and signed 64-bit integer with the 4 least significant digits implicitly representing 4 decimal places.
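
The scaling can be seen by reading a Currency variable as a raw Int64. This is a small sketch with illustrative names: the value 1.5 is stored as 1.5 * 10000 = 15000.

```delphi
program CurrencyLayoutDemo;
{$APPTYPE CONSOLE}
var
  Amount: Currency;
  Raw: Int64;
begin
  Amount := 1.5;
  Raw := PInt64(@Amount)^;  // reinterpret the 8 bytes as a signed integer
  Writeln('Raw value: ', Raw);
  Assert(SizeOf(Currency) = 8);
  Assert(Raw = 15000);      // 1.5 scaled by 10000
end.
```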

Pointer Types

On 32-bit platforms, a pointer type is stored in 4 bytes as a 32-bit address. On 64-bit platforms, a pointer type is stored in 8 bytes as a 64-bit address.

The pointer value nil is stored as zero.

Short String Types

A string occupies as many bytes as its maximum length plus one. The first byte contains the current dynamic length of the string, and the following bytes contain the characters of the string.

The length byte and the characters are considered unsigned values. The maximum string length is 255 characters plus a length byte (string[255]).
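
For illustration, the sketch below declares a short string with a maximum length of 10 and inspects its size and length byte. The variable name is hypothetical.

```delphi
program ShortStringLayoutDemo;
{$APPTYPE CONSOLE}
var
  Name: string[10];
begin
  Name := 'Delphi';
  Assert(SizeOf(Name) = 11);    // maximum length plus one length byte
  Assert(Ord(Name[0]) = 6);     // element 0 is the current length
  Assert(Length(Name) = 6);
  Assert(Name[1] = 'D');        // characters start at index 1
  Writeln('Current length byte: ', Ord(Name[0]));
end.
```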

Long String Types

On Win32, a string variable of type UnicodeString or AnsiString occupies 4 bytes of memory that contain a pointer to a dynamically allocated string. When a string variable is empty (contains a zero-length string), the string pointer is nil and no dynamic memory is associated with the string variable. For a nonempty string value, the string pointer points to a dynamically allocated block of memory that contains the string value in addition to information describing the string. The table below shows the layout of a long-string memory block.

String dynamic memory layout (Win32 only)

Offset                 Contents
-12                    16-bit codepage of string data
-10                    16-bit element size of string data
-8                     32-bit reference-count
-4                     Length in characters (elements)
0..Length - 1          Character string of element sized data
Length*Element Size    NULL character


The NULL character at the end of a string memory block is automatically maintained by the compiler and the built-in string handling routines. This makes it possible to typecast a string directly to a null-terminated string.
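
The header fields can be read directly from a live string. The sketch below is illustrative only: it uses the offsets from the table above, and the names S and Base are assumptions. Application code would normally call the RTL functions StringCodePage, StringElementSize, StringRefCount, and Length instead of reading the header.

```delphi
program StringHeaderDemo;
{$APPTYPE CONSOLE}
var
  S: string;
  Base: NativeUInt;
begin
  S := StringOfChar('x', 5);       // build the string at run time
  Base := NativeUInt(Pointer(S));  // address of the first character
  Writeln('Code page:    ', PWord(Base - 12)^);
  Writeln('Element size: ', PWord(Base - 10)^);
  Writeln('Ref count:    ', PInteger(Base - 8)^);
  Writeln('Length:       ', PInteger(Base - 4)^);
  Assert(PWord(Base - 10)^ = StringElementSize(S));
  // For a UnicodeString the length field equals Length(S), a count of elements.
  Assert(PInteger(Base - 4)^ = Length(S));
  // The block ends with a NULL character after the last element.
  Assert(PChar(Pointer(S))[Length(S)] = #0);
end.
```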

For string literals, the compiler generates a memory block with the same layout as a dynamically allocated string, but with a reference count of -1. String constants are treated the same way, the only difference from literals being that they are a pointer to a -1 reference counter block.

When a pointer to a string structure (source) is assigned to a string variable (destination), the reference counter dictates how this is done. Usually, the reference count is decreased for the destination and increased for the source, as both pointers, source and destination, will point to the same memory block after the assignment.

If the source reference count is -1 (string constant), a new structure is created with a reference count of 1. If the destination is not nil, the reference counter is decreased. If it reaches 0, the structure is deallocated from the memory. If the destination is nil, no additional actions are taken for it. The destination will then point to the new structure.

var
 destination : String;
 source : String;
...
destination := 'qwerty';  // the newly created block holding 'qwerty', pointed at by "destination", now has a reference count of 1
...
source := 'asdfgh';  // the newly created block holding 'asdfgh', pointed at by "source", now has a reference count of 1
destination := source;  // the block holding 'asdfgh' now has a reference count of 2; the count for the 'qwerty' block drops to 0, so that block is deallocated

If the source reference count is not -1, it is incremented and the destination will point to it.

var
  destination, destination2, destination3: String;
...
  destination := 'Sample String';  // reference count for the newly created block holding 'Sample String' is 1
  destination2 := destination;  // reference count for the block holding 'Sample String' is now 2
  destination3 := destination;  // reference count for the block holding 'Sample String' is now 3
Note: No string variable can point to a structure with a reference count of 0. Structures are always deallocated when they reach 0 reference count and cannot be modified when they have -1 reference count.
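
Reference counts can be watched at run time with the RTL function StringRefCount. The sketch below is an assumption-laden illustration, not a specification: the procedure and variable names are invented, and the exact numbers printed can depend on compiler-generated temporaries, so no particular output is claimed.

```delphi
program StringRefCountDemo;
{$APPTYPE CONSOLE}

procedure Run;
var
  A, B: string;
begin
  A := StringOfChar('x', 5);  // a dynamically allocated string
  Writeln('After creating A: ', StringRefCount(A));
  B := A;                     // B now shares A's memory block
  Writeln('After B := A:     ', StringRefCount(A));
  B := '';                    // B releases its reference
  Writeln('After B := '''':    ', StringRefCount(A));
end;

begin
  Run;
end.
```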

Wide String Types

On Win32, a wide string variable occupies 4 bytes of memory that contain a pointer to a dynamically allocated string. When a wide string variable is empty (contains a zero-length string), the string pointer is nil and no dynamic memory is associated with the string variable. For a nonempty string value, the string pointer points to a dynamically allocated block of memory that contains the string value in addition to a 32-bit length indicator. The table below shows the layout of a wide string memory block on Windows.

Wide string dynamic memory layout (Win32 only)

Offset           Contents
-4               32-bit length indicator (in bytes)
0..Length - 1    Character string
Length           NULL character


The string length is the number of bytes, so it is twice the number of wide characters contained in the string.

The NULL character at the end of a wide string memory block is automatically maintained by the compiler and the built-in string handling routines. This makes it possible to typecast a wide string directly to a null-terminated string.

Set Types

A set is a bit array where each bit indicates whether an element is in the set or not. The maximum number of elements in a set is 256, so a set never occupies more than 32 bytes. The number of bytes occupied by a particular set is equal to

(Max div 8) - (Min div 8) + 1

where Max and Min are the upper and lower bounds of the base type of the set. The byte number of a specific element E is

(E div 8) - (Min div 8)

and the bit number within that byte is

E mod 8

where E denotes the ordinal value of the element. When possible, the compiler stores sets in CPU registers, but a set always resides in memory if it is larger than the generic integer type or if the program contains code that takes the address of the set.
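
The formulas can be checked against a real set. In this sketch the type names TByteSet and TRawSet are illustrative. The base type is Byte, so Min = 0 and Max = 255, and the set occupies (255 div 8) - (0 div 8) + 1 = 32 bytes.

```delphi
program SetLayoutDemo;
{$APPTYPE CONSOLE}
type
  TByteSet = set of Byte;
  TRawSet  = array[0..31] of Byte;  // same size, used to view the raw bytes
var
  S: TByteSet;
  E: Byte;
begin
  Assert(SizeOf(TByteSet) = 32);
  S := [];
  E := 42;
  Include(S, E);
  // Byte number = (42 div 8) - (0 div 8) = 5, bit number = 42 mod 8 = 2.
  Assert(TRawSet(S)[E div 8] = 1 shl (E mod 8));
  Writeln('Element ', E, ' is bit ', E mod 8, ' of byte ', E div 8);
end.
```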

Static Array Types

On the Win32 platform, a static array is stored as a contiguous sequence of variables of the component type of the array. The components with the lowest indexes are stored at the lowest memory addresses. A multidimensional array is stored with the rightmost dimension increasing first.

Dynamic Array Types

On the Win32 platform, a dynamic-array variable occupies 4 bytes of memory that contain a pointer to the dynamically allocated array. When the variable is empty (uninitialized) or holds a zero-length array, the pointer is nil and no dynamic memory is associated with the variable. For a nonempty array, the variable points to a dynamically allocated block of memory that contains the array in addition to a 32-bit length indicator and a 32-bit reference count. The table below shows the layout of a dynamic-array memory block.

Dynamic array memory layout (Win32 only)

Offset                               Contents
-8                                   32-bit reference-count
-4                                   32-bit length indicator (number of elements)
0..Length * (size of element) - 1    Array elements
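
As a sketch of this layout, the program below reads the two header fields of a dynamic array. The header offsets are the Win32 ones from the table, so those checks are guarded by CPUX86; the variable names are illustrative.

```delphi
program DynArrayLayoutDemo;
{$APPTYPE CONSOLE}
var
  A: array of Integer;
  {$IFDEF CPUX86}
  Base: NativeUInt;
  {$ENDIF}
begin
  Assert(Pointer(A) = nil);  // an empty array variable holds nil
  SetLength(A, 3);
  Assert(Pointer(A) <> nil);
  {$IFDEF CPUX86}
  Base := NativeUInt(Pointer(A));          // address of element 0
  Assert(PInteger(Base - 4)^ = Length(A)); // length indicator
  Assert(PInteger(Base - 8)^ = 1);         // one reference: the variable A
  {$ENDIF}
  Writeln('Length(A) = ', Length(A));
end.
```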

Record Types

When a record type is declared in the {$A+} state (the default), and when the declaration does not include a packed modifier, the type is an unpacked record type, and the fields of the record are aligned for efficient access by the CPU, and according to the platform. The alignment is controlled by the type of each field. Every data type has an inherent alignment, which is automatically computed by the compiler. The alignment can be 1, 2, 4, or 8, and represents the byte boundary on which a value of the type must be stored in order to provide the most efficient access. The table below lists the alignments for all data types.

Type alignment masks (Win32 only)

Type                  Alignment
Ordinal types         Size of the type (1, 2, 4, or 8)
Real types            2 for Real48, 4 for Single, 8 for Double and Extended
Short string types    1
Array types           Same as the element type of the array
Record types          The largest alignment of the fields in the record
Set types             Size of the type if 1, 2, or 4, otherwise 1
All other types       Determined by the $A directive


To ensure proper alignment of the fields in an unpacked record type, the compiler inserts an unused byte before fields with an alignment of 2, and up to 3 unused bytes before fields with an alignment of 4, if required. Finally, the compiler rounds the total size of the record upward to the byte boundary specified by the largest alignment of any of the fields.

Implicit Packing of Fields with a Common Type Specification

Earlier versions of the Delphi compiler, such as Delphi 7 and earlier, implicitly applied packed alignment to fields that were declared together, that is, fields that have a common type specification. Newer compilers can reproduce the behavior if you specify the directive {$OLDTYPELAYOUT ON}. This directive byte-aligns (packs) the fields that have a common type specification, even if the declaration does not include the packed modifier and the record type is not declared in the {$A-} state.

Thus, for example, given the following declaration:

 {$OLDTYPELAYOUT ON}
 type
   TMyRecord = record
     A, B: Extended;
     C: Extended;
   end;
 {$OLDTYPELAYOUT OFF}

A and B are packed (aligned on byte boundaries) because the {$OLDTYPELAYOUT ON} directive is specified and because A and B share the same type specification. However, for the separately declared C field, the compiler uses the default behavior and pads the structure with unused bytes to ensure the field appears on a quadword boundary.

When a record type is declared in the {$A-} state, or when the declaration includes the packed modifier, the fields of the record are not aligned, but are instead assigned consecutive offsets. The total size of such a packed record is simply the size of all the fields. Because data alignment can change, it is a good idea to pack any record structure that you intend to write to disk or pass in memory to another module compiled using a different version of the compiler.
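
The difference between unpacked and packed layout can be sketched with two records that declare the same fields. The type names are hypothetical. In the unpacked record the Integer field has an alignment of 4, so three unused bytes follow the Byte field.

```delphi
program RecordAlignmentDemo;
{$APPTYPE CONSOLE}
{$A+}
type
  TUnpacked = record
    A: Byte;
    B: Integer;
  end;
  TPacked = packed record
    A: Byte;
    B: Integer;
  end;
var
  U: TUnpacked;
  P: TPacked;
begin
  Assert(SizeOf(TUnpacked) = 8);  // 1 byte + 3 padding bytes + 4 bytes
  Assert(SizeOf(TPacked) = 5);    // fields at consecutive offsets
  Assert(NativeUInt(@U.B) - NativeUInt(@U) = 4);
  Assert(NativeUInt(@P.B) - NativeUInt(@P) = 1);
  Writeln('Unpacked: ', SizeOf(TUnpacked), ' bytes, packed: ', SizeOf(TPacked), ' bytes');
end.
```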

File Types

On the Win32 platform, file types are represented as records. Typed files and untyped files occupy 592 bytes, which are laid out as follows:

 type
   TFileRec = packed record
     Handle: Integer;
     Mode: Word;
     Flags: Word;
     case Byte of
       0: (RecSize: Cardinal);
       1: (BufSize: Cardinal;
           BufPos: Cardinal;
           BufEnd: Cardinal;
           BufPtr: PChar;
           OpenFunc: Pointer;
           InOutFunc: Pointer;
           FlushFunc: Pointer;
           CloseFunc: Pointer;
           UserData: array[1..32] of Byte;
           Name: array[0..259] of Char);
   end;

Text files occupy 848 bytes, which are laid out as follows:

 type
   TTextBuf = array[0..127] of Char;
   TTextRec = packed record
     Handle: Integer;
     Mode: Word;
     Flags: Word;
     BufSize: Cardinal;
     BufPos: Cardinal;
     BufEnd: Cardinal;
     BufPtr: PChar;
     OpenFunc: Pointer;
     InOutFunc: Pointer;
     FlushFunc: Pointer;
     CloseFunc: Pointer;
     UserData: array[1..32] of Byte;
     Name: array[0..259] of Char;
     Buffer: TTextBuf;
   end;

Handle contains the file's handle (when the file is open).

The Mode field can assume one of the values:

 const
   fmClosed = $D7B0;
   fmInput  = $D7B1;
   fmOutput = $D7B2;
   fmInOut  = $D7B3;

where fmClosed indicates that the file is closed, fmInput and fmOutput indicate a text file that has been reset (fmInput) or rewritten (fmOutput), fmInOut indicates a typed or untyped file that has been reset or rewritten. Any other value indicates that the file variable is not assigned (and hence not initialized).

The UserData field is available for user-written routines to store data in.

Name contains the file name, which is a sequence of characters terminated by a null character (#0).

For typed files and untyped files, RecSize contains the record length in bytes, and the Private field is unused but reserved.

For text files, BufPtr is a pointer to a buffer of BufSize bytes, BufPos is the index of the next character in the buffer to read or write, and BufEnd is a count of valid characters in the buffer. OpenFunc, InOutFunc, FlushFunc, and CloseFunc are pointers to the I/O routines that control the file; see Device functions. Flags determines the line break style as follows.

bit 0 clear    LF line breaks
bit 0 set      CRLF line breaks

All other Flags bits are reserved for future use.

Note: When working with the UnicodeString type (the default Delphi string type), the various stream types in the Classes unit (TFileStream, TStreamReader, TStreamWriter, and so forth) are more useful, since the older file types, particularly the old text file type, have limited Unicode functionality.

Procedural Types

On the Win32 platform, a procedure pointer is stored as a 32-bit pointer to the entry point of a procedure or function. A method pointer is stored as a 32-bit pointer to the entry point of a method, followed by a 32-bit pointer to an object.
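
A method pointer can be inspected through the TMethod record declared in the System unit, which exposes the code and data pointers. The sketch below is illustrative: TGreeter and TSimpleMethod are made-up names, and the size check is written in terms of SizeOf(Pointer) rather than a fixed 32-bit size.

```delphi
program MethodPointerDemo;
{$APPTYPE CONSOLE}
type
  TGreeter = class
    procedure Greet;
  end;
  TSimpleMethod = procedure of object;

procedure TGreeter.Greet;
begin
  Writeln('Hello from TGreeter.Greet');
end;

var
  G: TGreeter;
  M: TSimpleMethod;
begin
  G := TGreeter.Create;
  try
    M := G.Greet;
    Assert(SizeOf(M) = 2 * SizeOf(Pointer));     // entry point + object
    Assert(TMethod(M).Code = @TGreeter.Greet);   // pointer to the entry point
    Assert(TMethod(M).Data = Pointer(G));        // pointer to the object
    M();                                         // call through the method pointer
  finally
    G.Free;
  end;
end.
```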

Class Types

On the Win32 platform, a class-type value is stored as a 32-bit pointer to an instance of the class, which is called an object. The internal data format of an object resembles that of a record. The object's fields are stored in order of declaration as a sequence of contiguous variables. Fields are always aligned, corresponding to an unpacked record type. Any fields inherited from an ancestor class are stored before the new fields defined in the descendent class.

The first 4-byte field of every object is a pointer to the virtual method table (VMT) of the class. There is exactly one VMT per class (not one per object); distinct class types, no matter how similar, never share a VMT. VMTs are built automatically by the compiler, and are never directly manipulated by a program. Pointers to VMTs, which are automatically stored by constructor methods in the objects they create, are also never directly manipulated by a program.
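
The relationship between an object and its VMT can be observed without touching the VMT itself. In this sketch (names are illustrative) the first pointer-sized field of the instance is compared with the class reference returned by ClassType, which is stored as a pointer to the same VMT. This is for inspection only; application code should not rely on it.

```delphi
program VmtPointerDemo;
{$APPTYPE CONSOLE}
var
  Obj: TObject;
  FirstField: Pointer;
begin
  Obj := TObject.Create;
  try
    FirstField := PPointer(Obj)^;  // first field of the instance
    Assert(FirstField = Pointer(Obj.ClassType));
    Writeln('Class name: ', Obj.ClassName);
    Writeln('Instance size: ', Obj.InstanceSize);
  finally
    Obj.Free;
  end;
end.
```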

The layout of a VMT is shown in the following table. At positive offsets, a VMT consists of a list of 32-bit method pointers, one per user-defined virtual method in the class type, in order of declaration. Each slot contains the address of the corresponding virtual method's entry point. This layout is compatible with a C++ v-table and with COM. At negative offsets, a VMT contains a number of fields that are internal to Delphi's implementation. Applications should use the methods defined in TObject to query this information, since the layout is likely to change in future implementations of the Delphi language.

Virtual method table layout (Win32 Only)

Offset    Type        Description
-76       Pointer     Pointer to virtual method table (or nil)
-72       Pointer     Pointer to interface table (or nil)
-68       Pointer     Pointer to Automation information table (or nil)
-64       Pointer     Pointer to instance initialization table (or nil)
-60       Pointer     Pointer to type information table (or nil)
-56       Pointer     Pointer to field definition table (or nil)
-52       Pointer     Pointer to method definition table (or nil)
-48       Pointer     Pointer to dynamic method table (or nil)
-44       Pointer     Pointer to short string containing class name
-40       Cardinal    Instance size in bytes
-36       Pointer     Pointer to a pointer to ancestor class (or nil)
-32       Pointer     Pointer to entry point of SafecallException method (or nil)
-28       Pointer     Entry point of AfterConstruction method
-24       Pointer     Entry point of BeforeDestruction method
-20       Pointer     Entry point of Dispatch method
-16       Pointer     Entry point of DefaultHandler method
-12       Pointer     Entry point of NewInstance method
-8        Pointer     Entry point of FreeInstance method
-4        Pointer     Entry point of Destroy destructor
0         Pointer     Entry point of first user-defined virtual method
4         Pointer     Entry point of second user-defined virtual method

Class Reference Types

On the Win32 platform, a class-reference value is stored as a 32-bit pointer to the virtual method table (VMT) of a class.

Variant Types

The following discussion of the internal layout of variant types applies to the Win32 platform only. Variants rely on boxing and unboxing of data into an object wrapper, as well as Delphi helper classes to implement the variant-related RTL functions.

On the Win32 platform, a variant is stored as a 16-byte record that contains a type code and a value (or a reference to a value) of the type given by the code. The System and Variants units define constants and types for variants.

The TVarData type represents the internal structure of a Variant variable (on Windows, this is identical to the Variant type used by COM and the Win32 API). The TVarData type can be used in typecasts of Variant variables to access the internal structure of a variable. The TVarData record contains the following fields:

  • VType contains the type code of the variant in the lower 12 bits (the bits defined by the varTypeMask constant). In addition, the varArray bit may be set to indicate that the variant is an array, and the varByRef bit may be set to indicate that the variant contains a reference as opposed to a value.
  • The Reserved1, Reserved2, and Reserved3 fields are unused.

The contents of the remaining 8 bytes of a TVarData record depend on the VType field as follows:

  • If neither the varArray nor the varByRef bits are set, the variant contains a value of the given type.
  • If the varArray bit is set, the variant contains a pointer to a TVarArray structure that defines an array. The type of each array element is given by the varTypeMask bits in the VType field.
  • If the varByRef bit is set, the variant contains a reference to a value of the type given by the varTypeMask and varArray bits in the VType field.

The varString type code is private. Variants containing a varString value should never be passed to a non-Delphi function. On Win32, Delphi's Automation support automatically converts varString variants to varOleStr variants before passing them as parameters to external functions.
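
The VType bits described above can be examined by typecasting a Variant to TVarData. The following is a minimal sketch with illustrative variable names; the 16-byte size check applies to Win32 only and is therefore guarded by CPUX86.

```delphi
program VariantLayoutDemo;
{$APPTYPE CONSOLE}
uses
  System.Variants;

var
  I: Integer;
  V: Variant;
begin
  {$IFDEF CPUX86}
  Assert(SizeOf(TVarData) = 16);
  {$ENDIF}
  I := 42;
  V := I;                                           // holds a value
  Assert((TVarData(V).VType and varTypeMask) = varInteger);
  Assert((TVarData(V).VType and varArray) = 0);
  Assert((TVarData(V).VType and varByRef) = 0);
  Assert(TVarData(V).VInteger = 42);
  V := VarArrayCreate([0, 2], varInteger);          // holds an array
  Assert((TVarData(V).VType and varArray) <> 0);
  Assert((TVarData(V).VType and varTypeMask) = varInteger);
  Writeln('VType of the array variant: ', TVarData(V).VType);
end.
```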
