System.AnsiString

From RAD Studio API Documentation

Delphi

type AnsiString = { built-in type };

C++

typedef  AnsiStringT<0> AnsiString;

Properties

Type      Visibility  Source      Unit    Parent
type      public      System.pas  System  System
typedef   public      sysmac.h    System  System

Description

Represents a dynamically allocated string whose maximum length is limited only by available memory.

An AnsiString variable is a pointer. When the variable is empty (it contains a zero-length string), the pointer is nil and the string uses no additional storage. When the variable is nonempty, it points to a dynamically allocated block of memory that holds the string value, preceded by a header structure. This memory is allocated on the heap, but its management is entirely automatic and requires no user code. The header contains a 32-bit length indicator, a 32-bit reference count, a 16-bit element size indicating the number of bytes per character, and a 16-bit code page. By default, the code page is set to the operating system's code page; it can be changed by calling SetMultiByteConversionCodePage.
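The header fields can be inspected with the System routines Length, StringRefCount, StringElementSize and StringCodePage. The following is a minimal console sketch, assuming a recent Delphi desktop compiler; the program name and the sample value are hypothetical. The value is built at run time because string literals are stored as constants with a reference count of -1, which would not show a normal count.

```delphi
program AnsiStringHeader; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  S: AnsiString;
begin
  // An empty AnsiString is a nil pointer: no heap block exists yet.
  Assert(Pointer(S) = nil);

  // Build the value at run time so it lives in its own heap block
  // (a literal would be a constant with a reference count of -1).
  S := AnsiString('Value ' + IntToStr(42));

  Writeln('Length:         ', Length(S));            // length field: 8 bytes
  Writeln('Ref count:      ', StringRefCount(S));    // 1: only S refers to the block
  Writeln('Element size:   ', StringElementSize(S)); // 1 byte per element
  Writeln('Code page:      ', StringCodePage(S));    // code page stored in the header
  Writeln('System default: ', DefaultSystemCodePage);

  Assert(Length(S) = 8);
  Assert(StringRefCount(S) = 1);
  Assert(StringElementSize(S) = 1);
end.
```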

An AnsiString is a string of single-byte elements (AnsiChar). With a single-byte character set (SBCS), each byte in a string represents one character. In a multibyte character set (MBCS), the elements are still single bytes, but some characters are represented by one byte and others by more than one byte. Multibyte character sets, especially double-byte character sets (DBCS), are widely used for Asian languages. An AnsiString can contain MBCS characters.

Indexing of AnsiString is 1-based. Indexing multibyte strings is unreliable, because S[i] is the i-th byte (not necessarily the i-th character) in S; that byte may be a whole character or only part of one. However, the standard AnsiString string-handling functions have multibyte-enabled counterparts that also implement locale-specific ordering for characters. (Names of multibyte functions usually start with Ansi-; for example, the multibyte version of StrPos is AnsiStrPos.) Multibyte character support depends on the operating system and the current locale.
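The difference between bytes and characters can be seen by converting two Chinese characters to a code page 936 AnsiString: Length then counts bytes, and A[I] returns a single byte. This is a minimal sketch, assuming a platform that can convert to code page 936 (for example, Windows); the program and variable names are hypothetical.

```delphi
program AnsiIndexing; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  AnsiStr936 = type AnsiString(936); // AnsiString bound to code page 936 (GBK)

var
  U: string;
  A: AnsiStr936;
  I: Integer;
begin
  U := #$4E2D#$6587;  // two Chinese characters, U+4E2D and U+6587
  A := AnsiStr936(U); // run-time conversion to code page 936

  Writeln('Characters in U: ', Length(U)); // 2
  Writeln('Bytes in A:      ', Length(A)); // 4: each character takes 2 bytes here

  // A[I] is the I-th byte (1-based), so A[1] alone is only half a character.
  for I := 1 to Length(A) do
    Write(IntToHex(Byte(A[I]), 2), ' ');
  Writeln;

  Assert(Length(A) = 4);
  Assert(StringCodePage(A) = 936);
  // Converting back to a Unicode string restores the two characters.
  Assert(string(A) = U);
end.
```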

Because AnsiString variables are pointers, two or more of them can reference the same value without consuming additional memory. The compiler exploits this to conserve resources and execute assignments faster. Whenever an AnsiString variable is destroyed or assigned a new value, the reference count of the old AnsiString (the variable's previous value) is decremented and the reference count of the new value (if there is one) is incremented; if the reference count of a string reaches zero, its memory is deallocated. This process is called reference counting. When indexing is used to change the value of a single character in a string, a copy of the string is made if, and only if, its reference count is greater than one. This is called copy-on-write semantics.
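Reference counting and copy-on-write can be observed with StringRefCount and by comparing the underlying pointers. The sketch below is a minimal illustration, assuming a Delphi desktop console project; the program name and values are hypothetical. As above, the first value is built at run time so that it is not a constant literal.

```delphi
program AnsiCopyOnWrite; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  S1, S2: AnsiString;
begin
  S1 := AnsiString('Hello' + IntToStr(1)); // heap block created at run time
  Assert(StringRefCount(S1) = 1);

  S2 := S1;                                // no copy: both point at the same block
  Assert(Pointer(S1) = Pointer(S2));
  Assert(StringRefCount(S1) = 2);

  S2[1] := 'J';                            // ref count > 1, so S2 gets its own copy first
  Assert(Pointer(S1) <> Pointer(S2));
  Assert(StringRefCount(S1) = 1);

  Writeln(S1, ' ', S2);                    // Hello1 Jello1
end.
```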

Note: AnsiString is supported by both the Delphi desktop and mobile compilers. For more information, see Migrating Delphi Code to Mobile from Desktop.

When a literal is assigned to an AnsiString, the compiler converts the literal to Unicode using the code page of the AnsiString and then converts it back to that code page. This ensures that the AnsiString contains only characters valid for its code page. Any invalid character is replaced with the byte $3F (question mark) to signal that an invalid byte sequence was encountered.

Bear in mind that a byte sequence invalid for one code page may be valid for another.

Here is an example using code page 936:

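type
  // An AnsiString type bound to code page 936
  AnsiStr936 = type AnsiString(936);

procedure TForm1.Button1Click(Sender: TObject);
begin
  // Inline const/var declarations require Delphi 10.3 or later.
  // #$F8 is not valid for code page 936, so the compiler stores $3F ('?') instead.
  const MyAnsiStr = AnsiStr936(#$20#$20#$F8#$20#$20);
  var str: string;
  // Collect the hex value of each stored byte
  for var Ch: AnsiChar in MyAnsiStr do
    str := str + Byte(Ch).ToHexString + ' ';
  ShowMessage(str);
end;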

When run, this code displays: 20 20 3F 20 20.

Note that the character #$F8 was replaced with $3F to signal that the byte $F8 is not valid for code page 936.
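Because validity depends on the code page, the same literal behaves differently in another one. The sketch below, a hedged console variant of the example above, assigns it to a code page 1252 (Windows Western European) AnsiString, where $F8 is a valid character (Latin small letter o with stroke), so the byte is expected to be kept. The type name AnsiStr1252 and the program name are hypothetical.

```delphi
program CodePageContrast; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  AnsiStr1252 = type AnsiString(1252); // AnsiString bound to code page 1252

var
  S: AnsiStr1252;
  Hex: string;
  Ch: AnsiChar;
begin
  // Same bytes as the code page 936 example, but $F8 is valid in 1252.
  S := AnsiStr1252(#$20#$20#$F8#$20#$20);
  for Ch in S do
    Hex := Hex + IntToHex(Byte(Ch), 2) + ' ';
  Writeln(Hex); // 20 20 F8 20 20

  Assert(Byte(S[3]) = $F8);
end.
```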

See Also