System.AnsiString

From RAD Studio API Documentation

Delphi

type AnsiString = { built-in type };

C++

typedef  AnsiStringT<0> AnsiString;

Properties

Type      Visibility   Source       Unit     Parent
type      public       System.pas   System   System
typedef                sysmac.h

Description

AnsiString represents a dynamically allocated string whose maximum length is limited only by available memory.

Note that the AnsiString type is defined differently in Delphi and C++:

  • In Delphi, it is a built-in type.
  • In C++, it is an alias of AnsiStringT<0>.
Note: The information below applies to both Delphi and C++. For information specific to C++, see AnsiStringT.

An AnsiString variable is a pointer to a structure containing string information. The state of the pointer depends on the variable's value:

  • Empty: when the variable contains a zero-length string, the pointer is nil and the string uses no additional storage.
  • Nonempty: the variable points to a dynamically allocated block of memory that contains the string value.

AnsiString also represents a single-byte string. With a single-byte character set (SBCS), each byte in a string represents one character. In a multibyte character set (MBCS), the elements are still single bytes, but some characters are represented by one byte and others by more than one byte. Multibyte character sets, especially double-byte character sets (DBCS), are widely used for Asian languages. An AnsiString can contain MBCS characters.

Note: Names of multibyte functions usually start with Ansi-. For example, the multibyte version of StrPos is AnsiStrPos.

AnsiString type has the following characteristics:

  • Its memory is allocated on the heap, but its management is entirely automatic and requires no user code.
  • Its structure contains a 32-bit length indicator, a 32-bit reference count, a 16-bit data length indicating the number of bytes per character, and a 16-bit code page.
  • The code page is set, by default, to the operating system's code page. It can be changed by calling SetMultiByteConversionCodePage.
  • Indexing of AnsiString is 1-based.
Note: Indexing multibyte strings is not reliable, because S[i] represents the i-th byte (not necessarily the i-th character) in S. The i-th byte may be a single character or part of a character.
However, the standard AnsiString string-handling functions have multibyte-enabled counterparts that also implement locale-specific ordering for characters. Multibyte character support is operating-system dependent and based on the current locale.

Because AnsiString variables are pointers, two or more of them can reference the same value without consuming additional memory. The compiler exploits this to conserve resources and execute assignments faster. As a result, AnsiString behaves as follows:

  • Whenever an AnsiString variable is destroyed or assigned a new value, the reference count of the old AnsiString (the variable's previous value) is decremented and the reference count of the new value (if there is one) is incremented.
  • When the reference count of a string reaches zero, its memory is deallocated. This process is called reference counting.
  • If indexing is used to change the value of a single character in a string, a copy of the string is made if, and only if, its reference count is greater than one. This is called copy-on-write semantics.
Note: AnsiString is used by the Delphi desktop and mobile compilers. For more information, see Migrating Delphi Code to Mobile from Desktop.
  • When a literal is assigned to an AnsiString, the compiler converts the literal to Unicode using the code page of the AnsiString and then converts it back to that code page. This ensures that the AnsiString contains only characters valid for its code page.
  • If an invalid character is specified, it is converted to byte $3F (question mark) to signal that an invalid byte sequence was encountered.

Bear in mind that a byte sequence invalid for one code page may be valid for another.

Here is an example using codepage 936:

Delphi

type
  AnsiStr936 = type AnsiString(936);
 
procedure TForm1.Button1Click(Sender: TObject);
begin
  const MyAnsiStr = AnsiStr936(#$20#$20#$F8#$20#$20);
  var str: string;
  for var Ch: AnsiChar in MyAnsiStr do
    str := str+Byte(Ch).ToHexString+' ';
  ShowMessage(str);
end;

C++

typedef AnsiStringT<936> AnsiStr936;
 
void __fastcall TForm19::Button1Click(TObject *Sender)
{
  AnsiStr936 MyAnsiStr("\x20\x20\xF8\x20\x20");
  String str;
  str.SetLength(MyAnsiStr.Length()*2);
  BinToHex(MyAnsiStr.c_str(), str.c_str(), MyAnsiStr.Length()*sizeof(AnsiChar));
  ShowMessage(str);
}

When run, the example displays the bytes 20 20 3F 20 20.

Note that the character #$F8 was replaced with #$3F to signal that hex byte F8 is not valid for codepage 936.

See Also