Unicode Character Types and Literals (C++11)

Attention: This page refers to a C++11 feature in the Classic compiler. The Classic compiler is not recommended; instead, it is recommended that you use the Clang-enhanced compilers, which support modern C++ including C++11, C++14, and C++17.

BCC32 implements new character types and character literals for Unicode. These types are among the C++11 features added to BCC32.

Character Types char16_t and char32_t

Two new types represent Unicode characters:

  • char16_t is a 16-bit character type and a C++ keyword. It is intended to hold UTF-16 code units.
  • char32_t is a 32-bit character type and a C++ keyword. It is intended to hold UTF-32 code units, each of which is a full code point.

The existing wchar_t type is a type for a wide character in the execution wide-character set. A wchar_t wide-character literal begins with an uppercase L (such as L'c').
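As a minimal sketch of how the three types relate, the following assumes a conforming C++11 compiler. The names type_name, bit_width, and demo_character_types are hypothetical helpers introduced only for illustration. The overloads show that char16_t, char32_t, and wchar_t are distinct types, and the width checks rely only on the minimum sizes the standard guarantees.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Hypothetical overload set (not part of any library): each overload is
// selected by the static type of its argument, which shows that char16_t,
// char32_t, and wchar_t are three distinct types.
inline std::string type_name(char16_t) { return "char16_t"; }
inline std::string type_name(char32_t) { return "char32_t"; }
inline std::string type_name(wchar_t)  { return "wchar_t"; }

// Hypothetical helper: width of a type in bits.
template <typename T>
inline unsigned bit_width() {
    return static_cast<unsigned>(sizeof(T) * CHAR_BIT);
}

// Hypothetical demo function; call it from main() to run the checks.
inline void demo_character_types() {
    char16_t c16 = 0x0067;      // UTF-16 code unit for 'g'
    char32_t c32 = 0x0001F600;  // a code point outside the BMP
    wchar_t  wc  = L'c';        // wide character in the execution wide-character set

    assert(type_name(c16) == "char16_t");
    assert(type_name(c32) == "char32_t");
    assert(type_name(wc)  == "wchar_t");

    // The standard guarantees at least these widths. The width of wchar_t is
    // implementation-defined (commonly 16 bits on Windows, 32 bits on Linux).
    assert(bit_width<char16_t>() >= 16);
    assert(bit_width<char32_t>() >= 32);
}
```

Because the three types are distinct, functions can be overloaded on them without ambiguity, even on platforms where wchar_t happens to have the same width as one of the new types.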

Character Literals u'character' and U'character'

There are two new forms to create character literals of the new types:

  • u'character' is a literal for a single char16_t character, such as u'g'. A multicharacter literal such as u'kh' is ill-formed. The value of a char16_t literal is equal to its ISO 10646 code point value, provided that the code point is representable as a 16-bit value. Consequently, only characters in the Basic Multilingual Plane (BMP) can be written as a char16_t literal.
  • U'character' is a literal for a single char32_t character, such as U't'. A multicharacter literal such as U'de' is ill-formed. The value of a char32_t literal is equal to its ISO 10646 code point value.

Previously, the only wide-character literal form was L'characters', which represents one or more characters of the type wchar_t. The value of a single-character wide-character literal is that character's encoding in the execution wide-character set.
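The rules for u'character' and U'character' can be checked at compile time. The following is a sketch that assumes a conforming C++11 compiler with static_assert support; fits_in_16_bits and demo_character_literals are hypothetical names used only for this example. Universal character names (\uXXXX and \UXXXXXXXX) are used so that the result does not depend on the encoding of the source file.

```cpp
#include <cassert>

// Compile-time checks: the value of each literal is its ISO 10646 code point.
static_assert(u'g' == 0x0067, "u'g' is U+0067");
static_assert(U't' == 0x0074, "U't' is U+0074");
static_assert(u'\u20AC' == 0x20AC, "the euro sign is in the BMP, so it fits in char16_t");
static_assert(U'\U0001F600' == 0x1F600, "a code point outside the BMP fits in char32_t");

// Hypothetical helper: true when a code point is representable as a
// 16-bit value, the condition required of a char16_t literal.
inline bool fits_in_16_bits(char32_t code_point) {
    return code_point <= 0xFFFF;
}

// Hypothetical demo function; call it from main() to run the checks.
inline void demo_character_literals() {
    assert(fits_in_16_bits(U'\u20AC'));       // BMP character
    assert(!fits_in_16_bits(U'\U0001F600'));  // needs more than 16 bits
    // Writing u'\U0001F600' is not valid, because that code point
    // cannot be represented in a single char16_t.
}
```

A character outside the BMP can still appear in UTF-16 text, but only inside a string literal, where it occupies two char16_t elements.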

String Literals u"UTF-16_string" and U"UTF-32_string"

There are two new forms to create string literals of the new types:

  • u"UTF-16_string" is a string literal containing characters of the char16_t type, for example u"string_containing_UTF-16_encoding_characters".
  • U"UTF-32_string" is a string literal containing characters of the char32_t type, for example U"string_containing_UTF-32_encoding_characters".
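As an illustration of the two string literal forms, the sketch below assumes a conforming C++11 compiler and standard library (std::u16string and std::u32string come from the standard <string> header). The names code_units and demo_string_literals are hypothetical helpers, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null character.
template <typename CharT, std::size_t N>
inline std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// Hypothetical demo function; call it from main() to run the checks.
inline void demo_string_literals() {
    // A u"..." literal is an array of const char16_t,
    // and a U"..." literal is an array of const char32_t.
    const char16_t* s16 = u"abc";
    const char32_t* s32 = U"abc";
    assert(s16[0] == u'a' && s16[3] == 0);
    assert(s32[2] == U'c' && s32[3] == 0);

    // The literals can initialize the matching standard string types.
    std::u16string text16 = u"abc";
    std::u32string text32 = U"abc";
    assert(text16.size() == 3);
    assert(text32.size() == 3);

    // U+1F600 lies outside the BMP: UTF-16 stores it as a surrogate pair
    // (two code units), while UTF-32 stores it as one code unit.
    assert(code_units(u"\U0001F600") == 2);
    assert(code_units(U"\U0001F600") == 1);
}
```

The last two checks show why the length of a UTF-16 string in code units can differ from its length in characters, whereas a UTF-32 string has one element per code point.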

See Also