Unicode Character Types and Literals (C++11)
BCC32 implements new character types and character literals for Unicode. These types are among the C++11 features added to BCC32.
Character Types char16_t and char32_t
Two new types represent Unicode characters:
- char16_t is a 16-bit character type. char16_t is a C++ keyword. This type can be used for UTF-16 characters.
- char32_t is a 32-bit character type. char32_t is a C++ keyword. This type can be used for UTF-32 characters.
The existing wchar_t type is a type for a wide character in the execution wide-character set. A wchar_t wide-character literal begins with an uppercase L (such as L'c').
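The following is a minimal sketch of how the three types relate, assuming a compiler and standard library with C++11 support for static_assert and <type_traits>. The overloaded helper code_unit_bits is a hypothetical name introduced only for illustration; it is not part of BCC32 or the standard library.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// char16_t and char32_t are distinct built-in types, not aliases of wchar_t.
static_assert(!std::is_same<char16_t, wchar_t>::value, "char16_t is its own type");
static_assert(!std::is_same<char32_t, wchar_t>::value, "char32_t is its own type");
static_assert(!std::is_same<char16_t, char32_t>::value, "the two new types differ");

// Each type is wide enough to hold one code unit of its encoding.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t holds a UTF-16 code unit");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t holds a UTF-32 code unit");

// Hypothetical helper: because the types are distinct, they can be overloaded on.
inline int code_unit_bits(char16_t) { return 16; }
inline int code_unit_bits(char32_t) { return 32; }
inline int code_unit_bits(wchar_t)  { return static_cast<int>(sizeof(wchar_t) * CHAR_BIT); }

// Runtime check that each literal prefix selects the matching overload.
inline void check_overloads() {
    assert(code_unit_bits(u'c') == 16);
    assert(code_unit_bits(U'c') == 32);
    assert(code_unit_bits(L'c') == static_cast<int>(sizeof(wchar_t) * CHAR_BIT));
}
```

The width of wchar_t is implementation-defined, which is why its overload computes the value instead of returning a fixed number.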
Character Literals u'character' and U'character'
There are two new forms to create character literals of the new types:
- u'character' is a literal for a single char16_t character, such as u'g'. A multicharacter literal such as u'kh' is ill-formed. The value of a char16_t literal is equal to its ISO 10646 code point value, provided that the code point is representable as a 16-bit value. Only characters in the basic multilingual plane (BMP) can be represented.
- U'character' is a literal for a single char32_t character, such as U't'. A multicharacter literal such as U'de' is ill-formed. The value of a char32_t literal is equal to its ISO 10646 code point value.
Previously, the only prefixed character literal was the wide-character literal of the form L'characters', representing one or more characters of the type wchar_t. The value of a single-character wide-character literal is that character's encoding in the execution wide-character set.
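The rules above for literal types and values can be sketched as compile-time checks. This assumes a compiler with C++11 support for decltype, static_assert, and universal character names. The helper code_point is a hypothetical name used only in this example.

```cpp
#include <cassert>
#include <type_traits>

// The prefix determines the type of the literal.
static_assert(std::is_same<decltype(u'g'), char16_t>::value, "u'...' is char16_t");
static_assert(std::is_same<decltype(U't'), char32_t>::value, "U'...' is char32_t");
static_assert(std::is_same<decltype(L'c'), wchar_t>::value,  "L'...' is wchar_t");

// The value of each literal is its ISO 10646 code point.
static_assert(u'g' == 0x0067, "LATIN SMALL LETTER G is U+0067");
static_assert(U't' == 0x0074, "LATIN SMALL LETTER T is U+0074");

// A BMP character fits in a char16_t literal.
static_assert(u'\u20AC' == 0x20AC, "EURO SIGN is U+20AC");

// A character outside the BMP needs a char32_t literal.
static_assert(U'\U0001F600' == 0x1F600, "U+1F600 lies outside the BMP");

// Hypothetical helper: expose the code point as a plain integer.
inline unsigned long code_point(char32_t c) {
    return static_cast<unsigned long>(c);
}
```

Writing u'\U0001F600' instead would be rejected, because that code point is not representable as a single 16-bit value.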
String Literals u"UTF-16_string" and U"UTF-32_string"
There are two new forms to create string literals of the new types:
- u"UTF-16_string" is a string literal containing characters of the char16_t type, for example u"string_containing_UTF-16_encoding_characters".
- U"UTF-32_string" is a string literal containing characters of the char32_t type, for example U"string_containing_UTF-32_encoding_characters".
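As a brief sketch of the two string literal forms, the example below counts the code units in equivalent UTF-16 and UTF-32 strings. It assumes a C++11-conforming compiler; behavior for characters outside the BMP has not been verified against BCC32 specifically. The function template code_unit_count is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts code units before the terminating null.
template <typename CharT>
std::size_t code_unit_count(const CharT* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

// The literal's array type includes the terminating null code unit.
static_assert(sizeof(u"abc") == 4 * sizeof(char16_t), "3 code units + null");
static_assert(sizeof(U"abc") == 4 * sizeof(char32_t), "3 code units + null");

inline void check_string_literals() {
    const char16_t* utf16_text = u"abc";  // UTF-16 string literal
    const char32_t* utf32_text = U"abc";  // UTF-32 string literal

    // For BMP-only text, both encodings use one code unit per character.
    assert(code_unit_count(utf16_text) == 3);
    assert(code_unit_count(utf32_text) == 3);

    // Outside the BMP, UTF-16 needs a surrogate pair; UTF-32 still needs one unit.
    assert(code_unit_count(u"\U0001F600") == 2);
    assert(code_unit_count(U"\U0001F600") == 1);
}
```

Unlike a char16_t character literal, a u"..." string literal can hold a character outside the BMP, because the standard lets one character expand to two char16_t code units.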