Character Constants

From RAD Studio
A character constant is one or more characters enclosed in single quotes, such as 'A', '+', or '\n'. In C, a single-character constant has data type int. In C++, a single-character constant has data type char. Multi-character constants have data type int in both C and C++.
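The C++ half of this rule can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler; the helper names is_plain_char and check_character_constant_types are hypothetical and exist only for this illustration. It does not cover multi-character constants, whose value is implementation-defined.

```cpp
#include <cassert>
#include <type_traits>

// In C++, an ordinary single-character constant has type char,
// so sizeof('A') equals sizeof(char), which is always 1.
static_assert(std::is_same<decltype('A'), char>::value,
              "'A' has type char in C++");
static_assert(sizeof('A') == 1,
              "a single-character constant occupies one byte in C++");

// Hypothetical helper (not a library function): true only when the
// argument's type is exactly char.
template <typename T>
constexpr bool is_plain_char(T) {
    return std::is_same<T, char>::value;
}

// Hypothetical runtime check that exercises the helper.
inline void check_character_constant_types() {
    assert(is_plain_char('A'));    // character constant: char
    assert(!is_plain_char(65));    // integer constant: int
    assert(!is_plain_char(L'A'));  // wide-character constant: wchar_t
}
```

Compiled as C, the equivalent expression sizeof('A') would instead equal sizeof(int), because the constant has type int there.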

To learn more about character constants, see the following topics:

  • The Three char Types
  • Escape Sequences
  • Wide-character And Multi-character Constants
  • Unicode Character Types and Literals (C++11)

Sizes of Character Types in C and C++

To compare sizes of character types, compile this as a C program and then as a C++ program.

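#include <stdio.h>
#include <stddef.h>  /* defines wchar_t in C (built-in type in C++) */
#define CH 'x'       /* A CHARACTER CONSTANT */
int main(void) {
  char ch = 'x';    /* A char VARIABLE      */
  /* sizeof yields a size_t, so each result is cast to int to match %d */
  printf("\nSizeof int     = %d", (int)sizeof(int) );
  printf("\nSizeof char    = %d", (int)sizeof(char) );
  printf("\nSizeof ch      = %d", (int)sizeof(ch) );
  printf("\nSizeof CH      = %d", (int)sizeof(CH) );
  printf("\nSizeof wchar_t = %d", (int)sizeof(wchar_t) );
  return 0;
}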

Note: Sizes are in bytes.

Sizes of character types in C and C++:

Output when compiled as C program    Output when compiled as C++ program
Sizeof int = 4                       Sizeof int = 4
Sizeof char = 1                      Sizeof char = 1
Sizeof ch = 1                        Sizeof ch = 1
Sizeof CH = 4                        Sizeof CH = 1
Sizeof wchar_t = 2                   Sizeof wchar_t = 2

Four Types of Character Literals in C++11

By default, a character literal in C++ contains an ANSI character of the char data type. In any version of C++, you can put the L prefix before a character literal to specify a wide character of the wchar_t data type. In C++11, you can also put the u or U prefix before a character literal to specify a Unicode character in UTF-16 encoding (the char16_t data type) or UTF-32 encoding (the char32_t data type). For details, see Unicode Character Types and Literals (C++11).

For example, in C++11 programs you can use the following character literals:

  • 'A' - an ANSI character of the char data type. This character literal occupies one byte of memory.
  • L'A' - a wide character of the wchar_t data type. This character literal occupies two bytes of memory on Windows, where wchar_t is 16 bits wide; the size of wchar_t is platform-dependent.
  • u'A' - a UTF-16 encoded Unicode character of the char16_t data type. This character literal occupies two bytes of memory, so it can hold only a character from the Basic Multilingual Plane. A character outside that plane needs a surrogate pair (four bytes) and can appear only in a UTF-16 string literal.
  • U'A' - a UTF-32 encoded Unicode character of the char32_t data type. This character literal occupies four bytes of memory.
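
The types and sizes of the four literal forms listed above can be verified with a short sketch like the one below. It assumes a C++11 compiler on a mainstream platform with 8-bit bytes; the LiteralSizes structure and the literal_sizes function are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Each prefix selects a distinct character type.
static_assert(std::is_same<decltype('A'),  char>::value,     "no prefix: char");
static_assert(std::is_same<decltype(L'A'), wchar_t>::value,  "L prefix: wchar_t");
static_assert(std::is_same<decltype(u'A'), char16_t>::value, "u prefix: char16_t");
static_assert(std::is_same<decltype(U'A'), char32_t>::value, "U prefix: char32_t");

// Hypothetical record of the size, in bytes, of each literal form.
struct LiteralSizes {
    std::size_t narrow;  // 'A'
    std::size_t wide;    // L'A' (platform-dependent)
    std::size_t utf16;   // u'A'
    std::size_t utf32;   // U'A'
};

// Hypothetical helper that gathers the sizes reported by sizeof.
inline LiteralSizes literal_sizes() {
    return LiteralSizes{ sizeof('A'), sizeof(L'A'), sizeof(u'A'), sizeof(U'A') };
}
```

Only the wide field is expected to vary between platforms: it always equals sizeof(wchar_t), which is 2 on Windows and commonly 4 elsewhere.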

See Also