String Constants
String Literals
String constants, also known as string literals, form a special category of constants used to handle fixed sequences of characters. In C++, a string literal has type array of const char (in C, array of char) and static storage duration. It is written as a sequence of any number of characters enclosed in double quotes:
"This is literally a string!"
The null (empty) string is written "".
The characters inside the double quotes can include escape sequences. This code, for example:
"\t\t\"Name\"\\\tAddress\n\n"
prints like this:
"Name"\ Address
"Name" is preceded by two tabs and Address by one tab, and the line is followed by two newlines. The \" escape sequence supplies the interior double quotes, and \\ supplies the single backslash.
If you compile with the -A option for ANSI compatibility, the escape character sequence "\\" is translated to "\" by the compiler.
A literal string is stored internally as the given sequence of characters plus a final null character ('\0'). A null string is stored as a single '\0' character.
Four Types of String Literals in C++11
By default, string literals are ANSI strings containing char characters. You can use the L, u, and U prefixes before string literals to specify that they contain wide characters or Unicode characters (see Unicode Character Types and Literals (C++11)):
- A string literal preceded immediately by L is a wide-character string containing characters of the wchar_t data type. In C programs, wchar_t is a type defined in the stddef.h header file; in C++ programs, wchar_t is a keyword. The size of wchar_t is implementation-defined: on Windows it is two bytes per character, while many other platforms use four. The value of a single wide character is that character's encoding in the execution wide-character set.
- In C++11 programs, a string literal preceded immediately by u is a Unicode-character string containing characters of the char16_t data type. char16_t is a C++11 keyword declaring a 16-bit character type and is used for the UTF-16 encoding of Unicode. Each char16_t element occupies two bytes, so a character takes two bytes, or four bytes when it lies outside the Basic Multilingual Plane and must be encoded as a surrogate pair.
- In C++11 programs, a string literal preceded immediately by U is a Unicode-character string containing characters of the char32_t data type. char32_t is a C++11 keyword declaring a 32-bit character type and is used for the UTF-32 encoding of Unicode. Every char32_t character occupies four bytes.
That is, in C++11 programs, we can use the following four types of string literals:
- "ANSI string": an ANSI string literal containing char characters.
- L"Wide-character string": a string literal containing wchar_t characters.
- u"UTF-16 string": a string literal containing char16_t Unicode characters in UTF-16 encoding.
- U"UTF-32 string": a string literal containing char32_t Unicode characters in UTF-32 encoding.
Concatenating String Literals
You can use the backslash (\) as a continuation character to extend a string constant across line boundaries:
puts("This is really \
a one-line string");
Adjacent string literals separated only by whitespace are concatenated by the compiler into a single literal, before syntax analysis. In the following example,
#include <stdio.h>

int main()
{
    const char *p;

    /* The four adjacent literals below form one string. */
    p = "This is an example of how the compiler"
        " will \nconcatenate very long strings for you"
        " automatically, \nresulting in nicer"
        " looking programs.";
    puts(p); /* puts also writes a final newline */
    return 0;
}
The output of the program is
This is an example of how the compiler will
concatenate very long strings for you automatically,
resulting in nicer looking programs.
See Also
- Constants
- Integer Constants
- Floating Point Constants
- Character Constants
- The Three char Types
- Escape Sequences
- Wide-character And Multi-character Constants
- Unicode Character Types and Literals (C++11)
- Enumeration Constants
- Constants And Internal Representation
- Internal Representation Of Numerical Types
- Constant Expressions
- String Constants (C++)
- FMXStringHandling (C++)