String Constants
String Literals
String constants, also known as string literals, form a special category of constants used to handle fixed sequences of characters. A string literal has the data type array-of-const char and the storage class static, and is written as a sequence of any number of characters surrounded by double quotes:
"This is literally a string!"
The null (empty) string is written "".
The characters inside the double quotes can include escape sequences. This code, for example:
"\t\t\"Name\"\\\tAddress\n\n"
prints like this:
"Name"\ Address
"Name" is preceded by two tabs; Address is preceded by one tab. The line is followed by two newlines. Each \" produces an interior double quote, and \\ produces the single backslash.
If you compile with the -A option for ANSI compatibility, the escape character sequence "\\" is translated to "\" by the compiler.
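As a quick check of how the escape sequences in the literal above are stored, the following minimal C++ sketch counts the characters the compiler actually places in the string. The helper names escaped_sample, count_char, and check_escapes are illustrative assumptions, not library functions.

```cpp
#include <cassert>
#include <string>

// Illustrative helper (not a library function): the literal from the text.
std::string escaped_sample() {
    return "\t\t\"Name\"\\\tAddress\n\n";
}

// Illustrative helper: counts how many times c occurs in s.
int count_char(const std::string& s, char c) {
    int n = 0;
    for (char ch : s) {
        if (ch == c) ++n;
    }
    return n;
}

// Each escape sequence is stored as exactly one character.
void check_escapes() {
    const std::string s = escaped_sample();
    assert(s.size() == 19);            // 8 escaped characters + 11 letters
    assert(count_char(s, '\t') == 3);  // two leading tabs, one before Address
    assert(count_char(s, '"') == 2);   // the interior double quotes
    assert(count_char(s, '\\') == 1);  // "\\" collapses to one backslash
    assert(count_char(s, '\n') == 2);  // the two trailing newlines
}
```

Calling check_escapes() from main completes silently when every count matches the description in the text.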
A literal string is stored internally as the given sequence of characters plus a final null character ('\0'). A null string is stored as a single '\0' character.
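The terminating null character can be observed by comparing sizeof, which measures the whole stored array, with strlen, which stops at the first '\0'. This is a minimal sketch; the names kSampleBytes, kEmptyBytes, sample_length, and check_storage are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// sizeof applied to a literal gives the size of the whole array,
// including the final '\0'.
constexpr std::size_t kSampleBytes = sizeof("This is literally a string!");
constexpr std::size_t kEmptyBytes  = sizeof("");

// strlen stops at the first '\0', so it reports one less than sizeof.
std::size_t sample_length() {
    return std::strlen("This is literally a string!");
}

void check_storage() {
    assert(kSampleBytes == sample_length() + 1);  // one extra byte for '\0'
    assert(kEmptyBytes == 1);                     // the empty string is just '\0'
    assert(""[0] == '\0');
}
```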
Four Types of String Literals in C++11
By default, string literals are ANSI strings containing char characters. You can place the L, u, or U prefix immediately before a string literal to specify that it contains wide characters or Unicode characters (see Unicode Character Types and Literals (C++11)):
- A string literal preceded immediately by an L is a wide-character string containing characters of the wchar_t data type. In a C program, wchar_t is a type defined in the stddef.h header file; in C++ programs, wchar_t is a keyword. The memory allocation for wchar_t strings is two bytes per character on Windows; on other platforms the size of wchar_t can differ. The value of a single wide character is that character's encoding in the execution wide-character set.
- In C++11 programs, a string literal preceded immediately by a u character is a Unicode-character string containing characters of the char16_t data type. char16_t is a keyword declaring a 16-bit character type used for the UTF-16 encoding of Unicode. Each char16_t code unit occupies two bytes; a single Unicode character is encoded as one or two code units, that is, two or four bytes.
- In C++11 programs, a string literal preceded immediately by a U character is a Unicode-character string containing characters of the char32_t data type. char32_t is a keyword declaring a 32-bit character type used for the UTF-32 encoding of Unicode. The memory allocation for char32_t characters is four bytes per character.
That is, in C++11 programs, you can use the following four types of string literals:
- "ANSI string": an ANSI string literal containing char characters.
- L"Wide-character string": a string literal containing wchar_t characters.
- u"UTF-16 string": a string literal containing char16_t Unicode characters in UTF-16 encoding.
- U"UTF-32 string": a string literal containing char32_t Unicode characters in UTF-32 encoding.
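The difference between the four literal types shows up in the size of each element and in how many elements a character needs. The sketch below assumes a mainstream platform where char16_t is two bytes and char32_t is four bytes; unit_size, unit_count, and check_literal_kinds are hypothetical helper names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative helper: size in bytes of one element of a string literal.
template <typename T, std::size_t N>
constexpr std::size_t unit_size(const T (&)[N]) { return sizeof(T); }

// Illustrative helper: number of elements in a literal, terminator included.
template <typename T, std::size_t N>
constexpr std::size_t unit_count(const T (&)[N]) { return N; }

void check_literal_kinds() {
    assert(unit_size("a") == 1);                 // char
    assert(unit_size(L"a") == sizeof(wchar_t));  // platform-dependent
    assert(unit_size(u"a") == 2);                // char16_t (assumed 2 bytes)
    assert(unit_size(U"a") == 4);                // char32_t (assumed 4 bytes)

    // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs
    // a surrogate pair (two code units) while UTF-32 needs one.
    assert(unit_count(u"\U0001F600") == 3);      // 2 code units + terminator
    assert(unit_count(U"\U0001F600") == 2);      // 1 code unit + terminator
}
```

The wchar_t case is deliberately compared against sizeof(wchar_t) rather than a fixed number, because that size varies between platforms.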
Concatenating String Literals
You can use the backslash (\) as a continuation character to extend a string constant across line boundaries:
puts("This is really \
a one-line string");
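One detail worth checking when using the continuation character is that any indentation at the start of the continued line becomes part of the string. The following sketch illustrates this; the function names flush_left, indented, and check_continuation are assumptions made for the example.

```cpp
#include <cassert>
#include <string>

// The continuation line starts in the first column, so nothing extra
// enters the string.
std::string flush_left() {
    return "This is really \
a one-line string";
}

// The continuation line is indented by four spaces, and those spaces
// become part of the string.
std::string indented() {
    return "This is really \
    a one-line string";
}

void check_continuation() {
    assert(flush_left() == "This is really a one-line string");
    assert(indented().size() == flush_left().size() + 4);
}
```

Because of this, continued lines are usually written flush left, or the string is split into adjacent literals instead.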
Adjacent string literals separated only by whitespace are concatenated during the parsing phase. For example:
#include <stdio.h>
int main() {
    /* String literals are arrays of const char, so point to them through const char *. */
    const char *p;
    p = "This is an example of how the compiler "
        " will \nconcatenate very long strings for you"
        " automatically, \nresulting in nicer" " looking programs.";
    /* Pass the text as an argument rather than as the format string. */
    printf("%s", p);
    return 0;
}
The output of the program is:
This is an example of how the compiler  will
concatenate very long strings for you automatically,
resulting in nicer looking programs.
See Also
- Constants
- Integer Constants
- Floating Point Constants
- Character Constants
- The Three char Types
- Escape Sequences
- Wide-character And Multi-character Constants
- Unicode Character Types and Literals (C++11)
- Enumeration Constants
- Constants And Internal Representation
- Internal Representation Of Numerical Types
- Constant Expressions
- String Constants (C++)
- FMXStringHandling (C++)