@@ -60,7 +60,7 @@ String literals can have no prefix, or `u8`, `L`, `u`, and `U` prefixes to deno
## Character literals
-
A *character literal* is composed of a constant character. It is represented by the character surrounded by single quotation marks. There are five kinds of character literals:
+
A *character literal* is composed of a constant character. It's represented by the character surrounded by single quotation marks. There are five kinds of character literals:
- Ordinary character literals of type **char**, for example `'a'`
@@ -78,19 +78,19 @@ The character used for a character literal may be any character, except for the
Character literals are encoded differently based on their prefix.
-
- A character literal without a prefix is an ordinary character literal. The value of an ordinary character literal containing a single character, escape sequence, or universal character name that can be represented in the execution character set has a value equal to the numerical value of its encoding in the execution character set. An ordinary character literal that contains more than one character, escape sequence, or universal character name is a *multicharacter literal*. A multicharacter literal or an ordinary character literal that can't be represented in the execution character set has type **int**, and its value is implementation-defined. For MSVC, see the **Microsoft specific** section below.
+
- A character literal without a prefix is an ordinary character literal. The value of an ordinary character literal containing a single character, escape sequence, or universal character name that can be represented in the execution character set has a value equal to the numerical value of its encoding in the execution character set. An ordinary character literal that contains more than one character, escape sequence, or universal character name is a *multicharacter literal*. A multicharacter literal or an ordinary character literal that can't be represented in the execution character set has type **int**, and its value is implementation-defined. For MSVC, see the **Microsoft-specific** section below.
-
- A character literal that begins with the `L` prefix is a wide-character literal. The value of a wide-character literal containing a single character, escape sequence, or universal character name has a value equal to the numerical value of its encoding in the execution wide-character set unless the character literal has no representation in the execution wide-character set, in which case the value is implementation-defined. The value of a wide-character literal containing multiple characters, escape sequences, or universal character names is implementation-defined. For MSVC, see the **Microsoft specific** section below.
+
- A character literal that begins with the `L` prefix is a wide-character literal. The value of a wide-character literal containing a single character, escape sequence, or universal character name has a value equal to the numerical value of its encoding in the execution wide-character set unless the character literal has no representation in the execution wide-character set, in which case the value is implementation-defined. The value of a wide-character literal containing multiple characters, escape sequences, or universal character names is implementation-defined. For MSVC, see the **Microsoft-specific** section below.
- A character literal that begins with the `u8` prefix is a UTF-8 character literal. The value of a UTF-8 character literal containing a single character, escape sequence, or universal character name has a value equal to its ISO 10646 code point value if it can be represented by a single UTF-8 code unit (corresponding to the C0 Controls and Basic Latin Unicode block). If the value can't be represented by a single UTF-8 code unit, the program is ill-formed. A UTF-8 character literal containing more than one character, escape sequence, or universal character name is ill-formed.
- A character literal that begins with the `u` prefix is a UTF-16 character literal. The value of a UTF-16 character literal containing a single character, escape sequence, or universal character name has a value equal to its ISO 10646 code point value if it can be represented by a single UTF-16 code unit (corresponding to the basic multi-lingual plane). If the value can't be represented by a single UTF-16 code unit, the program is ill-formed. A UTF-16 character literal containing more than one character, escape sequence, or universal character name is ill-formed.
- A character literal that begins with the `U` prefix is a UTF-32 character literal. The value of a UTF-32 character literal containing a single character, escape sequence, or universal character name has a value equal to its ISO 10646 code point value. A UTF-32 character literal containing more than one character, escape sequence, or universal character name is ill-formed.
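
As a minimal sketch of the rules in the list above, the following declarations check the type that each prefix produces and the code point value stored by UTF-16 and UTF-32 literals. The constant names are illustrative and not part of the original article. The `u8` case is left out because its type differs between C++17 (**char**) and C++20 (`char8_t`).

```cpp
#include <type_traits>

// Each prefix selects a distinct character type.
static_assert(std::is_same<decltype('a'), char>::value, "ordinary literal is char");
static_assert(std::is_same<decltype(L'a'), wchar_t>::value, "L prefix is wchar_t");
static_assert(std::is_same<decltype(u'a'), char16_t>::value, "u prefix is char16_t");
static_assert(std::is_same<decltype(U'a'), char32_t>::value, "U prefix is char32_t");

// UTF-16 and UTF-32 character literals hold the ISO 10646 code point value.
constexpr char16_t euro_utf16 = u'\u20AC';      // U+20AC fits in one UTF-16 code unit
constexpr char32_t face_utf32 = U'\U0001F600';  // U+1F600 is outside the basic multilingual plane

// u'\U0001F600' would be ill-formed: it needs two UTF-16 code units.
```

Only escape sequences are used here, so the result doesn't depend on the encoding of the source file.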
-
### <a name="bkmk_Escape"></a> Escape sequences
+
### <a name="bkmk_Escape"></a> Escape sequences
-
There are three kinds of escape sequences: simple, octal, and hexadecimal. Escape sequences may be any of the following:
+
There are three kinds of escape sequences: simple, octal, and hexadecimal. Escape sequences may be any of the following values:
The backslash character (\\) is a line-continuation character when it's placed at the end of a line. If you want a backslash character to appear as a character literal, you must type two backslashes in a row (`\\`). For more information about the line continuation character, see [Phases of Translation](../preprocessor/phases-of-translation.md).
-
**Microsoft specific**
+
#### Microsoft-specific
To create a value from a narrow multicharacter literal, the compiler converts the character or character sequence between single quotes into 8-bit values within a 32-bit integer. Multiple characters in the literal fill corresponding bytes as needed from high-order to low-order. The compiler then converts the integer to the destination type following the usual rules. For example, to create a **char** value, the compiler takes the low-order byte. To create a **wchar_t** or `char16_t` value, the compiler takes the low-order word. The compiler warns that the result is truncated if any bits are set above the assigned byte or word.
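
The packing and truncation steps described above can be sketched in portable code. The helper `pack_multichar` is hypothetical: it only emulates the described byte layout for up to four narrow characters and isn't how the compiler is implemented. The values in the comments assume an ASCII-compatible execution character set.

```cpp
#include <cstdint>

// Hypothetical helper: packs each character into the next 8 bits, so the
// first character ends up in the highest-order byte that is used.
constexpr std::uint32_t pack_multichar(const char* s, std::uint32_t acc = 0) {
    return *s == '\0'
        ? acc
        : pack_multichar(s + 1, (acc << 8) | static_cast<unsigned char>(*s));
}

// Emulates the 32-bit value described for the literal 'abcd'.
constexpr std::uint32_t packed = pack_multichar("abcd");  // 0x61626364

// Converting to a char-sized value keeps the low-order byte ('d').
constexpr unsigned char low_byte = static_cast<unsigned char>(packed & 0xFF);

// Converting to a 16-bit value keeps the low-order word ('c', 'd').
constexpr std::uint16_t low_word = static_cast<std::uint16_t>(packed & 0xFFFF);
```

In this sketch the bits above the kept byte or word are nonzero, which corresponds to the case where the compiler warns about truncation.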
### <a name="bkmk_UCN"></a> Universal character names
+
### <a name="bkmk_UCN"></a> Universal character names
In character literals and native (non-raw) string literals, any character may be represented by a universal character name. Universal character names are formed by a prefix `\U` followed by an eight-digit Unicode code point, or by a prefix `\u` followed by a four-digit Unicode code point. All eight or four digits, respectively, must be present to make a well-formed universal character name.
@@ -304,22 +311,22 @@ u32string str6{ UR"(She said "hello.")"s };
### Size of string literals
-
For ANSI `char*` strings and other single-byte encodings (but not UTF-8), the size (in bytes) of a string literal is the number of characters plus 1 for the terminating null character. For all other string types, the size is not strictly related to the number of characters. UTF-8 uses up to four **char** elements to encode some *code units*, and `char16_t` or `wchar_t` encoded as UTF-16 may use two elements (for a total of four bytes) to encode a single *code unit*. This example shows the size of a wide string literal in bytes:
+
For ANSI `char*` strings and other single-byte encodings (but not UTF-8), the size (in bytes) of a string literal is the number of characters plus 1 for the terminating null character. For all other string types, the size isn't strictly related to the number of characters. UTF-8 uses up to four **char** elements to encode some *code points*, and `char16_t` or `wchar_t` encoded as UTF-16 may use two elements (for a total of four bytes) to encode a single *code point*. This example shows the size of a wide string literal in bytes:
Notice that `strlen()` and `wcslen()` don't include the size of the terminating null character, whose size is equal to the element size of the string type: one byte on a `char*` string, two bytes on `wchar_t*` or `char16_t*` strings, and four bytes on `char32_t*` strings.
+
Notice that `strlen()` and `wcslen()` don't include the size of the terminating null character, whose size is equal to the element size of the string type: one byte on a `char*` or `char8_t*` string, two bytes on `wchar_t*` or `char16_t*` strings, and four bytes on `char32_t*` strings.
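
A short sketch of how the element count relates to the size in bytes. The helper `literal_length` is a hypothetical stand-in for a length function that works on any character type at compile time. Unlike `strlen()`, it doesn't stop at an embedded null.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements in a string literal,
// not counting the terminating null element.
template <typename CharT, std::size_t N>
constexpr std::size_t literal_length(const CharT (&)[N]) { return N - 1; }

// sizeof includes the terminating null, which is one element wide.
constexpr std::size_t narrow_bytes = sizeof("hello");   // 5 characters + 1 null
constexpr std::size_t utf16_bytes  = sizeof(u"hello");  // 6 elements of char16_t
constexpr std::size_t utf32_bytes  = sizeof(U"hello");  // 6 elements of char32_t

// The length is the same for every encoding of this ASCII-only text.
constexpr std::size_t utf16_length = literal_length(u"hello");
```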
The maximum length of a string literal is 65,535 bytes. This limit applies to both narrow string literals and wide string literals.
### Modifying string literals
Because string literals (not including `std::string` literals) are constants, trying to modify them—for example, `str[2] = 'A'`—causes a compiler error.
-
**Microsoft specific**
+
#### Microsoft-specific
In Microsoft C++, you can use a string literal to initialize a pointer to non-const **char** or **wchar_t**. This non-const initialization is allowed in C99 code, but is deprecated in C++98 and removed in C++11. An attempt to modify the string causes an access violation, as in this example:
You can cause the compiler to emit an error when a string literal is converted to a non-const character pointer when you set the [/Zc:strictStrings (Disable string literal type conversion)](../build/reference/zc-strictstrings-disable-string-literal-type-conversion.md) compiler option. We recommend it for standards-compliant portable code. It is also a good practice to use the **auto** keyword to declare string literal-initialized pointers, because it resolves to the correct (const) type. For example, this code example catches an attempt to write to a string literal at compile time:
+
You can cause the compiler to emit an error when a string literal is converted to a non-const character pointer when you set the [/Zc:strictStrings (Disable string literal type conversion)](../build/reference/zc-strictstrings-disable-string-literal-type-conversion.md) compiler option. We recommend it for standards-compliant portable code. It's also a good practice to use the **auto** keyword to declare string literal-initialized pointers, because it resolves to the correct (const) type. For example, this code example catches an attempt to write to a string literal at compile time:
```cpp
auto str = L"hello";
@@ -337,7 +344,7 @@ str[2] = L'a'; // C3892: you cannot assign to a variable that is const.
In some cases, identical string literals may be pooled to save space in the executable file. In string-literal pooling, the compiler causes all references to a particular string literal to point to the same location in memory, instead of having each reference point to a separate instance of the string literal. To enable string pooling, use the [/GF](../build/reference/gf-eliminate-duplicate-strings.md) compiler option.
-
**End Microsoft specific**
+
The **Microsoft-specific** section ends here.
### Concatenating adjacent string literals
@@ -366,14 +373,14 @@ Using embedded hexadecimal escape codes to specify string literals can cause une
"\x05five"
```
-
The actual result is a hexadecimal 5F, which is the ASCII code for an underscore, followed by the characters i, v, and e. To get the correct result, you can use one of these:
+
The actual result is a hexadecimal 5F, which is the ASCII code for an underscore, followed by the characters i, v, and e. To get the correct result, you can use one of these escape sequences:
```cpp
"\005five"     // Use octal literal.
"\x05" "five"  // Use string splicing.
```
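
The difference between the three spellings can be checked at compile time. This is an illustrative sketch: the constant names are not from the original article, and the comparison with the underscore assumes an ASCII-compatible execution character set.

```cpp
#include <cstddef>

// sizeof counts every element of the literal, including the terminating null.
constexpr std::size_t greedy_size  = sizeof("\x05five");      // \x05f is one character
constexpr std::size_t octal_size   = sizeof("\005five");      // octal escape stops after three digits
constexpr std::size_t spliced_size = sizeof("\x05" "five");   // splicing ends the hex escape

// The first element of the unintended literal is 0x5F, an underscore in ASCII.
constexpr char greedy_first = "\x05five"[0];
```

The unintended literal is one element shorter because the hexadecimal escape absorbs the `f`.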
-
`std::string` literals, because they are `std::string` types, can be concatenated with the `+` operator that is defined for [basic_string](../standard-library/basic-string-class.md) types. They can also be concatenated in the same way as adjacent string literals. In both cases, the string encoding and the suffix must match:
+
`std::string` literals, because they're `std::string` types, can be concatenated with the `+` operator that is defined for [basic_string](../standard-library/basic-string-class.md) types. They can also be concatenated in the same way as adjacent string literals. In both cases, the string encoding and the suffix must match:
```cpp
auto x1 = "hello" " " " world"; // OK
@@ -384,7 +391,7 @@ auto x4 = u8"hello" " "s u8"world"z; // C3688, disagree on suffixes
### String literals with universal character names
-
Native (non-raw) string literals may use universal character names to represent any character, as long as the universal character name can be encoded as one or more characters in the string type. For example, a universal character name representing an extended character cannot be encoded in a narrow string using the ANSI code page, but it can be encoded in narrow strings in some multi-byte code pages, or in UTF-8 strings, or in a wide string. In C++11, Unicode support is extended by the `char16_t*` and `char32_t*` string types:
+
Native (non-raw) string literals may use universal character names to represent any character, as long as the universal character name can be encoded as one or more characters in the string type. For example, a universal character name representing an extended character can't be encoded in a narrow string using the ANSI code page, but it can be encoded in narrow strings in some multi-byte code pages, or in UTF-8 strings, or in a wide string. In C++11, Unicode support is extended by the `char16_t*` and `char32_t*` string types:
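
To illustrate "one or more characters in the string type", this sketch counts how many elements each encoding needs for the same universal character name. The helper `code_units` is hypothetical and simply reports the array length of the literal minus the terminating null.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements (code units) in a string literal,
// not counting the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00E0 uses the four-digit \u form.
constexpr std::size_t utf8_units  = code_units(u8"\u00E0");  // two UTF-8 code units
constexpr std::size_t utf16_units = code_units(u"\u00E0");   // one UTF-16 code unit

// U+1F600 needs the eight-digit \U form.
constexpr std::size_t utf16_pair  = code_units(u"\U0001F600");  // surrogate pair
constexpr std::size_t utf32_units = code_units(U"\U0001F600");  // one UTF-32 code unit
```

The template accepts the `u8` literal in both C++17, where its elements are **char**, and C++20, where they are `char8_t`.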