Detailed explanation of the format and vformat functions in C++

1 C++ string formatting dilemma

1.1 “Lengthy foot binding”

The printf() family of functions in the traditional C library has real advantages: the calls read naturally, the format description is kept separate from the arguments, and the code structure is clear. However, these functions have long been criticized for their security problems. C++ recommends stream-based formatted input and output instead of the printf() family. Streams solve the security problems, but they are rather unfriendly to use. Setting input and output aside, string formatting alone is enough to bring C++ users to tears. Here is a simple example: format a floating-point number as a string with 3 digits after the decimal point. With sprintf() this is very simple:

#include <cstdio>
char buf[64];
sprintf(buf, "%.3f", 1.0/3.0);  // buf now contains "0.333"

If you use C++, this is the style:

#include <iomanip>
#include <sstream>
#include <string>
std::stringstream ss;
ss << std::fixed << std::setw(5) << std::setprecision(3) << 1.0/3.0;
std::string str = ss.str();  // str is "0.333"

I have to say that this C++ style is rather "academic". If the various IO stream manipulators were turned into exam questions, they would torment the poor students. This example formats just one floating-point number. Formatting several values of different types into one string would take many more manipulators. Like a foot-binding cloth, the result is long and unintuitive. As a result, most C++ programmers keep using sprintf() to format strings. Besides its security problems, sprintf() also has limited type support. It handles only a few built-in types, not the containers of the standard library, let alone user-defined types.

1.2 A small innovation in C++11

C++11 provides a new C-style string formatting function:

int snprintf(char* buffer, std::size_t buf_size, const char* format, ...);

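Besides the buf_size parameter, which helps prevent buffer overflow, this function can also compute how much storage the formatted text needs. When called with a null buffer and a size of 0, it writes nothing and returns the number of characters that would have been written, excluding the terminating null: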

#include <cmath>
#include <cstdio>
#include <vector>
const char *fmt = "sqrt(2) = %f";
int sz = std::snprintf(nullptr, 0, fmt, std::sqrt(2));
std::vector<char> buf(sz + 1); // note +1 for null terminator
std::snprintf(buf.data(), buf.size(), fmt, std::sqrt(2));

C++11 function parameter packs make it possible to implement a C++-style text formatting function. For details on parameter packs, see the article "C++'s "..." and Variable Parameter List". On a well-known Q&A site, someone posted this solution [1]:

#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>
template<typename ... Args>
std::string string_format(const std::string& format, Args ... args) {
    int size_s = std::snprintf(nullptr, 0, format.c_str(), args ...) + 1; // Extra space for '\0'
    if (size_s <= 0) { 
        throw std::runtime_error("Error during formatting."); 
    }
    auto size = static_cast<size_t>(size_s);
    std::unique_ptr<char[]> buf(new char[size]);
    std::snprintf(buf.get(), size, format.c_str(), args ...);
    return std::string(buf.get(), buf.get() + size - 1); // We don't want the '\0' inside
}
// Using string_format()
std::string s = string_format("%s is %d years old.", "Emma", 5);

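The approach is convoluted, but you can still use it until your compiler supports C++20. Note, however, that while this function solves the overflow problem, the type-safety problem remains. For example, passing a std::string object where the format expects %s still compiles, but it results in undefined behavior.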

2 C++ 20 format function

2.1 format function

The Boost.Format library has been around for a long time, but it never drew the attention of the C++ Standards Committee, whether because of its performance or for other reasons. Many C++ programmers envied Python's format() function, and the arrival of the fmtlib library finally satisfied that longing. Even better, part of fmtlib entered the C++20 standard [2], including the format() and vformat() functions. Let's look at some examples of using format():

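#include <chrono>
#include <format>
#include <numbers>
#include <print>   // std::print() is C++23
#include <string>
using namespace std::chrono_literals;
std::string name("Bob");
auto result = std::format("Hello {}!", name);  // Hello Bob!
// Full Time Format: 03:15:30
result = std::format("Full Time Format: {:%H:%M:%S}\n", 3h + 15min + 30s);
// ***3.14159
std::print("{:*>10.5f}", std::numbers::pi);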

Doesn't it look a lot like Python?

2.2 fmt standard format

The fmt parameter is the format string, and its type is std::format_string. The original proposal used std::string_view; this was later changed to std::format_string, which is consistent with the std::print() function added in C++23. Apart from the two special characters "{" and "}", every character in the format string is copied to the output unchanged. "{" and "}" delimit formatting placeholders. To output a literal "{" or "}", use the escape sequences "{{" and "}}". A pair of "{" and "}" forms a placeholder (replacement field), whose syntax is:

{ arg-id(optional) }
{ arg-id(optional) : format-spec }

Both arg-id and format-spec are optional, so an empty pair of braces "{}" is also a valid placeholder:

auto result = std::format("{} is {} years old.", "Kitty", 5);  //Kitty is 5 years old.

In this case, the placeholders are matched with the arguments in args one by one, in order. If args contains more arguments than fmt has placeholders, the extra arguments are ignored and no error is reported. If args contains fewer arguments than there are placeholders, the behavior depends on the compiler. Proposal P2216 [9] added compile-time checking of format strings and is part of C++23, so a C++23 compiler reports a compilation error. For a C++20 compiler, the result depends on whether it supports this proposal. A compiler without P2216 support compiles the code, which then throws a std::format_error exception at run time.

auto result = std::format("{} is {} years old.", "Kitty", 5, 43.67);  // Runs normally, 43.67 is ignored
auto result2 = std::format("{} is {} years old.", "Kitty");           // Depends on the compiler

2.2.1 Parameter (placeholder) index (arg-id)

If the format string needs to state explicitly which argument each placeholder refers to, specify the arg-id. arg-id is the zero-based index, in the argument list args, of the value that the placeholder formats. For example:

std::string dogs{ "dogs" };
std::string emma{ "Emma" };
auto result = std::format("{0} loves {1}, but {1} don't love {0}.", emma, dogs);
//Emma loves dogs, but dogs don't love Emma.

Note that either every placeholder specifies an argument index or none does. A format string cannot mix automatic and manual indexing. For example, the following code is wrong:

auto s = std::format("{1} is a good {}, but Dos is {0} !\n", 6.22, "apple"); // error: mixes manual and automatic indexing

2.2.2 Format-spec

The format description is located to the right of the colon, and its syntax is:

fill-and-align(optional) sign(optional) #(optional) 0(optional) width(optional) precision(optional) L(optional) type(optional)

It looks a little complicated, but it covers only fill, alignment, sign, field width, precision, and so on. First, consider fill-and-align. The fill character can be any character except "{" and "}", and it is followed by the alignment flag, which is also a single character. There are three alignment flags: "<" forces left alignment, ">" forces right alignment, and "^" centers the value. When centering with $n$ fill characters in total, $\lfloor n/2 \rfloor$ fill characters are inserted before the value and $\lceil n/2 \rceil$ after it (note the rounding direction).

auto str1 = std::format("{:*<7}", 42); // str1 is "42*****"
auto str2 = std::format("{:*>7}", 42); // str2 is "*****42"
auto str3 = std::format("{:*^7}", 42); // str3 is "**42***"

If no fill or alignment is specified, the defaults apply. The fill character is a space, numeric types are right-aligned, and strings are left-aligned.

sign, # and 0 are formatting options for numbers, and all of them are optional. sign controls how the sign of a number is shown. "+" puts a plus sign before non-negative numbers and a minus sign before negative ones. "-" puts a minus sign only before negative numbers. A space " " puts a space before non-negative numbers and a minus sign before negative ones. Two points are worth noting. First, "-" is the default behavior: even without a sign option, negative numbers are printed with a minus sign. Second, for a non-negative value the "-" option has no visible effect, and for a negative value the "+" option still produces a minus sign.

auto s0 = std::format("{0:},{0:+},{0:-},{0: }", 1);   //"1,+1,1, 1"
auto s1 = std::format("{0:},{0:+},{0:-},{0: }", -1);  //"-1,-1,-1,-1"

# selects the alternate form of the output. For integer types, values can be shown in the default decimal form or in binary, octal, or hexadecimal. The type character selects the base, and # adds the corresponding prefix. The type character "d" means decimal, which is the default for integers. "b" means binary, and # inserts the prefix "0b" before the value. "o" means octal, and # inserts a "0" prefix for nonzero values. "x" means hexadecimal, and # inserts "0x"; with the uppercase "X", the prefix is "0X". When a field width is also specified, # must come before the width value, and the prefix counts toward the width. For example:

auto s = std::format("{0:#},{0:#6d},{0:#6b},{0:#6o},{0:#6x}", 10); //10,    10,0b1010,   012,   0xa

For floating-point types, a finite value with no fractional digits, such as 6.0, is output without a decimal point by default. To force a decimal point, use the # option. Combined with the "g" and "G" types, # also keeps the trailing zeros after the decimal point, as in the following code:

auto s = std::format("{0:},{0:#},{0:#6g},{0:#6G}", 6.0); //6,6.,6.00000,6.00000

0 pads the number with leading zeros, inserted after any sign or base prefix. For infinity and NaN, this option is ignored and no leading zeros are added. If 0 appears together with an alignment option, the 0 is ignored. Here are a few examples:

auto s1 = std::format("{:+06d}", 12);   // s1 is "+00012"
auto s2 = std::format("{:#06x}", 10);   // s2 is "0x000a"
auto s3 = std::format("{:<06}", -42);   // s3 is "-42   " (0 is ignored because of the < alignment)

width and precision specify the field width and the precision. width is a positive decimal number. It is often used together with fill-and-align or with the 0 option, as shown in the earlier examples. precision is written as a decimal point followed by either a decimal number or a nested replacement field. precision can be used only with floating-point numbers and strings. For floating-point numbers it sets the precision of the output, and for strings it limits how many characters of the string are used. The type character "f" is often combined with precision. Here are some examples:

float pi = 3.14f;
auto s1 = std::format("{:10f}", pi);           // s1 = "  3.140000" (width = 10)
auto s2 = std::format("{:.5f}", pi);           // s2 = "3.14000" (precision = 5)
auto s3 = std::format("{:10.5f}", pi);         // s3 = "   3.14000"
auto s4 = std::format("{:>10.5}", "Kitty loves cats!");         // s4 = "     Kitty"

Perhaps you think that having to hard-code precision and width in the format string makes format() inconvenient. If so, you are underestimating the experts on the C++ Standards Committee. The format specification supports nested placeholders that supply formatting parameters dynamically, like this:

auto d1 = std::format("{:{}f}", pi, 10);       // d1 = "  3.140000" (width = 10)
auto d2 = std::format("{:.{}f}", pi, 5);       // d2 = "3.14000" (precision = 5)
auto d3 = std::format("{:{}.{}f}", pi, 10, 5); // d3 = "   3.14000" (width = 10, precision = 5)

In the code above, the width 10 and the precision 5 in the argument list can be written directly. They can also be values computed at run time, which makes this form very flexible. When nested placeholders are used, however, the corresponding arguments must be of an integer type with a non-negative value. Otherwise, std::format() throws std::format_error. Compile-time type checking of these dynamic arguments came later, with P2757 [15] (see Section 4.2).

The L option enables locale-specific formatting. It applies only to arithmetic types, that is, to the text representations of integers, floating-point numbers, and Boolean values. For integers, the locale's digit group separators are inserted. For floating-point numbers, the locale's digit group separators and decimal separator are used. For bool, the text matches what std::numpunct::truename and std::numpunct::falsename return. Western convention uses "," as the digit group separator, but Chinese convention traditionally does not. For example, after the following code switches the global locale to US English, the two formats of the same number differ:

std::locale::global(std::locale("en_US"));  // Switch the global locale to US English (on Linux the name may need to be "en_US.UTF-8")
auto s = std::format("{0:12},{0:12L}", 432198409L); // s is "   432198409, 432,198,409"

The type option determines how the value is presented. The section on # already covered "o", "b", "d", "x", and "X", and there are more. "B" works like "b" but uses the prefix "0B". "e" and "E" show floating-point numbers in exponent (scientific) notation. "s" outputs strings. "a" and "A" show floating-point numbers in hexadecimal, with the letter p introducing the exponent.

2.3 Custom format

Much of the format library's power comes from its support for user-defined types. Users add support for a custom type by providing a specialization of the std::formatter<> template. The standard library supports its own types in the same way, through std::formatter<> specializations. For example, the specialization for char is:

template<> struct formatter<char, char>;

If you want to implement formatting rules for custom data types, you need to implement a specialized version of std::formatter<> for custom data types. For specific implementation methods, please refer to "C++ format function supports custom types".

3 std::vformat and std::format_args

std::format() relates to std::vformat() the way sprintf() relates to vsprintf(). std::vformat() is mainly used when a user-defined function that takes a format parameter needs to work with the format library, and std::format_args passes the arguments along. std::vformat() can be thought of as the type-erased version of std::format(). std::format() is a thin template that forwards its arguments to std::vformat(), which avoids the template bloat that would result if all the formatting code lived in std::format() itself. Calling std::vformat() directly is generally not recommended, and Section 4.2 describes problems it could cause before C++26. In some cases, though, std::vformat() is still useful, as in the following example.

Before C++ had parameter packs, a variadic function that formatted its variable arguments with sprintf() had to use va_list with the vsnprintf() function. This logging function is a typical example:

#include <cstdarg>
#include <cstdio>
// MSVC-specific: __cdecl, _vsnprintf and ASSERT; g_Level and LogMessage are defined elsewhere
void __cdecl DebugTracing(int nLevel, const char* fmt, ... ) {
    if(nLevel >= g_Level) { // Control the level of logging
        va_list args;
        va_start(args, fmt);
        int nBuf;
        char szBuffer[512];
        nBuf = _vsnprintf(szBuffer, sizeof(szBuffer)/sizeof(char), fmt, args);
        ASSERT(nBuf < sizeof(szBuffer)); 
        LogMessage(szBuffer); // Write the log entry to the system
        va_end(args);
    }
}

Because … is not a named parameter, sprintf() cannot be called with it directly. The only option is to initialize args with the va_start() macro and then call a function from the vsnprintf() family (_vsnprintf() here). The use of szBuffer[] in this function is actually alarming. Putting a 512-byte array on the stack is a poor design, because a deep call chain may overflow the stack. Besides, 512 bytes are sometimes not enough, and allocating memory dynamically is clearly cumbersome.

For modern C++, the std::format_args parameter can be used with the std::vformat() function to safely implement this function:

void DebugTracing(int nLevel, std::string_view fmt, std::format_args args) {
    if (nLevel >= g_Level) { // Control the level of logging
        std::string msg = std::vformat(fmt, args);
        LogMessage(msg); // Write the log entry to the system
    }
}

When using it, you can use the std::make_format_args() function to help construct args parameters, such as:

int a = 34, b = 42;  // make_format_args() accepts only lvalues since P2905 (see Section 4.2)
DebugTracing(5, "{0:<d}{1:<x}", std::make_format_args(a, b));

DebugTracing() can be rewritten with a parameter pack to make it even simpler to use:

template <typename... Args>
void DebugTracing(int nLevel, std::string_view fmt, Args&&... args) {
    if (nLevel >= g_Level) { // Control the level of logging
        std::string msg = std::vformat(fmt, std::make_format_args(args...));
        LogMessage(msg); // Write the log entry to the system
    }
}

This way, callers no longer need to call std::make_format_args() themselves:

DebugTracing(5, "{0:<d}{1:<x}", 34, 42);

4 Continuous Optimization of C++ 23 and 26

4.1 Improvements to C++ 23

The main improvement to the format library in C++23 is support for more standard library types, such as ranges [5], thread::id, and std::stacktrace [7]. Reference [6] discusses and settles the default output forms of common container types, for example:

map type: {k1: v1, k2: v2}
set type: {v1, v2}
General sequence container type: [v1, v2]

The std::print() function [4], which missed C++20, made it into C++23. Although it writes only to the standard output and to file streams, it already has the potential to replace the standard input and output streams. C++23 also fixed an inconsistency in locale handling between chrono formatting and format [3]. This was covered in the article "C++ Time Library Eight: Format and Format". In addition, reference [8] clarifies which encoding the results use when the L option requests localized output: a Unicode encoding rather than the system's default encoding. This matters a lot. Taking Chinese as an example, the default encoding on Windows is a multibyte "extended ASCII" code page such as GB2312, GBK, or GB18030, while Linux uses UTF-8. If the library specification were vague here, data exchange between programs would carry hidden risks.

Reference [9] mainly addresses problems raised by LEWG. For example, it changes the type of the format string from basic_string_view to the class template basic_format_string<charT, Args…>. This example shows the benefit:

auto s = std::format("{:d}", "I am not a number");

In C++20 before this change, this line throws a std::format_error exception at run time. After the change, the mismatch between the format and the argument type is detected at compile time. basic_format_string carries the argument types and can check the format string against them, which greatly improves the safety of format().

Reference [11] allows format to accept formattable types that cannot be formatted as const objects. The change was needed because format() was declared like this in C++20:

// C++20 as originally published:
template<class... Args>
string format(string_view fmt, const Args&... args);
// After the improvements:
template<class... Args>
string format(format_string<Args...> fmt, Args&&... args);

Note that the old declaration takes its parameters by const reference, so every argument must be formattable as a const object. This rules out some use cases, such as views that can be iterated only when non-const. Making them work would require temporary copies, whose overhead is easy to overlook. The proposal therefore switched to forwarding references and uses format_string<> to check the arguments.

References [12] and [13] settle two questions: which fill characters are allowed, and how the width of the formatted text is estimated. These questions were raised in LWG issues 3576, 3639, and 3780. Interested readers can follow the issue links in [12] and [13].

4.2 Improvements to C++ 26

C++26 brings many improvements as well. Reference [14] resolves the inconsistency between how to_string() and format() output numbers: to_string() now produces the same output as the default format of format(). Reference [15] raises a very interesting problem. Consider this line of code:

format("{:>{}}", "hello", "10")

In theory, because C++23 already checks format strings at compile time, the error above could be reported during compilation. In practice it is not: the code compiles and throws a format_error exception at run time. As described in Section 2.2, the format specification supports nested placeholders for dynamic parameters, so this code specifies a dynamic width. In order, the next argument, "10", is used as the dynamic width, but "10" is a string, not the required integer type. The format string already tells the compiler that this nested placeholder is a width, which must be an integer, so the compiler could check the type of the argument supplied for it. The P2757 proposal therefore requires that such dynamic formatting arguments be type-checked as well.

With so many data types supported from C++20 to C++26, how could pointers be left out? A pointer is essentially an address, usually printed in hexadecimal. C++20 can already format void* pointers in this form, but it does not allow options such as zero-padding. Code therefore often cast the pointer to an integer type just to control the layout, which is tedious for such a distinct data type. Reference [16] allows pointers to be formatted with these options directly:

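int i = 0;
// Before P2510: cast to an integer type (std::uintptr_t from <cstdint>)
auto s1 = std::format("{:#018x}", reinterpret_cast<std::uintptr_t>(&i));
// After P2510 [16]: zero-pad the pointer itself (void* is the formattable pointer type)
auto s2 = std::format("{:018}", static_cast<void*>(&i));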

Saving a few keystrokes is nice too, isn't it?

As mentioned earlier, [9] introduced many compile-time checks. Because of limited resources, however, it did not provide a convenient API for format strings that are only known at run time. The workaround is the type-erased API (the std::vformat() function), but it seriously undermines run-time safety. Reference [18] proposes adopting runtime_format, which already exists in the fmtlib library, to provide run-time checking so that unsafe code does not compromise system security. The compile-time checking in [9] has another consequence: the format string must be a constant expression, that is, a compile-time constant or the result of an immediate function. Otherwise compilation fails, for example:

std::string strfmt = translate("The answer is {}.");
auto s = std::format(strfmt, 42); // error

translate() is neither an immediate (consteval) function nor a constexpr function, so strfmt is not a compile-time constant, and passing it to format() causes a compilation error. Naturally, people turned to vformat(). However, this type-erased API was designed to avoid template bloat and is meant for developers of libraries and formatting functions. Ordinary programmers had no better option, so they used it reluctantly. Used improperly, though, it leads to errors, as in this example from [17]:

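std::string str = "{}";
std::filesystem::path path = "path/etic/experience";
auto args = std::make_format_args(path.string());  // the temporary string is destroyed at the end of this line
std::string msg = std::vformat(str, args);          // undefined behavior: args refers to a destroyed object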

This seemingly harmless code has undefined behavior, because the formatting arguments hold a reference to a destroyed object. [17] therefore proposes changing the parameters of make_format_args() from forwarding references to lvalue references, so that rvalues can no longer be passed by mistake.

Reference [19] addresses how char is formatted as an integer. With the d or x presentation types, a char is formatted as an integer. However, whether char is signed is implementation-defined, and that is where the problem arises. Standard ASCII characters are all non-negative, but the code units of UTF-8 encoded text become negative when char is signed, so different compilers produce different output. [19] therefore proposes converting char to unsigned char in this case to make the results consistent.

Reference [20] adds formatting support for many standard library types, including std::filesystem::path. However, paths containing spaces raise a quoting question: should the output add double quotation marks, and should they be escaped? Localization also brings encoding and formatting problems. These issues led SG16 at one point to ask for path support to be removed. Reference [21] proposed a revised design that addresses these problems and finally resolved the crisis.

References

[1] :///questions/2342162/stdstring-formatting-like-sprintf

[2] P0645R10: Text Formatting

[3] P2372R3: Fixing locale handling in chrono formatters

[4] P2093R14: Formatted output

[5] P2286R8: Formatting Ranges

[6] P2585R1: Improve default container formatting

[7] P2693R1: Formatting thread::id and stacktrace

[8] P2419R2: Clarify handling of encodings in localized formatting of chrono types

[9] P2216R3: std::format improvements

[10] P2508R1: Expose std::basic-format-string<charT, Args…>

[11] P2418R2: Add support for std::generator-like types to std::format

[12] P2572R1: std::format() fill character allowances

[13] P2675R1: format’s width estimation is too approximate and not forward compatible

[14] P2587R3: to_string or not to_string

[15] P2757R3: Type-checking format args

[16] P2510R3: Formatting pointers

[17] P2905R2: Runtime format strings

[18] P2918R2: Runtime format strings II

[19] P2909R4: Fix formatting of code units as integers (Dude, where’s my char?)

[20] P1636R2: Formatters for library types

[21] P2845R8: Formatting of std::filesystem::path
