Migrating to a safer API with {fmt} 8.x

We recently migrated a large codebase from {fmt} 7.x to 8.1.1 and in this blog post I’ll show some of the fun issues discovered thanks to improved diagnostics in the new version of this library.

Let’s start with this piece of questionable code:

std::string format_error(
    std::uint_least8_t squishiness) {
  static const std::string format =
    "Invalid squishiness: {}";
  return fmt::format(format, squishiness);
}

It looks relatively innocent but there are multiple problems with it:

  1. Unnecessary std::string construction
  2. Unnecessary synchronization to initialize this std:string object
  3. The format string can’t be statically checked

The generated code is pretty bad too (godbolt):

format_error[abi:cxx11](unsigned char):
  push r12
  mov r12, rdi
  push rbp
  push rbx
  mov ebx, esi
  sub rsp, 16
  movzx eax, BYTE PTR guard variable for format_error[abi:cxx11](unsigned char)::format[rip]
  mov rbp, rsp
  test al, al
  jne .L3
  mov edi, OFFSET FLAT:guard variable for format_error[abi:cxx11](unsigned char)::format
  mov rbp, rsp
  call __cxa_guard_acquire
  test eax, eax
  jne .L13
.L3:
  mov rsi, QWORD PTR format_error[abi:cxx11](unsigned char)::format[rip]
  movzx ebx, bl
  mov r8, rbp
  mov rdi, r12
  mov rdx, QWORD PTR format_error[abi:cxx11](unsigned char)::format[rip+8]
  mov ecx, 2
  mov DWORD PTR [rsp], ebx
  call fmt::v8::vformat[abi:cxx11](fmt::v8::basic_string_view<char>, fmt::v8::basic_format_args<fmt::v8::basic_format_context<fmt::v8::appender, char> >)
  add rsp, 16
  mov rax, r12
  pop rbx
  pop rbp
  pop r12
  ret
.L13:
  xor edx, edx
  mov rsi, rbp
  mov edi, OFFSET FLAT:format_error[abi:cxx11](unsigned char)::format
  mov QWORD PTR format_error[abi:cxx11](unsigned char)::format[rip], OFFSET FLAT:format_error[abi:cxx11](unsigned char)::format+16
  mov QWORD PTR [rsp], 23
  call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)
  mov rdx, QWORD PTR [rsp]
  mov esi, OFFSET FLAT:format_error[abi:cxx11](unsigned char)::format
  mov edi, OFFSET FLAT:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev
  movdqa xmm0, XMMWORD PTR .LC0[rip]
  mov QWORD PTR format_error[abi:cxx11](unsigned char)::format[rip], rax
  mov QWORD PTR format_error[abi:cxx11](unsigned char)::format[rip+16], rdx
  mov edx, 31520
  movups XMMWORD PTR [rax], xmm0
  mov WORD PTR [rax+20], dx
  mov rdx, QWORD PTR format_error[abi:cxx11](unsigned char)::format[rip]
  mov DWORD PTR [rax+16], 980644709
  mov BYTE PTR [rax+22], 125
  mov rax, QWORD PTR [rsp]
  mov QWORD PTR format_error[abi:cxx11](unsigned char)::format[rip+8], rax
  mov BYTE PTR [rdx+rax], 0
  mov edx, OFFSET FLAT:__dso_handle
  call __cxa_atexit
  mov edi, OFFSET FLAT:guard variable for format_error[abi:cxx11](unsigned char)::format
  call __cxa_guard_release
  jmp .L3
  mov rbp, rax
  jmp .L5
format_error[abi:cxx11](unsigned char) [clone .cold]:
.L5:
  mov edi, OFFSET FLAT:guard variable for format_error[abi:cxx11](unsigned char)::format
  call __cxa_guard_abort
  mov rdi, rbp
  call _Unwind_Resume
.LC0:
  .quad 2334106421097295433
  .quad 7956005061626524019

As of fmt 8.x and C++20 this example no longer compiles because format strings must be known at compile time by default. And the fix is super easy: just move the format string to the fmt::format where it belongs or make it a constexpr C string or a string_view:

std::string format_error(std::uint_least8_t squishiness) {
  return fmt::format("Invalid squishiness: {}", squishiness);
}

This is not only cleaner but also safer and faster. The generated code becomes much simpler too (codebolt):

.LC0:
  .string "Invalid squishiness: {}"
format_error[abi:cxx11](unsigned char):
  push r12
  movzx esi, sil
  mov ecx, 2
  mov edx, 23
  mov r12, rdi
  sub rsp, 16
  mov DWORD PTR [rsp], esi
  mov r8, rsp
  mov esi, OFFSET FLAT:.LC0
  call fmt::v8::vformat[abi:cxx11](fmt::v8::basic_string_view<char>, fmt::v8::basic_format_args<fmt::v8::basic_format_context<fmt::v8::appender, char> >)
  add rsp, 16
  mov rax, r12
  pop r12
  ret

Another case that we discovered involves choosing between multiple static format strings at runtime:

std::string quote(std::string_view s, bool single) {
  return fmt::format(single ? "'{}'" : "\"{}\"", s);
}

Again, this no longer compiles because the format string is not known at compile time. However, you can easily refactor the code to fix the issue:

std::string quote(std::string_view s, bool single) {
  return single ? fmt::format("'{}'", s) : fmt::format("\"{}\"", s);
}

Alternatively you can move the part that changes into a formatting argument:

std::string quote(std::string_view s, bool single) {
  return fmt::format("{0}{1}{0}", single ? '\'' : '"', s);
}

In other cases the format string can be fully dynamic. For example, {fmt} is occasionally used as a basic template engine:

std::string tmpl = load_template("/path/to/template");
auto result = fmt::format(tmpl,
                          fmt::arg("first_name", "Ijon"),
                          fmt::arg("last_name", "Tichy"));

Putting aside the fact that {fmt} is not a template engine and you are probably better off using something like mustache instead, you can opt out of compile-time checks by wrapping your format string in fmt::runtime:

std::string tmpl = load_template("/path/to/template");
auto result = fmt::format(fmt::runtime(tmpl),
                          fmt::arg("first_name", "Ijon"),
                          fmt::arg("last_name", "Tichy"));

This is still safe with errors reported at runtime as exceptions.

Now let’s move to actual bugs. One class of bugs that we’ve fixed can be illustrated on this example:

void print_indexed(const std::vector<int>& v) {
  for (size_t i = 0; i < v.size(); ++i)
    fmt::print("{}: {}\n", index, v[i]);
}

Can you spot an error?

It is not very hard to see in this example because the code was intentionally simplified but in real code it may be quite challenging. The problem is that the programmer accidentally typed index instead of i. But what is index? It is, of course, a POSIX function.

{fmt} 8.x forbids formatting of function pointers so this no longer compiles.

Here’s a small variation of our first example that illustrates another class of bugs:

void log_error(std::uint_least8_t squishiness) {
  fmt::format("Invalid squishiness: {}", squishiness);
}

The problem here is that the programmer formatted the error message but forgot to actually write it to the log. {fmt} 8.x warns about this because std::format and other formatting functions are now annotated with [[nodiscard]].

And finally, let’s looks at this example:

void fancy_print(squishiness s) {
  fmt::print("|{:^10}|{:^10}|", "squishiness", s);
}

It looks perfectly fine except that the programmer didn’t implement any format specifier support for squishiness and you cannot see it from the call site. As a result this throws format_error when run. Fortunately, with {fmt} 8.x and C++20 this is detected at compile time too (godbolt). To fix this you can reuse one of the standard formatters that support width and alignment (godbolt):

template <>
struct fmt::formatter<squishiness> : formatter<std::string_view> {
  auto format(squishiness s, format_context& ctx) const {
    return formatter<std::string_view>::format("very squishy", ctx);
  }
};

Conclusion: Migration to {fmt} 8.x turned out to be highly beneficial, eliminating several classes of bugs. This was not only due to compile-time format string checks which are now enabled by default but also stricter pointer diagnostic and [[nodiscard]].