How to Measure String SSO Length with constinit and constexpr


In this text, you'll learn about a few techniques and experiments with the constexpr and constinit keywords. By exploring the string implementation, you'll also see why constinit is so powerful.

What’s SSO

 

Just briefly, SSO stands for Short String Optimization. It's usually implemented as a small buffer (an array or something similar) occupying the same storage as the string object. When the string is short, this buffer is used instead of a separate dynamic memory allocation.

See a simplified diagram below:

(Diagram: the SSO concept)

The diagram illustrates two strings and where they "land" within the string object. If the string is long (longer than N characters), it needs a costly dynamic memory allocation, and the address of that new buffer is stored in ptr. On the other hand, if the string is short, we can put it inside the object, in buf[N]. Usually, buf and ptr can be implemented as a union to save space, as we use one or the other, but never both at the same time.

Let's start with a basic test and see the stack size of std::string, using sizeof():

#include <string>

int main() {
    // the size of the string object itself, not of its characters
    return sizeof(std::string);
}

Run at Compiler Explorer

GCC and MSVC show 32, while the libc++ implementation for Clang returns 24 (on 64-bit targets)!

And now, it's time to check the length of that short string; how can we check it? We have several options:

  • at runtime
  • constexpr since C++20
  • constinit since C++20
  • simply checking std::string{}.capacity();
  • and we can always look into the actual implementation and check the code 🙂

Let's start with the first, obvious option:

Checking length by using capacity()

 

As pointed out in the comments on Reddit (thanks VinnieFalco), you can check the size of the SSO buffer via capacity() of an empty string:

#include <string>

int main() {
    // C++20: std::string is usable in constant expressions
    constexpr auto ssoLen = std::string{}.capacity();
    static_assert(ssoLen >= 15);
    return static_cast<int>(ssoLen);
}

Run @Compiler Explorer

  • GCC and MSVC show Program returned: 15
  • Clang (libc++) shows Program returned: 22

Let's look at some other experiments.

Checking length at runtime

 

To check the length of the small buffer, we can write a custom operator new and simply watch when it is called while creating a string object:

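#include <cstdlib>
#include <iostream>
#include <new>
#include <string>

// replace the global operator new to log every dynamic allocation
void* operator new(std::size_t size) {
	auto ptr = std::malloc(size);
	if (!ptr)
		throw std::bad_alloc{};
	std::cout << "new: " << size << ", ptr: " << ptr << '\n';
	return ptr;
}

// matching operator delete: memory from malloc must be released with free
void operator delete(void* ptr) noexcept {
	std::free(ptr);
}

int main() {
    std::string x { "123456789012345"}; // 15 characters + null
    std::cout << x << '\n';
}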

Here's the code @Compiler Explorer

When you run the application, you'll see that only the string is printed to the output.

But if you change the string to:

  std::string x { "1234567890123456"}; // 16 characters + null

GCC reports:

new: 17, ptr: 0x8b82b0
1234567890123456

Similarly, MSVC (running a local MSVC Release build, as it doesn't work under Compiler Explorer):

new: 32, ptr: 000001CD37720B00
1234567890123456

Clang is still "silent"… but let's change the string to:

std::string x { "12345678901234567890123"}; // 23 characters + null

Now, the libc++ implementation requests some dynamic memory. (Here's a great overview of how it's done: libc++'s implementation of std::string | Joel Laity)

In summary:

  • GCC and MSVC can hold 15 characters (assuming the char type, not wchar_t),
  • the Clang implementation (-stdlib=libc++) can store 22 characters! That's very impressive, as the size of the whole string object is only 24 bytes!

That was a simple and "classic" experiment… but in C++20, we can also check it at compile time!

constexpr strings

 

Let's start with constexpr. In C++20, strings and vectors are constexpr-ready.

What's more, we even have constexpr dynamic memory allocations in C++20.

A dynamic allocation at compile time can happen only in the context of a function execution, and the allocated memory buffer cannot "move" to the runtime. In other words, such allocations are only "transient": they have to be freed before the constant evaluation ends. I wrote about it in a separate blog post: constexpr Dynamic Memory Allocation, C++20 – C++ Stories

In short, we can try the following code:

#include <string>
#include <iostream>

constexpr std::string str15 {"123456789012345"};
//constexpr std::string str16 {"1234567890123456"}; // doesn't compile

int main() {
    std::cout << str15 << '\n';
}

Run at Compiler Explorer

The above code creates a constexpr string with 15 characters, and since it fits into the SSO buffer, it doesn't violate any constexpr requirements. On the other hand, str16 would need a dynamic memory allocation, and thus the compiler reports:

/opt/compiler-explorer/gcc-trunk-20221121/include/c++/13.0.0/bits/allocator.h:195:52: error: 'std::__cxx11::basic_string<char>(((const char*)"1234567890123456"), std::allocator<char>())' is not a constant expression because it refers to a result of 'operator new'
  195 |             return static_cast<_Tp*>(::operator new(__n));
      |                                      ~~~~~~~~~~~~~~^~~~~

Currently (Nov 2022), the libc++ implementation doesn't seem to compile this code, so it might still have some C++20 issues.

But that's not all in C++20, as we can do more:

constant initialization

 

In C++20, we also have a new keyword, constinit: it forces constant initialization of variables with static or thread storage duration. In short, our object will be initialized at compile time, but we can later change it like a regular global variable.

We can rewrite our previous example to:

#include <string>
#include <iostream>

constinit std::string global {"123456789012345"};

int main() {
    std::cout << global << '\n';
    // but we can still change it later...
    global = "abc";
    std::cout << global << '\n';
}

See @Compiler Explorer

If you extend the string and add one more letter:

constinit std::string global {"1234567890123456"};

You'll get the following error:

error: 'constinit' variable 'global' does not have a constant initializer

Summary

 

It was a fun experiment! In C++20, you can rely on constant initialization and constexpr strings to check the SSO length.

I'm not advocating using global objects, but if you need them, then constexpr might be a good fit. As you can see, if you have short strings, they can be safely initialized at compile time.

As pointed out by 2kaud in the comments, to store constexpr string literals you can also leverage string_view, which can hold a string literal of any length:

constexpr std::string_view resName { "an important resource with a long name..." };

As a side note:

The other name for this kind of optimization is SBO: Small Buffer Optimization. It can be applied not only to strings but, for example, to objects like std::any or even containers (std::vector by design doesn't offer this optimization, but we can imagine a similar non-standard container with a small buffer).
