String View in C++17 (std::string_view)

The C++ string (std::string) is essentially a wrapper that owns its character data, which is usually stored on the heap. When you work with strings, copies are created very often, and each copy may require a memory allocation. But many use cases do not need a copy at all. In these situations, you want to analyze the string or do some calculations based on it, but you don’t want to change its content.

With C++17, the string view (std::string_view) was introduced. The string view is designed for use cases where a non-owning reference to a string is needed. It represents a view of a sequence of characters, which can come from a C++ string or a C string. A string view offers nearly the same read-only member functions as a standard string. The following example shows how to create and use a string view.

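#include <iostream>
#include <string>
#include <string_view>

int main()
{
	std::string str = "FooBar";
	std::string_view strview(str);  // view of the whole string, no copy

	// substr on a view returns another view ("Bar"), again without copying
	std::string_view substr = strview.substr(strview.find('B'));

	std::cout << substr << '\n';
}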

A typical implementation of a string view holds only two members: a pointer to a constant character array and a size. Therefore, copying a string view is cheap. The main purpose of a string view is to avoid copying data when only a non-mutating view is required. So, the main reason for using a string view is performance.
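To make the two-member layout concrete, here is a minimal sketch of a hypothetical, heavily simplified view class. SimpleStringView is an illustration only, not how any particular standard library implements std::string_view: it stores a pointer and a size, so its substr merely builds a new pointer/size pair into the same buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, heavily simplified view type for illustration only.
// Like a typical std::string_view, it stores just a pointer and a size.
class SimpleStringView
{
public:
    SimpleStringView(const char* s) : data_(s), size_(std::strlen(s)) {}
    SimpleStringView(const char* s, std::size_t n) : data_(s), size_(n) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // Constant time: no characters are copied, only a new pointer/size pair
    // into the same buffer is created.
    SimpleStringView substr(std::size_t pos, std::size_t count) const
    {
        assert(pos <= size_);  // std::string_view::substr throws instead
        const std::size_t rest = size_ - pos;
        return SimpleStringView(data_ + pos, count < rest ? count : rest);
    }

private:
    const char* data_;  // not owned: the referenced characters must outlive the view
    std::size_t size_;  // number of characters in the view
};
```

For example, SimpleStringView("FooBar").substr(3, 3) refers to "Bar" inside the original literal, and copying such a view copies only its two members.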

The substr method is a good example to show the performance difference between string and string view. Both offer a substr method to get a substring, but they perform very differently. The substr method of the string returns a new string, so it copies the requested characters and may allocate memory; its complexity is linear in the length of the substring. The substr method of the string view only creates a new view with an adjusted pointer and size; its complexity is constant and therefore independent of the length. So, if you have to deal with large strings and substrings, you may get a huge performance gain by using string view.
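The difference can be observed directly: a view returned by std::string_view::substr points into the original buffer, while std::string::substr returns a string with its own buffer. SubstrCopy and SubstrView below are hypothetical wrapper names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// std::string::substr returns a new std::string: the characters are copied
// (linear in the length of the substring) and memory may be allocated.
std::string SubstrCopy(const std::string& s, std::size_t pos)
{
    return s.substr(pos);
}

// std::string_view::substr only adjusts a pointer and a length (constant time).
std::string_view SubstrView(std::string_view s, std::size_t pos)
{
    return s.substr(pos);
}
```

For a 1000-character string text, SubstrView(text, 500).data() equals text.data() + 500, whereas the result of SubstrCopy owns a separate buffer holding a copy of the same 500 characters.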

But when we talk about performance, we should never make a general statement like: “string view is always more performant than string”. Standard library implementations handle strings very efficiently, especially short ones, using the so-called “small string optimization”. For example, in the standard libraries of MSVC and GCC (libstdc++), strings of up to 15 characters are stored inside the string object itself rather than in a separate heap allocation. Because of such optimizations, in some use cases there might be no advantage in using string view instead of string.
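Whether a given string uses the small string optimization can be probed, in an implementation-specific way, by checking whether data() points into the string object itself. IsStoredInline is a hypothetical diagnostic sketch: the standard neither requires nor exposes the small string optimization, so the result for short strings depends on the standard library.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true if the characters live inside the std::string object itself
// (small string optimization) rather than in a separate heap buffer.
// Implementation-specific diagnostic only; not a portable guarantee.
bool IsStoredInline(const std::string& s)
{
    const char* object = reinterpret_cast<const char*>(&s);
    const char* chars = s.data();
    // std::less provides a total order even for pointers into unrelated objects.
    std::less<const char*> lt;
    return !lt(chars, object) && lt(chars, object + sizeof(s));
}
```

A 1000-character string is always stored outside the object, because the string object itself is much smaller than 1000 bytes; for a short string such as "FooBar", the libraries mentioned above are expected to report true.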

Non-owning reference

The string view holds a non-owning reference to a character array. Therefore, the referenced object must outlive the string view; otherwise, using the view results in undefined behavior. The following code shows an erroneous implementation. The returned string view refers to a local string that is destroyed when Create returns. So, using the string view within main results in undefined behavior.

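#include <iostream>
#include <string>
#include <string_view>

std::string_view Create()
{
	std::string str = "FooBar";
	std::string_view strview(str);

	return strview;  // BUG: str is destroyed when Create returns
}

int main()
{
	std::string_view strview = Create();

	std::cout << strview << '\n';  // undefined behavior: dangling view
}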
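Two ways to avoid the dangling view are sketched below with the hypothetical functions CreateOwned and CreateFromLiteral: return an owning std::string when the function creates the characters itself, or return a view only of data that outlives the call, such as a string literal, which has static storage duration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Safe alternative 1: return an owning std::string when the function
// creates the characters itself. The caller receives its own string.
std::string CreateOwned()
{
    std::string str = "FooBar";
    return str;
}

// Safe alternative 2: a view of a string literal never dangles, because
// string literals have static storage duration (they live for the whole program).
std::string_view CreateFromLiteral()
{
    return "FooBar";
}
```

Be careful on the caller side as well: std::string_view v = CreateOwned(); dangles again, because the returned temporary string is destroyed at the end of that statement. Store the result in a std::string variable first and create the view from that variable.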

Not null-terminated

There is one major pitfall when you use a string view: its content may not be null-terminated. This becomes relevant if you want to use functions like “atoi” or “printf” that expect a null-terminated C string. You can easily pass the content of a string view via its “data()” method, which returns a pointer to the underlying character array. But this character array may not be null-terminated. Unfortunately, the string view does not have a “c_str()” method like the string class. And to confuse us completely, the “data()” method of the string view may behave differently from the “data()” method of the string, which returns a null-terminated character array in C++11 and later. In my opinion, it isn’t a good choice to have such different behaviors in string and string view: they have nearly identical interfaces, so users expect identical behavior.
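The mismatch also appears without a hand-built array. In the following sketch, the view prefix represents only "Foo", but because it points into the string literal "FooBar", a C function that scans data() for the terminating '\0' sees all six characters. The names Lengths and CompareLengths are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

struct Lengths
{
    std::size_t view_size;  // what the view says it contains
    std::size_t c_length;   // what strlen sees when given view.data()
};

Lengths CompareLengths()
{
    std::string_view full = "FooBar";             // literal: '\0' follows 'r'
    std::string_view prefix = full.substr(0, 3);  // represents "Foo"
    // Well-defined here only because the literal's '\0' is in the same array.
    return { prefix.size(), std::strlen(prefix.data()) };
}
```

CompareLengths() yields a view size of 3 but a C string length of 6, so passing prefix.data() to a C function silently processes "FooBar" instead of "Foo".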

If you need the underlying character array of a string view as a null-terminated string, you have to explicitly convert the view to a string and then use its “c_str()” method. The following example shows such a use case. The first string view refers to a null-terminated character array, so we can pass its data to the “strlen” function. The second string view is not null-terminated, so the “strlen” call results in undefined behavior. The third call shows the conversion to a null-terminated string.

#include <cstring>
#include <iostream>
#include <string>
#include <string_view>

int main()
{
	// null-terminated: the view covers a whole std::string
	std::string str = "FooBar";
	std::string_view strview(str);

	std::cout << std::strlen(strview.data()) << '\n';

	// not null-terminated: the array has no trailing '\0'
	char str2[6] = { 'F','o','o','B','a','r' };
	std::string_view strview2(str2, sizeof str2);

	std::cout << std::strlen(strview2.data()) << '\n';  // undefined behavior

	// convert to a null-terminated string first
	std::cout << std::strlen(std::string(strview2).c_str()) << '\n';
}
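For numeric conversions like “atoi”, the copy into a std::string can be avoided entirely: C++17 also added std::from_chars (header <charconv>), which parses a character range given by a begin and an end pointer and therefore does not need a null terminator. ParseInt is a hypothetical helper for this sketch. Note that, unlike atoi, std::from_chars does not skip leading whitespace and does not accept a leading '+'.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parses a decimal integer from exactly the characters of the view.
// std::from_chars works on [first, last), so the view does not need to be
// null-terminated and no std::string copy is made.
std::optional<int> ParseInt(std::string_view text)
{
    int value = 0;
    const char* first = text.data();
    const char* last = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last)
    {
        return std::nullopt;  // not a number, out of range, or trailing characters
    }
    return value;
}
```

With std::string_view digits = "123456";, ParseInt(digits.substr(0, 3)) yields 123, while atoi(digits.substr(0, 3).data()) would read up to the literal's terminator and return 123456.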

Use string view as method parameter

After this short excursion into the risks of using string view, we come back to its strengths. As we have learned so far, a string view is favorable whenever we only need read-only access to a string’s content and would otherwise create a copy. So, it may be a perfect choice for a read-only function parameter. In the following, we compare three variants of a method implementation, in which the string is passed as a C string, a C++ string, or a string view.

Let’s start with a C string. To keep it simple, the method only searches for a substring, as an example of accessing the input parameter.

#include <cstring>
#include <string>
#include <string_view>

void AnalyzeCharArray(const char* s)
{
	auto x = std::strchr(s, 'B');  // pointer to "Bar" within s
}

int main()
{
	char arr[] = "FooBar";  // null-terminated, as strchr requires
	std::string str = "FooBar";
	std::string_view strview(str);

	// with c string
	AnalyzeCharArray(arr);
	AnalyzeCharArray(str.c_str());
	AnalyzeCharArray(std::string(strview).c_str());
}

This implementation has some disadvantages. We must call “c_str()” if the method is called with a std::string. And if it is called with a string view, we must first construct a std::string from it to safely obtain a null-terminated string, which is a copy that may allocate memory.

So, let’s change the method and use a C++ string.

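#include <string>
#include <string_view>

void AnalyzeStdString(const std::string& s)
{
	auto x = s.substr(s.find('B'));  // creates a new std::string
}

int main()
{
	char arr[] = "FooBar";  // null-terminated, required for the conversion to std::string
	std::string str = "FooBar";
	std::string_view strview(str);

	// with c++ string
	AnalyzeStdString(arr);                   // implicitly creates a temporary std::string
	AnalyzeStdString(str);
	AnalyzeStdString(std::string(strview));  // explicit conversion required
}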

That looks better. But there are still some downsides. If the method is called with a string view, an explicit conversion to string is necessary. Furthermore, copies are made that may allocate memory: the conversion to a C++ string when the method is called with a C string or a string view, and the substr call, which returns a new string.
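These conversions can be checked at compile time with type traits. The sketch below (ToOwned is a hypothetical helper) shows that a C string converts implicitly to std::string by constructing a temporary, that a string view must be converted explicitly, and that std::string::substr returns a new owning string while the string view version returns a view.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// A const std::string& parameter accepts a C string implicitly, but only by
// constructing a temporary std::string, which copies the characters.
static_assert(std::is_convertible_v<const char*, std::string>);

// The std::string constructor taking a string view is explicit, so a call
// with a view has to be written as AnalyzeStdString(std::string(strview)).
static_assert(!std::is_convertible_v<std::string_view, std::string>);

// std::string::substr returns a new owning string; the view version does not.
static_assert(std::is_same_v<decltype(std::string().substr(0)), std::string>);
static_assert(std::is_same_v<decltype(std::string_view().substr(0)), std::string_view>);

// Hypothetical helper: the explicit conversion, which copies the viewed characters.
std::string ToOwned(std::string_view view)
{
    return std::string(view);
}
```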

As a third alternative, we use a string view as the method parameter.

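#include <string>
#include <string_view>

void AnalyzeStringView(const std::string_view s)
{
	auto x = s.substr(s.find('B'));  // returns a view, no copy
}

int main()
{
	char arr[] = "FooBar";
	std::string str = "FooBar";
	std::string_view strview(str);

	// with string view
	AnalyzeStringView(arr);
	AnalyzeStringView(str);
	AnalyzeStringView(strview);
}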

This seems to be a good choice. The method can be called directly with a char array, a string, or a string view, and there is no additional memory allocation. Furthermore, the string view offers the same read-only member functions as a string and can be used with the standard algorithms.
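To make the allocation claim measurable, a sketch can replace the global operator new with a counting version, which the standard permits. AnalyzeStdString and AnalyzeStringView are adapted from the variants above to return the substring length instead of storing it in an unused variable; g_allocations and CountAllocations are hypothetical names. Only the string view result is asserted here, because the count for the std::string variant depends on the standard library and on optimizations that may elide allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <string_view>

// Counts calls to the global operator new. Replacing the global allocation
// functions is permitted by the standard; this is a diagnostic sketch only.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size)
{
    ++g_allocations;
    if (void* p = std::malloc(size == 0 ? 1 : size))
    {
        return p;
    }
    throw std::bad_alloc();
}

// Matching deallocation functions for memory obtained from std::malloc.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Second variant from above, returning the length instead of an unused variable.
std::size_t AnalyzeStdString(const std::string& s)
{
    return s.substr(s.find('B')).size();
}

// Third variant from above.
std::size_t AnalyzeStringView(const std::string_view s)
{
    return s.substr(s.find('B')).size();
}

// Returns how many times operator new was called while running f().
template <typename F>
std::size_t CountAllocations(F f)
{
    const std::size_t before = g_allocations;
    f();
    return g_allocations - before;
}
```

For an input longer than the small string buffer, the std::string variant is expected to allocate for the temporary string and for the substring, while the string view variant does not allocate at all.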

So, using a string view as a method parameter is the best choice in such use cases.

Use string view as return value

Of course, the string view can be used as a return value too. As shown previously, you just have to be careful about the lifetime of the referenced object. The following source code shows a corresponding example.

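#include <iostream>
#include <string>
#include <string_view>

std::string_view GetSubstring(const std::string_view s)
{
	// the result refers into the caller's buffer
	return s.substr(s.find('B'));
}

int main()
{
	std::string str = "FooBar";  // must outlive every view into it
	std::string_view strview(str);

	std::cout << GetSubstring(strview) << '\n';
}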
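The example above relies on 'B' being present: if find returns std::string_view::npos, substr throws std::out_of_range. A hypothetical variant, GetSubstringOrEmpty, could return an empty view instead. Like GetSubstring, its result points into the caller's buffer and is only valid as long as that buffer lives.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical variant of GetSubstring: returns an empty view instead of
// throwing std::out_of_range when 'B' is not found.
// The result points into the caller's buffer, so it is only valid while
// that buffer is alive.
std::string_view GetSubstringOrEmpty(std::string_view s)
{
    const std::size_t pos = s.find('B');
    if (pos == std::string_view::npos)
    {
        return {};
    }
    return s.substr(pos);
}
```

Calling either function with a temporary std::string and storing the result creates a dangling view, just as in the Create example.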

Summary

The string view represents a view of a sequence of characters. As the string view is a lightweight, non-owning object that is cheap to copy, it solves some performance issues of a standard string. Both types have nearly the same interface, so you can often exchange them easily. There are two major pitfalls when you use a string view. First, you must think about the lifetime of the referenced object: as the string view holds a non-owning reference, the referenced object must outlive the string view. Second, the string view may refer to a character sequence that is not null-terminated. So, you may have to convert it to a std::string and call “c_str()” if you want to pass it to a function that expects a null-terminated character array.
