std::vector vs c-style array

Within this article I want to compare the base array type and the vector offered by the std library. I want to call these two types “c-style array” and “std::vector”. The comparison will show you the pros and cons of the different types and give you a guideline when to use what.

The STL offers another base container, the std::array. I don’t want to pay attention to this container within this comparison for two reasons. At first, it is very easy to decide whether you want to use a std::vector or a std::array. You should use std::array when the container size is known at compile time. If this size isn’t known at compile time or you container may grow or shrink, you should use std::vector. At second, a comparison between std::array and c-style array is important, but this is an own topic and I don’t want to mix different topics within one article. In a previous article you can find an according comparison

 

Use cases

The c-style array is by default a fixed size container. Therefore you may ask whether it makes sense to compare it with a std::vector. Within the following table I will show the base use cases for containers and the according STL type or c-style type which may be used for implementation.

Use Case STL Container c-style container
The size is fixed. std::array array on stack

(e.g. “int someArray[5]”)

The container will be created by actual data. Therefore, initialization has variable size. After this creation, the data will be used only and therefore the size becomes fixed. std::vector array on heap

(e.g: int *someArray = new int[5];”)

The size is variable and can be grow or shrink during runtime. std::vector array on heap

(and maybe some management data)

 

As you can see, the std::vector is used for all cases with a variable size container. As the STL does not contain a specialized container for the second use case we only have to distinguish between variable and fixed size containers.

The c-style array can be used to implement all three use cases. But now we can distinguish between the second and the third use case. To implement the size changes during runtime we may need an additional management component for the array grow and shrink functionality.

 

Base concept of std::vector

A vector is the base container. Bjarne Stroustrup suggest: “By default, use Vector when you need a container”. Therefore, if you don’t have any good reasons to use another container type, you should use a vector.

A vector is a dynamic array that goes on the heap. Its size does not need to be known at compile time. If you add or remove items, the underlying array changes size.

You can think of a std::vector as a wrapper to „int *someArray = new int[5]“ where a std::array is a wrapper to „int someArray[5]“.

 

Initialize, change, read

The following example application shows the initialization of containers with three elements. The vector size isn’t set explicitly. If you add an element to a std::vector it will reserve additional memory of necessary.

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int values1[3] = {1, 2, 3};

  values1[1] = 15;

  std::cout << "size: " << (sizeof(values1) / sizeof(*values1)) << std::endl;

  for (auto element : values1)
  {
    std::cout << element << std::endl;
  }
  
  // std vector
  std::vector values2 {4, 5, 6};
  
  values2[1] = 25;

  std::cout << "size: " << values2.size() << std::endl;

  for (auto element : values2)
  {
    std::cout << element << std::endl;
  }
}

 

Within this first example you can see that the syntax for the variable declaration is slightly different, but element access and usage within a range based for loop is the same for both types. Furthermore, the example shows a first big difference of both containers: the std::vector knows its size. For the c-style array you have to create an additional variable to store the array size or, as shown in the example, you can calculate the size by dividing the memory reserved for the array by the memory needed to store a single element.

 

Use as method parameter

In a real application, you will split up your program logic into functions and therefore a standard use case is to pass parameters to functions. So, let’s change the example a little bit and create functions which should print the containers.

void PrintArray(int* values, int size)
{
  std::cout << "size: " << size << std::endl;

  for (int i = 0; i < size; ++i)
  {
    std::cout << values[i] << std::endl;
  }
}

void PrintVector(const std::vector &values)
{
  std::cout << "size: " << values.size() << std::endl;

  for (auto value : values)
  {
    std::cout << value << std::endl;
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int values1[3] = { 1, 2, 3 };

  PrintArray(values1, 3);

  // std vector
  std::vector values2{ 4, 5, 6 };

  PrintVector(values2);
}

 

This simple use case shows a very important difference between a c-style array and a std::vector. The c-style array will be passed as pointer to its first parameter. The std::vector instead offers the standard copy semantic. Therefore, you can use standard coding practices and for example pass the parameter by const reference. So, the std::vector doesn’t convert to a pointer unless you explicitly ask it to.

A c-style array is passed as pointer to the first element. This principle is called “pointer decay”. As the array decays to a pointer, the size information is loss. Therefore, we have to pass an additional function parameter which contains the array size. The previously shown calculation depending on the memory size will no longer work, as the memory size of the passed parameter is the size of a pointer now. Furthermore, as the size information is lost, the range based for loop cannot be used anymore.

You can see, this little design change of the application, has nearly no impact if you use a std::vector. You can simple refactor your code and extract the according code section as method. But if you try to do the same for a c-style array, you will get in trouble and as result of the pointer decay you have to change your implementation fundamentally. Within the following article you can see some other impacts of the pointer decay, especially in base and derived class relationships: Arrays and inheritance, a source of errors.

 

Use as return value

C-style arrays are among the few types which cannot be used as L-value. So, c-style arrays cannot be used on the left side of an assignment and therefore they cannot be used as return values too. But you can return a pointer to a c-style array. Unfortunately, this will result in with two big issues: you have to know the size of the created array outside of the method and you have to implement the resource management based on this wild pointer. Both issues are a good source for errors.

A std::vector supports the c++ copy semantic. Therefore, it can be used as return value. The following example shows possible implementation for both kinds of arrays.

int* CreateArray()
{
  int* values = new int[3] { 1, 2, 3 };
  return values;
}

std::vector CreateVector()
{
  std::vector values = { 4, 5, 6 };
  return values;
}

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int* array1 = CreateArray();
  int size = 3;   // you have to know the size of the array
  
  for (int i = 0; i < size; ++i)
  {
    std::cout << array1[i] << std::endl;
  }
  
  delete[] array1; // you have to delete the array
  
  // std vector
  std::vector values2 = CreateVector();
  
  for (auto element : values2)
  {
    std::cout << element << std::endl;
  }
}

 

Range check

One of the base concepts of c and c++ is to provide build in types without any overhead which could result in performance loss or higher memory usage. Therefore, such types do not contain a management layer with range checks and similar features. Of course, this missing secure access layer could result in implementation errors which cause undefined behavior of your application.

The std::vector expands this concept and provides a combination of the advantages of both ideas. In favor of performance it comes without any management layer in release build and on the other hand it offers a secure access layer in debug builds. If you create a debug build, the std::vector contains range checks. This will help to find implementation errors. The following source code shows an according example. If you execute this example as debug build, the erroneous std::vector access will throw an out of range exception.

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int values1[3] = { 1, 2, 3 };

  values1[5] = 15;   // undefined behavior
   
  // std vector
  std::vector values2 {4, 5, 6};
   
  values2[5] = 25;  // out of range exception in debug build  
}

 

Standard algorithms

The std::vector provides an STL-consistent interface and can be used with the STL algorithms. The STL algorithms also provide support for c-style arrays. If you have c-style arrays with known size, you can directly use them together with the algorithms. As shown in the previous examples, pointer decay will result in a loss of the size information and therefore in a loss of direct support of algorithms. But you can use a template based approach to keep the size information of a c-style method parameter.

The following example shows how to use both container types in STL algorithms. In this case the sort algorithm is used. Furthermore, it shows an implementation to pass a c-style array as a reference to template function, which keeps the size information and allows the use of STL algorithms.

// template based function which keeps size information 
// of c-style array and therefore allows to use STL algorithms
template 
void SortArray(T(&values)[Size])
{
  std::sort(std::begin(values), std::end(values));  
}

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int values1[3] = { 7, 5, 3 };
  
  std::sort(std::begin(values1), std::end(values1));

  for (auto element : values1)
  {
    std::cout << element << std::endl;
  }
  
  // std vector
  std::vector values2 = { 7, 5, 3 };

  std::sort(values2.begin(), values2.end());
  
  for (auto element : values2)
  {
    std::cout << element << std::endl;
  }
}

 

Memory usage

A std::vector is a wrapper for a c-style array with additional management data needed for the grow and shrink functionality. This management data needs a few bytes extra space. Therefore, compared with a c-style array, the std::vector needs more memory. In most cases this extra memory can be ignored especially if you use large containers. Moreover, if you want to implement a c-style array which can grow and shrink you have to store additional management data too.

In my opinion, there is one use case where this higher memory usage will be crucial: You need a lot of small containers with fixed size during lifetime but variable size on initialization. In this case the grow and shrink functionality of the std::vector with its according management data is unnecessary overhead as it is not needed during lifetime of the container. It is only needed because the initialization size of the container is not known at compile time. If you have to create many of such containers and if they contain a small amount of data, a c-style array may be a good choice. But even in this case I would recommend the use of a c-style array only if you have a high memory workload which results in real issues. If you don’t have measurable issues you should use a std::vector anyway. The higher memory need is a small price which you should pay to get all the nice advantages of a std::vector which will help you to create more robust applications.

 

Performance

The C++ core guidelines suggest to use a std::vector instead of a c-style array (see “SL.10: Prefer using STL array or vector instead of a C array”). That’s because the std::vector offers a lot of advantages without a performance loss. Therefore, it does have the same performance as a raw c-style array.

But unfortunately, there is a persistent myth that for run-time speed, one should use c-style arrays. One reason for this myth are examples which compare c-style array and std::vector speed. Such examples will sometimes show a huge performance difference and they explain this difference as result of the std::vector management functionality for the size flexibility. These examples have one common problem: they use the std::vector in a wrong way. This poor implementation causes the performance loss. Following we will see some typical examples.

A second reason for this myth to persist is when comparing c-style arrays and std::vector in debug mode. As shown previously, the std::vector does range checks during debug mode. Of course, this will reduce the performance. The c-style array performance instead is indifferent between debug and release mode. Therefore, performance comparisons in debug mode are not meaningful.

As mentioned before, we will now look at some typical implementation issues which reduce the performance of a std::vector. After I have seen some of the examples with supposedly bad performance of the std::vector, I found two main issues: slow write and slow read commands. The following source code shows a typical example for a slow write and a slow read implementation and a right way to implement it.

If you create the std::vector and fill it with data, you should set an expected size. Normally you know the approximately or average size for your container. If you set this size during creation you avoid unnecessary grow operations. The second part of the example shows two ways to access the std::vector elements: you can use the “at()” method or you can use the “[]” operator. But these are not two different spellings for the same thing. There is a different functionality too. The “at()” method will execute an additional range check. And this range check is done in release build too. Therefore, it will add functionality for a secure element access but of course, this extra functionality comes with performance costs.

int _tmain(int argc, _TCHAR* argv[])
{
  // reserve memory before adding elements
  // e.g. you know the vector will have 100 up to 150 elements, 
  // then already reserve space for 100 elements
  std::vector values;
  values.reserve(100);
   
  int x = 125;  // example only, will be a variable size of 100 to 150
  for (int i = 0; i < x; i++)
  {
    values.push_back(i);
  }

  // dont use at() except you explicitly want a range check
  int element1 = values.at(17);   // range check on debug and release build
  int element2 = values[17];      // no range check
}

 

By avoiding these two implementation errors, a lot of the examples showing performance issues can be proved wrong. Let me mention two more performance thoughts before we finish this topic. Another important fact is that you should try to avoid inserting an element in front of the std::vector. For an insert at the front, a copy of the whole vector is necessary. If you cannot avoid such inserts you may use some other container. But do performance measurements in this case. Other containers lie a list will support an insert at the front but they come with a different memory management which results in performance loss in other situations. Therefore, even for a use case with (sporadic) insert at front, the std::vector may be the best choice.

A further performance improvement can be reached by using “emplace_back()” instead of “push_back()” while inserting into a std::vector. To keep it simple you can say: emplace back moves an object into a container where push back copies it. If the object supports this movement you will get a performance gain. You don’t have to worry about whether the object supports movement or not. If it isn’t supported the function will copy it anyway. So, emplace back can be used in any case and it will be as fast as or faster as push back, depending on the object you want to add to the vector.

 

Summary

If you need a container with variable size, your choice will be easy for normal use case. Let me repeat Bjarne Stroustrup’s suggestion: “By default, use Vector when you need a container”. I would recommend the same and suggest using another container type only if it adds some benefit in your special use case. And please keep in mind: This benefit must be measurable, especially if you expect a performance gain. Unfortunately, often the choice for a container type is based on belief and not on measurable facts. Finally I will finish the article with a short summary of the pros and cons of a std::vector compared to a c-style array.

Advantages of std::vector

  • Perfectly matches with the strict type system concept of c++
  • Copy semantic / no pointer decay / the array doesn’t convert to a pointer unless you explicitly ask it to
  • It knows its size
  • Range check in debug builds
  • No memory overhead beyond what it needs to hold its elements
  • Same performance as a c-style array
  • Direct support of use in STL algorithms

Disadvantages of std::vector

  • Slightly higher memory need as additional management data must be stored. This disadvantage is only true in case you have the second of the three main use cases shown at the beginning, with a variable size initialization but a fixed size during use of the container.
Advertisements
Dieser Beitrag wurde unter C++ veröffentlicht. Setze ein Lesezeichen auf den Permalink.

Kommentar verfassen

Trage deine Daten unten ein oder klicke ein Icon um dich einzuloggen:

WordPress.com-Logo

Du kommentierst mit Deinem WordPress.com-Konto. Abmelden /  Ändern )

Google+ Foto

Du kommentierst mit Deinem Google+-Konto. Abmelden /  Ändern )

Twitter-Bild

Du kommentierst mit Deinem Twitter-Konto. Abmelden /  Ändern )

Facebook-Foto

Du kommentierst mit Deinem Facebook-Konto. Abmelden /  Ändern )

w

Verbinde mit %s