lambdas and scope variables

A lambda expression can access local variables of the scope in which it is used. You can pass these variables by value or by reference to the lambda function. The following example shows how to sort a vector of signed integers. They should be sorted by the absolute values and in descending order. Therefore, an according lambda function was created. Additional, we want to know the number of function calls performed by the sort algorithm So we add a variable in local scope and pass a reference to the variable to the lambda expression.

int _tmain(int argc, _TCHAR* argv[])
{
  auto data = std::vector  {-5, 1, 15};

  int numberOfFunctionCalls = 0;

  auto AbsoluteDescendingComparer = [&numberOfFunctionCalls](int a, int b)
  {
    ++numberOfFunctionCalls;

    return abs(a) > abs(b);
  };

  std::sort(data.begin(), data.end(), AbsoluteDescendingComparer);

  return 0;
}

 

The first pair of brackets “[…]” is used to pass local variables of the lambda’s scope to the embedded lambda.  The following syntax can be used

  • “[&]” to pass all variables by references
  • “[=]” to pass all variables by value
  • “[&x]” or “[=x]” to pass the variable “x” by reference or by value

 

Evaluation in context of procedural and object-oriented programming

From a technical point of view, it is very easy to use local scope variables in lambdas. But if we think about good software design we may see some issues. My first concern is: we create hidden function parameters which will make it more difficult to see the dependencies between the different parts of the application. Why do we have this feature of passing variables in the “[…]” part at all? We can pass all parameters as function parameters anyway. So why don’t we use the normal way we know from functions? In my opinion, it is not necessary to have two ways to pass variables into a lambda function.

But what if we don’t want to use this feature? How can we implement a use case like shown in the example above? This would not be possible anymore. And I think that’s fine! From a software design point of view, I would say following: a lambda function is “a” function and therefore it should do one thing only – like any other function. In most (or all?) cases we use the “[…]” feature we smuggle in some values into the function because we need these values for a second functionality which will be executed by our lambda. So, our lambda function becomes additional responsibilities which contradicts the software design principle that a function should do one thing only. If we have such use cases like above, we should think about suitable components, like functors or classes. A functor or class can offer a second function which implements our second use case. In this case it can offer a getter method which tells about the number of function calls and a function to reset this counter.

In general, I can see two main kinds of use cases for the “[…]” feature. If we pass variables by value, we have a factory use case. That’s means we need these values to initialize or set up the behavior of our lambda. In this case I would prefer a functor with an according constructor which contains these parameters.

If we pass the values by reference, we want to get additional results at the end or during the execution of the function. In these cases, we add a second responsibility to the function. Therefore, a class would be a better choice because it can offer as many function as needed for this use case. Furthermore, it can hide implementation details, e.g. like in the case above we have to introduce a counter variable which will be increased at any function call. This is an implementation detail which should be hidden by a class and not seen by the client.

Furthermore, functions with references to scope variables are functions with side effects. And side effects create spaghetti code. I know, the term “spaghetti code” is original used for a confused program flow mainly caused by jump conditions, but I also use this term for a confused data flow. I think, in modern software, a messy data flow caused by functions with side effects, is the main reason for erroneous code and for legacy code which is nearly not maintainable.

 

Evaluation in context of functional programming

Till now I expressed a lot of negative thoughts about scope variables. But of course, this feature does not exist without a reason. If we use it in a suitable way it can offer some advantages. These advantages come to light if we implement in a functional style. Within the example above, the lambda was embedded and executed directly within the parent function. This procedural way of implementation is normally done by function calls with function input parameter. But the conditions change if we don’t execute the lambda directly. Instead, our parent function should now return a pointer to the lambda function. So, the client will be able to store this function pointer and execute the function call later. This functional way of implementation can benefit from the possibility to use the parent scope where a function is embedded. This feature is called “closure”. In short it means, that a function can exists including its surrounding context. Therefore, in this functional programming context, the variables are no longer “passed” to the function, they are “captured”.

At this point I want to stop and don’t want to go into details about closures and functional programming with C++, as this is an own topic and will go beyond the scope of this article. But there will be a further article with the topic “closures in C++”.

Of course, I mentioned this topic in this short section to show the real use case for scoped variable access in lambda functions. But in most cases, we use C++ to implement in a procedural or object-oriented way. And within this programming paradigms, the scoped variable access most often results in more disadvantages then advantages.

 

Summary

Lambda functions offer a mechanism to pass variables of the local scope into the lambda function. From a technical point of view this feature is easy to use and allows implementation of lambdas in much more use cases. From a software design point of view, one should ask if all these use cases should be implemented by using lambdas at all or if there are much better alternatives. Therefore, some see this feature as a “must have” while others see it as a way to write obscure and error-prone code. And in my opinion, both are right.

Advertisements
Veröffentlicht unter C++ | Kommentar hinterlassen

functor vs. named lambda vs. anonymous lambda

A lambda expression is a mechanism for specifying a function. The primary use for a lambda is to specify a simple action. A lambda expression can be implemented directly as anonymous function, for example within an algorithm method, or it can be named and used by its name.

Of course, there are already some other possibilities to implement functions. Therefore, we may think about the question when we should use lambdas and when we should not use them. Furthermore, we have to decide between named and anonymous lambdas. Following I will give you some guidelines which will help do find a decision.

This comparison is based on a simple use case: we want to sort a list of signed integers by looking at the absolute values and we want to sort in descending order. There exists an STL algorithm for sorting. So, we use this algorithm and provide a comparison function for two elements. At first, we implement this function by using a functor.

struct AbsoluteDescendingComparer
{
  bool operator()(int a, int b)
  {
    return abs(a) > abs(b);
  }
};

int _tmain(int argc, _TCHAR* argv[])
{
  auto data = std::vector {-5, 1, 15};
    
  std::sort(data.begin(), data.end(), AbsoluteDescendingComparer());
    
	return 0;
}

 

At next we implement the same function by using an anonymous lambda. This will give us the possibility to write the comparison directly within the sort algorithm and therefore we can create very short and compact source code.

int _tmain(int argc, _TCHAR* argv[])
{
  auto data = std::vector  {-5, 1, 15};

  std::sort(data.begin(), data.end(), [](int a, int b) { return abs(a) > abs(b); } );

  return 0;
}

 

Another possible implementation will be a named lambda. We can create this lambda first and use it later within the algorithm.

int _tmain(int argc, _TCHAR* argv[])
{
  auto data = std::vector  {-5, 1, 15};

  auto AbsoluteDescendingComparer = [](int a, int b)
  {
    return abs(a) > abs(b);
  };

  std::sort(data.begin(), data.end(), AbsoluteDescendingComparer);

  return 0;
}

 

Comparison of the implementations

If we now ask our self which of the three alternatives we prefer, I’m sure, most will say: the anonymous lambda. That’s because it allows to write the code in a very compact way without any overhead. But what if we ask about our preference in case we have to maintain code? In this case the decision may not that easy anymore. The anonymous lambda has one major disadvantage: to leave out the function name will result in a loss of information. The function name, if it is chosen wisely, offers a lot of information about the implemented function. Without this name, we have to understand the implementation itself to know what it is used for. In case we use an anonymous lambda we have to add a comment to explain the implementation.

Another issue which decreases the code readability and makes it more difficult to understand and maintain the code, is based on the mixing of the two concerns. We have the sort-algorithm and we have a comparer. By using a named lambda we have a clear separation of concerns. Instead, if we use an anonymous lambda, these two functionalities will be mixed together.

The separation of concerns will become even more importand in case the function itself becomes more complex. In such a case, an anonymous lambda as well as an named one are not suitable anymore because both are placed in the local scope of the including method. If such a method becomes too complex it should be refactored by extracting code into separate methods. Therefore, complex functionality should not be implemented by using a lambda.

Recommendations

For a simple functionality, needed in local scope only, a lambda can be used. Otherwise use a functor, a class member function or a non-member function. If you use a lambda I would recommend preferring the named lambda. An anonymous one should only be used if the functionality is simple enough to understand it in the same time it needs to understand the meaning of a function name.

 

Veröffentlicht unter C++ | Kommentar hinterlassen

std::vector vs c-style array

Within this article I want to compare the base array type and the vector offered by the std library. I want to call these two types “c-style array” and “std::vector”. The comparison will show you the pros and cons of the different types and give you a guideline when to use what.

The STL offers another base container, the std::array. I don’t want to pay attention to this container within this comparison for two reasons. At first, it is very easy to decide whether you want to use a std::vector or a std::array. You should use std::array when the container size is known at compile time. If this size isn’t known at compile time or you container may grow or shrink, you should use std::vector. At second, a comparison between std::array and c-style array is important, but this is an own topic and I don’t want to mix different topics within one article. In a previous article you can find an according comparison

 

Use cases

The c-style array is by default a fixed size container. Therefore you may ask whether it makes sense to compare it with a std::vector. Within the following table I will show the base use cases for containers and the according STL type or c-style type which may be used for implementation.

Use Case STL Container c-style container
The size is fixed. std::array array on stack

(e.g. “int someArray[5]”)

The container will be created by actual data. Therefore, initialization has variable size. After this creation, the data will be used only and therefore the size becomes fixed. std::vector array on heap

(e.g: int *someArray = new int[5];”)

The size is variable and can be grow or shrink during runtime. std::vector array on heap

(and maybe some management data)

 

As you can see, the std::vector is used for all cases with a variable size container. As the STL does not contain a specialized container for the second use case we only have to distinguish between variable and fixed size containers.

The c-style array can be used to implement all three use cases. But now we can distinguish between the second and the third use case. To implement the size changes during runtime we may need an additional management component for the array grow and shrink functionality.

 

Base concept of std::vector

A vector is the base container. Bjarne Stroustrup suggest: “By default, use Vector when you need a container”. Therefore, if you don’t have any good reasons to use another container type, you should use a vector.

A vector is a dynamic array that goes on the heap. Its size does not need to be known at compile time. If you add or remove items, the underlying array changes size.

You can think of a std::vector as a wrapper to „int *someArray = new int[5]“ where a std::array is a wrapper to „int someArray[5]“.

 

Initialize, change, read

The following example application shows the initialization of containers with three elements. The vector size isn’t set explicitly. If you add an element to a std::vector it will reserve additional memory of necessary.

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int values1[3] = {1, 2, 3};

  values1[1] = 15;

  std::cout << "size: " << (sizeof(values1) / sizeof(*values1)) << std::endl;

  for (auto element : values1)
  {
    std::cout << element << std::endl;
  }
  
  // std vector
  std::vector values2 {4, 5, 6};
  
  values2[1] = 25;

  std::cout << "size: " << values2.size() << std::endl;

  for (auto element : values2)
  {
    std::cout << element << std::endl;
  }
}

 

Within this first example you can see that the syntax for the variable declaration is slightly different, but element access and usage within a range based for loop is the same for both types. Furthermore, the example shows a first big difference of both containers: the std::vector knows its size. For the c-style array you have to create an additional variable to store the array size or, as shown in the example, you can calculate the size by dividing the memory reserved for the array by the memory needed to store a single element.

 

Use as method parameter

In a real application, you will split up your program logic into functions and therefore a standard use case is to pass parameters to functions. So, let’s change the example a little bit and create functions which should print the containers.

void PrintArray(int* values, int size)
{
  std::cout << "size: " << size << std::endl;

  for (int i = 0; i < size; ++i)
  {
    std::cout << values[i] << std::endl;
  }
}

void PrintVector(const std::vector &values)
{
  std::cout << "size: " << values.size() << std::endl;

  for (auto value : values)
  {
    std::cout << value << std::endl;
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int values1[3] = { 1, 2, 3 };

  PrintArray(values1, 3);

  // std vector
  std::vector values2{ 4, 5, 6 };

  PrintVector(values2);
}

 

This simple use case shows a very important difference between a c-style array and a std::vector. The c-style array will be passed as pointer to its first parameter. The std::vector instead offers the standard copy semantic. Therefore, you can use standard coding practices and for example pass the parameter by const reference. So, the std::vector doesn’t convert to a pointer unless you explicitly ask it to.

A c-style array is passed as pointer to the first element. This principle is called “pointer decay”. As the array decays to a pointer, the size information is loss. Therefore, we have to pass an additional function parameter which contains the array size. The previously shown calculation depending on the memory size will no longer work, as the memory size of the passed parameter is the size of a pointer now. Furthermore, as the size information is lost, the range based for loop cannot be used anymore.

You can see, this little design change of the application, has nearly no impact if you use a std::vector. You can simple refactor your code and extract the according code section as method. But if you try to do the same for a c-style array, you will get in trouble and as result of the pointer decay you have to change your implementation fundamentally. Within the following article you can see some other impacts of the pointer decay, especially in base and derived class relationships: Arrays and inheritance, a source of errors.

 

Use as return value

C-style arrays are among the few types which cannot be used as L-value. So, c-style arrays cannot be used on the left side of an assignment and therefore they cannot be used as return values too. But you can return a pointer to a c-style array. Unfortunately, this will result in with two big issues: you have to know the size of the created array outside of the method and you have to implement the resource management based on this wild pointer. Both issues are a good source for errors.

A std::vector supports the c++ copy semantic. Therefore, it can be used as return value. The following example shows possible implementation for both kinds of arrays.

int* CreateArray()
{
  int* values = new int[3] { 1, 2, 3 };
  return values;
}

std::vector CreateVector()
{
  std::vector values = { 4, 5, 6 };
  return values;
}

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int* array1 = CreateArray();
  int size = 3;   // you have to know the size of the array
  
  for (int i = 0; i < size; ++i)
  {
    std::cout << array1[i] << std::endl;
  }
  
  delete[] array1; // you have to delete the array
  
  // std vector
  std::vector values2 = CreateVector();
  
  for (auto element : values2)
  {
    std::cout << element << std::endl;
  }
}

 

Range check

One of the base concepts of c and c++ is to provide build in types without any overhead which could result in performance loss or higher memory usage. Therefore, such types do not contain a management layer with range checks and similar features. Of course, this missing secure access layer could result in implementation errors which cause undefined behavior of your application.

The std::vector expands this concept and provides a combination of the advantages of both ideas. In favor of performance it comes without any management layer in release build and on the other hand it offers a secure access layer in debug builds. If you create a debug build, the std::vector contains range checks. This will help to find implementation errors. The following source code shows an according example. If you execute this example as debug build, the erroneous std::vector access will throw an out of range exception.

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int values1[3] = { 1, 2, 3 };

  values1[5] = 15;   // undefined behavior
   
  // std vector
  std::vector values2 {4, 5, 6};
   
  values2[5] = 25;  // out of range exception in debug build  
}

 

Standard algorithms

The std::vector provides an STL-consistent interface and can be used with the STL algorithms. The STL algorithms also provide support for c-style arrays. If you have c-style arrays with known size, you can directly use them together with the algorithms. As shown in the previous examples, pointer decay will result in a loss of the size information and therefore in a loss of direct support of algorithms. But you can use a template based approach to keep the size information of a c-style method parameter.

The following example shows how to use both container types in STL algorithms. In this case the sort algorithm is used. Furthermore, it shows an implementation to pass a c-style array as a reference to template function, which keeps the size information and allows the use of STL algorithms.

// template based function which keeps size information 
// of c-style array and therefore allows to use STL algorithms
template 
void SortArray(T(&values)[Size])
{
  std::sort(std::begin(values), std::end(values));  
}

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int values1[3] = { 7, 5, 3 };
  
  std::sort(std::begin(values1), std::end(values1));

  for (auto element : values1)
  {
    std::cout << element << std::endl;
  }
  
  // std vector
  std::vector values2 = { 7, 5, 3 };

  std::sort(values2.begin(), values2.end());
  
  for (auto element : values2)
  {
    std::cout << element << std::endl;
  }
}

 

Memory usage

A std::vector is a wrapper for a c-style array with additional management data needed for the grow and shrink functionality. This management data needs a few bytes extra space. Therefore, compared with a c-style array, the std::vector needs more memory. In most cases this extra memory can be ignored especially if you use large containers. Moreover, if you want to implement a c-style array which can grow and shrink you have to store additional management data too.

In my opinion, there is one use case where this higher memory usage will be crucial: You need a lot of small containers with fixed size during lifetime but variable size on initialization. In this case the grow and shrink functionality of the std::vector with its according management data is unnecessary overhead as it is not needed during lifetime of the container. It is only needed because the initialization size of the container is not known at compile time. If you have to create many of such containers and if they contain a small amount of data, a c-style array may be a good choice. But even in this case I would recommend the use of a c-style array only if you have a high memory workload which results in real issues. If you don’t have measurable issues you should use a std::vector anyway. The higher memory need is a small price which you should pay to get all the nice advantages of a std::vector which will help you to create more robust applications.

 

Performance

The C++ core guidelines suggest to use a std::vector instead of a c-style array (see “SL.10: Prefer using STL array or vector instead of a C array”). That’s because the std::vector offers a lot of advantages without a performance loss. Therefore, it does have the same performance as a raw c-style array.

But unfortunately, there is a persistent myth that for run-time speed, one should use c-style arrays. One reason for this myth are examples which compare c-style array and std::vector speed. Such examples will sometimes show a huge performance difference and they explain this difference as result of the std::vector management functionality for the size flexibility. These examples have one common problem: they use the std::vector in a wrong way. This poor implementation causes the performance loss. Following we will see some typical examples.

A second reason for this myth to persist is when comparing c-style arrays and std::vector in debug mode. As shown previously, the std::vector does range checks during debug mode. Of course, this will reduce the performance. The c-style array performance instead is indifferent between debug and release mode. Therefore, performance comparisons in debug mode are not meaningful.

As mentioned before, we will now look at some typical implementation issues which reduce the performance of a std::vector. After I have seen some of the examples with supposedly bad performance of the std::vector, I found two main issues: slow write and slow read commands. The following source code shows a typical example for a slow write and a slow read implementation and a right way to implement it.

If you create the std::vector and fill it with data, you should set an expected size. Normally you know the approximately or average size for your container. If you set this size during creation you avoid unnecessary grow operations. The second part of the example shows two ways to access the std::vector elements: you can use the “at()” method or you can use the “[]” operator. But these are not two different spellings for the same thing. There is a different functionality too. The “at()” method will execute an additional range check. And this range check is done in release build too. Therefore, it will add functionality for a secure element access but of course, this extra functionality comes with performance costs.

int _tmain(int argc, _TCHAR* argv[])
{
  // reserve memory before adding elements
  // e.g. you know the vector will have 100 up to 150 elements, 
  // then already reserve space for 100 elements
  std::vector values;
  values.reserve(100);
   
  int x = 125;  // example only, will be a variable size of 100 to 150
  for (int i = 0; i < x; i++)
  {
    values.push_back(i);
  }

  // dont use at() except you explicitly want a range check
  int element1 = values.at(17);   // range check on debug and release build
  int element2 = values[17];      // no range check
}

 

By avoiding these two implementation errors, a lot of the examples showing performance issues can be proved wrong. Let me mention two more performance thoughts before we finish this topic. Another important fact is that you should try to avoid inserting an element in front of the std::vector. For an insert at the front, a copy of the whole vector is necessary. If you cannot avoid such inserts you may use some other container. But do performance measurements in this case. Other containers lie a list will support an insert at the front but they come with a different memory management which results in performance loss in other situations. Therefore, even for a use case with (sporadic) insert at front, the std::vector may be the best choice.

A further performance improvement can be reached by using “emplace_back()” instead of “push_back()” while inserting into a std::vector. To keep it simple you can say: emplace back moves an object into a container where push back copies it. If the object supports this movement you will get a performance gain. You don’t have to worry about whether the object supports movement or not. If it isn’t supported the function will copy it anyway. So, emplace back can be used in any case and it will be as fast as or faster as push back, depending on the object you want to add to the vector.

 

Summary

If you need a container with variable size, your choice will be easy for normal use case. Let me repeat Bjarne Stroustrup’s suggestion: “By default, use Vector when you need a container”. I would recommend the same and suggest using another container type only if it adds some benefit in your special use case. And please keep in mind: This benefit must be measurable, especially if you expect a performance gain. Unfortunately, often the choice for a container type is based on belief and not on measurable facts. Finally I will finish the article with a short summary of the pros and cons of a std::vector compared to a c-style array.

Advantages of std::vector

  • Perfectly matches with the strict type system concept of c++
  • Copy semantic / no pointer decay / the array doesn’t convert to a pointer unless you explicitly ask it to
  • It knows its size
  • Range check in debug builds
  • No memory overhead beyond what it needs to hold its elements
  • Same performance as a c-style array
  • Direct support of use in STL algorithms

Disadvantages of std::vector

  • Slightly higher memory need as additional management data must be stored. This disadvantage is only true in case you have the second of the three main use cases shown at the beginning, with a variable size initialization but a fixed size during use of the container.
Veröffentlicht unter C++ | Kommentar hinterlassen

std::array vs c-style array

Within this article I want to compare the base array type and the array offered by the std library. I want to call these two types “c-style array” and “std::array”. The comparison will show you the pros and cons of the different types and give you a guideline when to use which kind of array.

Base concept of std::array

The std::array is designed as zero-overhead wrapper for c-style arrays. It will provide a value like semantics equally to the other C++ containers. A std::array should have same runtime performance as a c-style array.

Within the next chapters I want to compare both array types by looking at some base use cases.

Initialize, change and read

The following example application shows the initialization of an array with three elements. Furthermore, an array element will be changed and all elements will be read.

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int array1[3] = {1, 2, 3};

  array1[1] = 15;

  std::cout << "size: " << (sizeof(array1) / sizeof(*array1)) << std::endl;

  for (auto element : array1)
  {
    std::cout << element << std::endl;
  }
  
  // std-style array
  std::array array2 = { 4, 5, 6 };

  array2[1] = 25;

  std::cout << "size: " << array2.size() << std::endl;

  for (auto element : array2)
  {
    std::cout << element << std::endl;
  }
}

Within this first example you can see that the syntax for the variable declaration is slightly different, but element access and usage of array in a range based for loop is the same for both types. Furthermore, the example shows a first big difference of both types: the std::array knows its size. For the c-style array you have to create an additional variable to store the array size or, as shown in the example, you can calculate the size by dividing the memory reserved for the array by the memory needed to store a single element.

Array as method parameter

In a real application, you will split up your program logic into functions and therefore a standard use case is to pass parameters to functions. So, let’s change the example a little bit and create functions which should print the array.

void PrintArray(int* values, int size)
{
  std::cout << "size: " << size << std::endl;

  for (int i = 0; i < size; ++i)
  {
    std::cout << values[i] << std::endl;
  }
}

void PrintArray(const std::array &values)
{
  std::cout << "size: " << values.size() << std::endl;

  for (auto& value : values)
  {
    std::cout << value << std::endl;
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  int array1[3] = { 1, 2, 3 };
  PrintArray(array1, 3);
  
  std::array array2 = { 4, 5, 6 };
  PrintArray(array2);
}

This simple use case shows a very important difference between a c-style array and a std::array. The c-style array will be passed as pointer to its first parameter. The std::array instead offers the standard copy semantic. Therefore, you can use standard coding practices and for example pass the parameter by const reference. So, the std::array doesn’t convert to a pointer unless you explicitly ask it to.

A c-style array is passed as pointer to the first element. This principle is called “pointer decay”. As the array decays to a pointer, the size information is loss. Therefore, we have to pass an additional function parameter which contains the array size. The previously shown calculation depending on the memory size will no longer work, as the memory size of the passed parameter is the size of a pointer now. Furthermore, as the size information is lost, the range based for loop cannot be used anymore.

You can see, this little design change of the application, has nearly no impact if you use a std::array. You can simple refactor your code and extract the according code section as method. But if you try to do the same for a c-style array, you will get in trouble and as result of the pointer decay you have to change your implementation fundamentally. Within the following article you can see some other impacts of the pointer decay, especially in base and derived class relationships: Arrays and inheritance, a source of errors.

std::array as method parameter with variable size

Maybe you already noticed one limitation of the print function we implemented for the std::array. The function will accept an array with three integer values only. It will not work with array of other types or other sizes.

At first you may think this is a limitation or disadvantage of std::array. But it isn’t. Moreover it is a big advantage of this data type as std::array will perfectly match with the C++ base design which uses a strict type system. We have a fixed size as part of type declaration. So each std::array with a different size is a different type. This will allow the compiler to detect and prevent implementation errors.

Of course, a strict type system will make it difficult to create higher-order functions. For these cases c++ offers function overloading, for example by implementing templates.

The following source code shows a template based implementation which allows to write a print function which accepts a std::array of any size. Furthermore, you will see a second function which is even more generic as it accepts any kind of container object.

template
void PrintArray(const std::array &values)
{
  for (auto value : values)
  {
    std::cout << value << std::endl;
  }
}

template
void PrintContainer(T &values)
{
  for (auto value : values)
  {
    std::cout << value << std::endl;
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::array array2 = { 4, 5, 6 };
  PrintArray(array2);
  PrintContainer<std::array>(array2);
}

c-style array as method parameter with size information

Of course, the template concept can also be used for a c-style array. Instead of passing a pointer to the first element, you can pass the array as a reference to template function. This will allow to pass the array with its specific size.

template 
void PrintArray(T(&values)[Size])
{
  for (auto value : values)
  {
    std::cout << value << std::endl;
  }
}

int _tmain(int argc, _TCHAR* argv[])
{
  int array1[3] = { 1, 2, 3 };
  PrintArray(array1);
}

Array as return value

C-style arrays are among the few types which cannot be used as L-value. So, c-style arrays cannot be used on the left side of an assignment and therefore they cannot be used as return values too. But you can return a pointer to a c-style array. Unfortunately, this will result in with two big issues: you have to know the size of the created array outside of the method and you have to implement the resource management based on this wild pointer. Both issues are a good source for errors.

A std::array supports the c++ copy semantic. Therefore, it can be used as return value. The following example shows possible implementation for both kinds of arrays.

int* CreateArray()
{
  int* array = new int[3] { 1, 2, 3 };
  return array;
}

std::array CreateStdArray()
{
  std::array array = { 4, 5, 6 };
  return array;
}

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int* array1 = CreateArray();
  int size = 3;   // you have to know the size of the array

  for (int i = 0; i < size; ++i)
  {
    std::cout << array1[i] << std::endl;
  }

  delete[] array1; // you have to delete the array
   
  // std-style array
  std::array array2 = CreateStdArray();

  for (auto element : array2)
  {
    std::cout << element << std::endl;
  }
}

In both cases you can create factory methods for arrays of any length. In case of a c-style array I would prefer the length as input parameter and in case of the std::array I would suggest the template based approach, shown in a previous section about method parameters.

If the client does not know the resulting size of the array and the factory method itself should be responsible to create a collection of the needed size, you should not use a std::array at all. The array is a collection of fixed size but you want to use a collection of variable size. This sound like a use case for the std::vector. At this moment, I don’t want to go further into details as this doesn’t match with the topic of this article. But in a further article I want to continue with a comparison of c-style array and std::vector.

Range check

One of the base concepts of c and c++ is to provide build in types without any overhead which could result in performance loss or higher memory usage. Therefore, arrays do not contain a management layer with range checks and similar features. Of course, this missing secure access layer could result in implementation errors which cause undefined behavior of your application.

The std::array expands this concept and provides a combination of the advantages of both ideas. In favor of performance it comes without any management layer in release build and on the other hand it offers a secure access layer in debug builds. If you create a debug build, the std::array contains range checks. This will help to find implementation errors. The following source code shows an according example. If you execute this example as debug build, the erroneous std::array access will throw an out of range exception.

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int array1[3] = {7, 5, 3};
  
  std::sort(std::begin(array1), std::end(array1));

  for (auto element : array1)
  {
    std::cout << element << std::endl;
  }
  
  // std-style array
  std::array array2 = { 7, 5, 3 };

  std::sort(array2.begin(), array2.end());
  
  for (auto element : array2)
  {
    std::cout << element << std::endl;
  }
}

Performance and memory usage

As mentioned before, the std::array is designed as zero-overhead wrapper for c-style arrays. Therefore, both types are optimized regarding performance and memory usage. The std::array has an identical memory layout to c-style arrays. Furthermore, they have the same performance, except in debug builds. As shown in the previous chapter, the std::array contains range checks in debug build and therefore it has a slightly lower performance. In release build the performance of std::array and c-style array are same. By definition, the std::array is a simple aggregate containing a c-style array as its only member.

Array and standard algorithms

The std::array provides an STL-consistent interface and can be used with the STL algorithms. The STL algorithms also provide support for c-style arrays. If you have c-style arrays with known size, you can directly use them together with the algorithms. As shown in the previous examples, pointer decay will result in a loss of the size information and therefore in a loss of direct support of algorithms. But again, you can use the template based approach to keep the size information of a c-style method parameter.

The following example shows how to use both array types in STL algorithms. In this case the sort algorithm is used.

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int array1[3] = {1, 2, 3};

  array1[5] = 15;   // undefined behavior
   
  // std-style array
  std::array array2 = { 4, 5, 6 };

  array2[5] = 25;   // out of range exception in debug build
}

Multi-dimensional arrays

The std::array as well as the c-style array support multi-dimensions. These multi-dimensions are designed as arrays of nested arrays. The syntax to declare a std::array directly reflects this nested design. But this comes with the downside of a more awkward syntax than compared to a c-style array. The following example shows how to declare, write and read a multi-dimensional array.

int _tmain(int argc, _TCHAR* argv[])
{
  // c-style array
  int array1[2][3] = { { 1, 2, 3 }, { 4, 5, 6} };

  array1[0][1] = 10;
  
  for (auto &row : array1)
  {
    for (auto &element : row)
    {
      std::cout << element << std::endl;
    }
  }     

  // std-style array
  std::array<std::array, 2> array2{ { { 7, 8, 9 }, { 10, 11, 12 } } };

  array2[0][1] = 20;

  for (auto &row : array2)
  {
    for (auto &element : row)
    {
      std::cout << element << std::endl;
    }
  }
}

Summary

The std::array is designed as zero-overhead wrapper for c-style arrays. Therefore, it comes with all the advantages of a c-style array and ads the additional advantages of C++ containers. I would recommend to use std::arrays except you have a rare special use case which comes with good reasons to use a c-style array.
Advantages of std::array

  • Perfectly matches with the strict type system concept of c++
  • Copy semantic / no pointer decay / the array doesn’t convert to a pointer unless you explicitly ask it to
  • It knows its size
  • Range check in debug builds
  • No memory overhead beyond what it needs to hold its elements
  • Same performance as a c-style array
  • Direct support of use in STL algorithms

Disadvantages of std::array

  • Slightly more awkward syntax for declaration of the array, especially when creating multi-dimensional arrays
Veröffentlicht unter C++ | 1 Kommentar

STL Algorithms

The STL algorithms offer a powerful way to solve some base programming use cases. If you know and use these algorithms you can write more clean, efficient and robust code. But unfortunately, it isn’t that easy to know or find the right algorithm for a use case. In my opinion, this is a result of the not practical kind of documentation – especially grouping of the algorithms is not related to practice – and on the other hand the naming of some functions. Therefore, I wrote a series of articles with a practice-related overview about the STL algorithms. Following you will find the overview about this article series with links to the articles.

Algorithms to use, find and check container elements

Algorithms to copy, move and remove container elements

Algorithms to change and sort container elements

Algorithms for sorted containers

Algorithms for two containers

Operations which can be used in STL algorithms

Create own STL algorithms wrapper

Veröffentlicht unter C++ | Kommentar hinterlassen

Create own STL algorithms wrapper

Within the previous articles about STL algorithms I introduced the main algorithms sorted by their use case. This analysis of the actual STL algorithms has shown the power of the algorithms but also their issues. The main issues are misleading names and a sometimes-cumbersome notation.

Of course, it is easy to write source code by using the algorithms, but it is sometimes hard to read and understand existing code. Notation matters more than we usually like to believe. Therefore, it may be useful to create an own algorithm wrapper and hide the misleading names and the cumbersome notation behind an easy to use interface.

Within this article I want to show some examples for algorithms wrapper. These wrapper functions add a self-explaining interface and therefore you don’t have to write comments to explain the algorithms. If you use these algorithms without a wrapper, you must write comments to each algorithm to explain its behavior.

 

Example: remove elements from container

The algorithm “remove” is used to remove elements from a container. But unfortunately, this algorithm will not remove the elements. Instead it will shift the elements to delete to the end of the container and returns an iterator to the new logical end. So, you always must explicitly erase the elements after calling the algorithm. I think that’s a perfect candidate for a wrapper function.

template
void RemoveElement(C& container, E element) 
{ 
  // remove (shifts elements but dont deletes them)
  auto newEndIterator = std::remove(container.begin(), container.end(), element);
  
  // delete elements
  container.erase(newEndIterator, container.end());
}

void Print(const std::vector& data)
{
  std::for_each(data.begin(), data.end(), [](int element){ std::cout << element << " "; });
  std::cout << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector data{ 5, 7, 9, 2, 4, 9, 3 };

  RemoveElement(data, 9);

  Print(data);  // output is '5 7 2 4 3'

  return 0;
}

 

Example: count equal pairs in two containers

With the algorithm “inner_product” you can – like the name reveals – calculate an inner product. But you can specify own operations and then the function will offer a common map/reduce functionality according to the given algorithm. So, if you set an algorithm the name “inner_product” is wrong as the function no longer calculates the inner product.

The following example shows how to count the equal pairs of two containers. If you put this function into a self-explaining wrapper function, the code will become much easier to understand.

template
int CountEqualPairs(C& container1, C& container2)
{
  return std::inner_product(
    container1.begin(), container1.end(), container2.begin(),
      0, std::plus(), std::equal_to());    
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector data1{ 5, 7, 9, 2, 4, 9, 3 };
  std::vector data2{ 1, 0, 9, 8, 4, 3, 3 };
  
  int result = CountEqualPairs(data1, data2);

  std::cout << result << std::endl;   // result is '3'

  return 0;
}

 

Example: pairwise multiplication

The algorithm “transform” is another good candidate for a wrapper. This algorithm can be used for a wide range of use cases and therefore it has a cumbersome notation. So, if you read an implementation using this algorithm you must stop for a short (or sometimes long) moment and think about the purpose of the source code. In such cases you are happy if the developer has written a comment and explains the behavior of the algorithm. But often such a comment is missing.

As explained above, a self-explaining wrapper can help to write clean code which is easy to understand and to maintain. The following example shows a possible implementation for a pairwise multiplication.

template
C PairwiseMultiply(C& container1, C& container2)
{
  C target;

  std::transform(
    container1.begin(), container1.end(),
    container2.begin(),
    std::back_inserter(target),
    std::multiplies());  

  return target;
}

void Print(const std::vector& data)
{
  std::for_each(data.begin(), data.end(), [](int element){ std::cout << element << " "; });
  std::cout << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector data1{ 5, 7, 9, 2, 4, 9, 3 };
  std::vector data2{ 1, 0, 9, 8, 4, 3, 3 };
  std::vector target;
  
  target = PairwiseMultiply(data1, data2);

  Print(target);  // output is '5 0 81 16 16 27 9'

  return 0;
}

 

Summary

The STL algorithms are sometimes hard to understand. If you read source code and you find an algorithm, you have to stop for a moment and think about its purpose. A wrapper library helps to hide these implementation details behind a self-explaining interface. This will allow you to create more readable and maintainable source code.

Veröffentlicht unter C++ | 1 Kommentar

Operations which can be used in STL algorithms

Within the previous articles about STL algorithms I introduced the main algorithms sorted by their use case. Within this article I want to give an overview about the STL operations. These operations can be used stand-alone and within STL algorithms. The combination of the existing algorithms and operations will allow to solve complex implementations of some standard use cases in an easy way.

 

Compare two elements

Algorithm Description
equal_to Check whether the two elements are equal.
not_equal_to Check whether the two elements are not equal.
greater Check whether the first element is greater than the second one.
less Check whether the first element is smaller than the second one.
greater_equal Check whether the first element is greater or equal to the second one.
less_equal Check whether the first element is smaller or equal to the second one.
max Returns the larger one of the two elements.
min Returns the smaller one of the two elements.
minmax Returns a pair which contains the smaller element followed by the larger element.

 

Within the examples the operations “greater” and “less” are used to sort a container. The third example makes use of “max” operator to merge two containers by taking the maximum value of each element pair.

void Print(const std::vector& data)
{
  std::for_each(data.begin(), data.end(), [](int element){ std::cout << element << " "; });
  std::cout << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector data{ 5, 7, 9, 2, 4 };

  // greater
  std::sort(data.begin(), data.end(), std::less());
  Print(data);  // output is '2 4 5 7 9'

  // less
  std::sort(data.begin(), data.end(), std::greater());
  Print(data);  // output is '9 7 5 4 2'

  // max
  std::vector data1{ 5, 7, 9, 2, 4 };
  std::vector data2{ 3, 8, 1, 5, 8 };
  std::vector target;
  std::transform(data1.begin(), data1.end(), 
    data2.begin(), std::back_inserter(target), 
    [](int a, int b) {return std::max(a, b); });
  Print(target);  // output is '5 8 9 5 8'

  return 0;
}

 

Search elements in container

Algorithm Description
max_element Returns an iterator to the largest element of the container.
min_element Returns an iterator to the smallest element of the container.
minmax_element Returns a pair which contains an iterator to the smallest element of the container followed by an iterator to the largest element of the container.

 

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector data{ 5, 7, 9, 2, 4 };

  // max_element
  auto iterator = std::max_element(data.begin(), data.end());
  if (iterator != data.end())
  {
    std::cout << "max element " << *iterator << " found at index " << std::distance(data.begin(), iterator) << std::endl;
    // output is 'max element 9 found at index 2'
  }

  return 0;
}

 

Calculations with one input

Algorithm Description
negate Returns the negation of the element.
clamp Returns the given element with respect to a value range. If the given element is outside of the range the returned value will be equal to the lowest or highest boundary.

Example: “clamp(x,20,30)” means the returned value must be in range 20 to 30. If x is less than 20 “20” will be returned. If x is between 20 to 30 “x” will be returned. If x is larger than 30 “30” will be returned.

 

void Print(const std::vector& data)
{
  std::for_each(data.begin(), data.end(), [](int element){ std::cout << element << " "; });
  std::cout << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector data{ 5, -7, 9, -2, 4 };
  
  // negate
  std::transform(data.begin(), data.end(), data.begin(), std::negate());
  Print(data);  // output is '-5 7 -9 2 -4'

  return 0;
}

 

Calculations with two inputs

Algorithm Description
plus Returns the sum of the two elements.
minus Returns the difference of the two elements.
multiplies Returns the product of the two elements.
divides Returns the quotient of the two elements.
modulus Returns the residual value of the division of first element by second element.
logical_and Returns the result of a logical-and calculation of the two elements.
logical_or Returns the result of a logical-or calculation of the two elements.
logical_not Returns the result of a logical-not calculation of the two elements.
bit_and Returns the result of a bitwise-and calculation of the two elements.
bit_or Returns the result of a bitwise-or calculation of the two elements.
bit_xor Returns the result of a bitwise-xor calculation of the two elements.
bit_not Returns the result of a bitwise-not calculation of the two elements.

 

void Print(const std::vector& data)
{
  std::for_each(data.begin(), data.end(), [](int element){ std::cout << element << " "; });
  std::cout << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
  std::vector source1{ 5, 7, 9, 2, 4 };
  std::vector source2{ 3, 1, 2, 5, 7 };
  std::vector target;

  std::vector source3{ 1, 1, 0, 0 };
  std::vector source4{ 1, 0, 1, 0 };

  // sum
  std::transform(source1.begin(), source1.end(), source2.begin(),
    std::back_inserter(target), std::plus());

  Print(target);  // output is '8 8 11 7 11'

  // logical or
  target.clear();

  std::transform(source3.begin(), source3.end(), source4.begin(),
    std::back_inserter(target), std::logical_or());

  Print(target);  // output is '1 1 1 0'

  // bitwise xor
  target.clear();

  std::transform(source3.begin(), source3.end(), source4.begin(),
    std::back_inserter(target), std::bit_xor());

  Print(target);  // output is '0 1 1 0'
  
  return 0;
}
Veröffentlicht unter C++ | 1 Kommentar