Perfect Forwarding

Perfect forwarding is an implementation technique which solves the forwarding problem. So, if we want to understand the need for perfect forwarding, we have to start by looking at the forwarding problem. This issue can occur when you write a generic function that takes references as parameters and forwards these parameters to another function.

 

The forwarding problem

An easy example of a template which forwards parameters to other functions is a factory. A factory provides a generic template function which is used to create instances of different objects. The generic function calls the constructor, and possibly an initializer, of the object to create, and therefore it forwards the factory function parameters to the object's functions.

In the example of this article we want to create object instances of a class that manages and draws a line. Such a line is defined by a start point, a length and a direction. To keep it simple we don't implement the line classes themselves in all detail. We want to look at the creation of object instances only, and therefore we use an exemplary class definition which offers just a constructor. Defining the line by a point, a length and a direction lets us use a data structure and simple built-in types as constructor parameters.

For the point data class we will use the following object:


struct Point
{
  Point() : X{ 0.0 }, Y{ 0.0 }
  {
  }

  Point(double x, double y) : X{ x }, Y{ y }
  {
  }

private:
  double X;
  double Y;
};

We assume that we have different implementations of the line object; for example, they may be platform-, device- or version-specific. The following code shows two different line classes.


class Line
{
public:
  Line(Point& x, int direction, int length)
    : mX{ x }, mDirection{ direction }, mLength{ length }
  {
  }

private:
  Point mX;
  int mDirection;
  int mLength;
};

class OtherLine
{
public:
  OtherLine(const Point& x, const int direction, const int length)
    : mX{ x }, mDirection{ direction }, mLength{ length }
  {
  }

private:
  Point mX;
  int mDirection;
  int mLength;
};

As you can see, the constructors differ slightly: const vs. non-const parameters. Since we can't change the implementations of the line classes, our factory has to support both use cases.

 

Now that we know the existing code, we can start to implement our factory. The following code shows a possible implementation. It contains a factory class with a template function to create the line instances. Furthermore we create a console application which uses the factory to create line instances. The factory is called several times, each time with a slightly different combination of parameters (variables vs. temporary objects and integer literals).


class Factory
{
public:
  template<typename T, typename A, typename B, typename C>
  static T CreateModule(A x, B direction, C length)
  {
    return T(x, direction, length);
  }
};

int _tmain(int argc, _TCHAR* argv[])
{
  Point x{ 4, 7 };
  int i{ 6 };

  Factory::CreateModule<Line>(x, i, i);
  Factory::CreateModule<Line>(x, i, 5);
  Factory::CreateModule<Line>(x, 5, 5);
  Factory::CreateModule<Line>(Point(), i, i);
  Factory::CreateModule<Line>(Point(), i, 5);
  Factory::CreateModule<Line>(Point(), 5, 5);
  Factory::CreateModule<OtherLine>(x, i, i);
  Factory::CreateModule<OtherLine>(x, i, 5);
  Factory::CreateModule<OtherLine>(x, 5, 5);
  Factory::CreateModule<OtherLine>(Point(), i, i);
  Factory::CreateModule<OtherLine>(Point(), i, 5);
  Factory::CreateModule<OtherLine>(Point(), 5, 5);

  return 0;
}

So we are done. The factory is implemented and works fine for the different use cases. So where is the forwarding problem? It didn't occur yet because our factory copies the function parameters instead of forwarding them. The factory therefore works fine, but you may run into performance issues, especially if you have to handle large data structures rather than simple ones like the point. So, as a next step, we want to forward the parameters instead of copying them. We modify the factory function a little and pass the parameters by reference.


class Factory
{
public:
  template<typename T, typename A, typename B, typename C>
  static T CreateModule(A& x, B& direction, C& length)
  {
    return T(x, direction, length);
  }
};

int _tmain(int argc, _TCHAR* argv[])
{
  Point x{ 4, 7 };
  int i{ 6 };

  Factory::CreateModule<Line>(x, i, i);
  Factory::CreateModule<Line>(x, i, 5);             // error
  Factory::CreateModule<Line>(x, 5, 5);             // error
  Factory::CreateModule<Line>(Point(), i, i);       // error
  Factory::CreateModule<Line>(Point(), i, 5);       // error
  Factory::CreateModule<Line>(Point(), 5, 5);       // error
  Factory::CreateModule<OtherLine>(x, i, i);
  Factory::CreateModule<OtherLine>(x, i, 5);        // error
  Factory::CreateModule<OtherLine>(x, 5, 5);        // error
  Factory::CreateModule<OtherLine>(Point(), i, i);  // error
  Factory::CreateModule<OtherLine>(Point(), i, 5);  // error
  Factory::CreateModule<OtherLine>(Point(), 5, 5);  // error

  return 0;
}

Unfortunately, this results in errors: all function calls using rvalue arguments no longer work. For example, it is not possible to bind an integer literal or a temporary Point to a non-const reference. But another modification may solve this issue, so let us change the factory function again and pass the parameters as const references.


class Factory
{
public:
  template<typename T, typename A, typename B, typename C>
  static T CreateModule(const A& x, const B& direction, const C& length)
  {
    return T(x, direction, length);   // error for T = Line (non-const Point&)
  }
};

int _tmain(int argc, _TCHAR* argv[])
{
  Point x{ 4, 7 };
  int i{ 6 };

  Factory::CreateModule<Line>(x, i, i);
  Factory::CreateModule<Line>(x, i, 5);
  Factory::CreateModule<Line>(x, 5, 5);
  Factory::CreateModule<Line>(Point(), i, i);
  Factory::CreateModule<Line>(Point(), i, 5);
  Factory::CreateModule<Line>(Point(), 5, 5);
  Factory::CreateModule<OtherLine>(x, i, i);
  Factory::CreateModule<OtherLine>(x, i, 5);
  Factory::CreateModule<OtherLine>(x, 5, 5);
  Factory::CreateModule<OtherLine>(Point(), i, i);
  Factory::CreateModule<OtherLine>(Point(), i, 5);
  Factory::CreateModule<OtherLine>(Point(), 5, 5);

  return 0;
}

Now the factory function calls compile, but the factory can no longer call the constructor that takes a non-const reference. One possible solution is overloading: you could provide overloaded versions of the factory function with const and non-const parameters. Of course, the number of required overloads grows exponentially with the number of parameters (2^n overloads for n parameters). And this approach contradicts our goal of implementing a single template function.

 

Now we see the forwarding problem, and we can summarize it as follows. The forwarding problem can occur when you write a generic function that takes references as its parameters and forwards these parameters to another function. If the generic function takes a parameter of type T&, it cannot be called with an rvalue. If the generic function takes a parameter of type const T&, the called function cannot modify the value of that parameter.

 

Perfect forwarding

Now that we have seen the forwarding problem, we will try to solve it and implement perfect forwarding within our factory template function. This is possible by using rvalue reference parameters. They enable us to write one template function which accepts const and non-const arguments and forwards them to another function as if the other function had been called directly. For more details about the concepts of lvalues, rvalues and rvalue references you may read my previous article.

We can use the rvalue reference declarator (&&) and adapt our template function.


template<typename T, typename A, typename B, typename C>
static T CreateModule(A&& x, B&& direction, C&& length)
{
  ...
}

To understand this template we have to know two C++ rules: type deduction and reference collapsing.

 

Type deduction

Within a template function, T&& is not necessarily an rvalue reference. When the function is instantiated, T depends on whether the argument passed to the function is an lvalue or an rvalue. If it's an lvalue of type U, T is deduced as U&. If it's an rvalue, T is deduced as U. This rule may seem unusual, but it starts making sense once we realize it was designed to solve the perfect forwarding problem.

 

Reference collapsing

The other rule is reference collapsing. Taking a reference to a reference is illegal in C++. However, it can arise in the context of templates and type deduction: on template instantiation there may be types like "int& &" or "int& &&". While you cannot write this in code, the compiler accepts such template instantiations and infers a single reference from them. The reference collapsing rule is defined for this case. It simply says that "&" always wins; the only combination resulting in "&&" is "&& &&". Below you see all possible cases and the result after applying the reference collapsing rule.

  • “& &” → “&”
  • “& &&” → “&”
  • “&& &” → “&”
  • “&& &&” → “&&”

 

Forwarding References or Universal References

Having seen these two rules, we can now understand the rvalue reference within a deducing context. As this type of reference is different from a standard rvalue reference, a new term was introduced: universal reference. Scott Meyers coined this term because he wanted to clearly differentiate between true rvalue references and something that looks like an rvalue reference but might end up being an lvalue reference.

Later, several members of the C++ standard committee acknowledged the need to give the T&& references a name of their own and came up with the term forwarding reference. The proposal for this change explains why the name forwarding reference is preferred over universal reference.

 

std::forward

After this short excursion we come back to our template function implementation. As we now understand forwarding references, we know that they can be either an rvalue or an lvalue reference. To forward such a reference to another function, the std::forward function was introduced. It perfectly forwards each parameter either as an rvalue or as an lvalue, depending on how it was passed in.

So we can use std::forward to pass our parameters to the object constructor. Below you see the final factory function of our example and the corresponding use of the factory within the console application.


class Factory
{
public:
  template<typename T, typename A, typename B, typename C>
  static T CreateModule(A&& x, B&& direction, C&& length)
  {
    return T(std::forward<A>(x), std::forward<B>(direction), std::forward<C>(length));
  }
};

int _tmain(int argc, _TCHAR* argv[])
{
  Point x{ 4, 7 };
  int i{ 6 };

  Factory::CreateModule<Line>(x, i, i);
  Factory::CreateModule<Line>(x, i, 5);
  Factory::CreateModule<Line>(x, 5, 5);
  Factory::CreateModule<Line>(Point(), i, i);
  Factory::CreateModule<Line>(Point(), i, 5);
  Factory::CreateModule<Line>(Point(), 5, 5);
  Factory::CreateModule<OtherLine>(x, i, i);
  Factory::CreateModule<OtherLine>(x, i, 5);
  Factory::CreateModule<OtherLine>(x, 5, 5);
  Factory::CreateModule<OtherLine>(Point(), i, i);
  Factory::CreateModule<OtherLine>(Point(), i, 5);
  Factory::CreateModule<OtherLine>(Point(), 5, 5);

  return 0;
}

Perfect forwarding within the C++ standard

The standard library makes use of perfect forwarding. For example, std::vector as well as other containers offer the emplace_back function as an alternative to push_back. With push_back, a temporary object instance is created, so the temporary has to be constructed, moved and destructed. If you use emplace_back instead, the object is created directly within the vector without the need for a temporary. This is possible because the parameters passed to emplace_back are forwarded using perfect forwarding.

 

Summary

Perfect forwarding allows you to write efficient code that runs without creating, moving and destructing temporary objects. It is based on forwarding references, which build on the concepts of type deduction and reference collapsing. These concepts are part of the language, so you can easily write functions using perfect forwarding.


Scoped vs. unscoped enum

C++ has two kinds of enumerations, and you will find several names for each of them:

  • Standard enum / plain enum / unscoped enum / conventional enum
  • Strong enum / enum class / scoped enum / new enum / managed enum

 

I think none of the names is perfect because, if you are not familiar with the differences between the two kinds, all of the names can lead to misinterpretations.

I prefer the names standard and strong enum and will therefore use them within this article. As I will explain later on, I also like to call them unscoped and scoped enums.

 

Standard enum

The following example shows the definition and usage of a standard enum.

namespace MyNamespace
{
  enum Colors
  {
    Red,
    Green,
    Blue
  };
}

using namespace MyNamespace;

int _tmain(int argc, _TCHAR* argv[])
{
  Colors color = Red;
  int enumValue = Colors::Red;

  enumValue = Red + Blue;

  return 0;
}

 

Normally you will implement your applications in different files, and therefore the enum definition is often separated from its usage. To keep it simple, this and all following examples are implemented in one file, but in a way that respects the usual separation into different files. That's why you find the uncommon definition of a namespace followed by a using directive right before the main method.

The main method shows how you can create an enum value. Furthermore, it contains examples demonstrating one of the disadvantages of the standard enum: you can access the underlying type directly. It is possible to assign an enumerator to an integer variable, and it is even possible to do calculations with the enumerator values.

A second issue can be seen when we add another enumeration. Enumerator names live in the same scope as the enum definition; therefore "Red", "Green" and "Blue" are within the "MyNamespace" scope. The following source code shows what happens if we add another enumeration which reuses an existing name: a name collision occurs and the compiler reports an error.

namespace MyNamespace
{
  enum Colors
  {
    Red,
    Green,
    Blue
  };

  enum OtherColors
  {
    Yellow,
    Blue,   // error
  };
}

using namespace MyNamespace;

int _tmain(int argc, _TCHAR* argv[])
{
  return 0;
}

 

That's why the standard enum is also called unscoped: the enumerator names exist in whatever scope the enumeration was declared in.

One possible and often seen solution is to extend the name with a short prefix or with the full enumeration name.

namespace MyNamespace
{
  enum Colors
  {
    ColorsRed,
    ColorsGreen,
    ColorsBlue
  };

  enum OtherColors
  {
    OtherColorsYellow,
    OtherColorsBlue,
  };
}

using namespace MyNamespace;

int _tmain(int argc, _TCHAR* argv[])
{
  Colors color = ColorsBlue;
  OtherColors otherColors = OtherColorsBlue;

  return 0;
}

 

This will work but it reduces the readability of the source code. Another possibility is to use different namespaces. I like to use the following kind of template:

  • Put each enumeration in its own namespace
  • Use the enumeration name as namespace name
  • Use “Enum” as the name of the enumeration itself

By using this template the previous example can be changed to:

namespace Colors
{
  enum Enum
  {
    Red,
    Green,
    Blue
  };
}

namespace OtherColors
{
  enum Enum
  {
    Yellow,
    Blue,
  };
}

int _tmain(int argc, _TCHAR* argv[])
{
  Colors::Enum color = Colors::Blue;
  OtherColors::Enum otherColors = OtherColors::Blue;

  return 0;
}

 

I really like this template because it creates very clean code. If you declare a variable, a function parameter or a function result, you name its type explicitly as "Enum", and when using the enum you access the values through the namespace, which carries a meaningful name for the enumeration. So you get clean code and ideal scoping, and it affects neither runtime nor compilation times.

 

Strong enum

C++ offers a second kind of enumeration. This strong enum is declared by writing "enum class". In contrast to the standard enum, the strong one is scoped, so you don't have to fear conflicts if you use the same names for enumerator values. The following code shows the adapted example; this time we can put both enums into one namespace.

namespace MyNamespace
{
  enum class Colors
  {
    Red,
    Green,
    Blue
  };

  enum class OtherColors
  {
    Yellow,
    Blue,
  };
}

using namespace MyNamespace;

int _tmain(int argc, _TCHAR* argv[])
{
  Colors color = Colors::Blue;
  OtherColors otherColors = OtherColors::Blue;

  int enumValue = Colors::Red;  // error

  return 0;
}

 

Within the main method I have created two variables and initialized them with enumerator values. As you can see, the values are accessed through their scope. Furthermore, the main method shows another nice property of the strong enum: it is strongly typed, so you cannot assign its values to an integer variable. The compiler reports an error for this assignment, which prevents accidental misuse of constants. The strong enum is an "enum class" because it combines the traditional enumeration aspects with features of classes.

 

Advantages of enum classes (part 1)

Within the C++11 FAQ Bjarne Stroustrup names the following advantages of strong enums:

  • conventional enums implicitly convert to int, causing errors when someone does not want an enumeration to act as an integer.
  • conventional enums export their enumerators to the surrounding scope, causing name clashes.
  • the underlying type of an enum cannot be specified, causing confusion, compatibility problems, and makes forward declaration impossible.

This is a nice summary of the advantages, and I think it is worth having a deeper look into some of these aspects.

 

The underlying type of an enum

What bothers me about enums is the focus on the underlying type. Two of Bjarne Stroustrup's three points concern the underlying type of the enum. From my point of view these advantages are insignificant because they don't reflect the typical use case for enums. Using the underlying value of enums, comparing enums (<, >) or even calculating with enums (a+b) is bad coding style, no matter whether you use standard or strong enums (with type casting).

Normally we create an enum because we want to represent a set of values that belong together, for example the four cardinal points. The enum values are grouped identifiers whose main purpose is to increase code readability and quality. As a developer you want to write your code using the enum identifiers, and you should never waste a thought on whether the underlying type is an integer, a char or something else. Therefore, never quantify enums or calculate with them.

Of course there may be exceptions to this rule. For example, if you want to optimize your code for the highest possible execution performance, you may have to use standard enums and even their underlying types. With such optimizations you explicitly shift your quality criteria away from readability towards performance. In scenarios where you explicitly want to use the underlying type, I would recommend creating variables of that type directly instead of using enums. You can then write your high-performance code without enums and offer a scoped enum at a higher level, for example within your API.

We don't want to look further at such rare use cases. In standard use cases you should use an enum by its identifiers only and never touch the underlying type. If you have to write an API which offers integer values instead of enums (e.g. for compatibility reasons), you should write explicit converter functions which convert the enum from and to an integer. This greatly increases the readability and maintainability of your code.

 

Advantages of enum classes (part 2)

With this thought in mind we can come back to the advantages of strong enums. If we follow the above guideline and never use the underlying type, we can filter those points out of Bjarne Stroustrup's list. As a result, the following two advantages of strong enums remain:

  • strong enums are scoped
  • strong enums allow forward declaration

 

We have seen the advantage of scoping in the example above, so now we want to look at the forward declaration feature. The following code shows a typical use case. The enumeration is defined in one file. In another file you declare your class interface and want to use the enumeration, for example for method parameters or return values. And in a third file you implement and use your class interface. As strong enums allow forward declaration, you can use this feature in the file containing your class interface. You will see the corresponding forward declaration in the following source code.

// header file with interface declaration

enum class Colors;

Colors GetBackgroundColor();

// file with enum definitions

enum class Colors
{
  Red,
  Green,
  Blue
};

// source file with interface definition

Colors GetBackgroundColor()
{
  return Colors::Blue;
}

int _tmain(int argc, _TCHAR* argv[])
{
  Colors color = GetBackgroundColor();

  return 0;
}

 

Within my projects I had to deal with enumerator scoping conflicts far more often than with the need for forward declaration. Therefore I think the scoping of strong enums is their main advantage. That's why the title of this article is "Scoped vs. unscoped enum", and, as mentioned at the beginning, why I also like to call the enums scoped and unscoped instead of standard and strong.

 

Summary

The strong enum offers some really nice advantages compared to the standard enum. The most important one is scoping, followed by the support for forward declaration. Therefore you should prefer strong enums.

If you have to use standard enum (e.g. in legacy code) you can increase the readability of your code by using the namespace pattern shown above.

No matter whether you use standard or strong enums, you should always use them as grouped identifiers. If there is a need to use the underlying type, that is an indication that you don't want an enum at all. In this case it is better to create variables of the underlying type and offer the enumeration in the API only, converting between the API enumeration and the internal type with explicitly implemented converter methods that do not access the underlying type of the enumeration.


Does testing affect software quality?

Within software development projects you may often hear questions or statements like these:

“We are in trouble with the timelines so we have to skip some of the software tests. How will this affect the software quality?”

“Our software quality is low. Can we add some software tests to increase the quality?”

 

These are typical questions, but I think there is a general misunderstanding about the relationship between software tests and software quality. You may ask yourself: is there a relationship between these two attributes at all, and if so, what kind of relationship? As you will see, this question is very difficult to answer.

Quality does not mean testing! Some managers and even developers think this way, but putting these two attributes into a strict relationship does not work. Please don't get me wrong: there may be a relationship, but saying that quality is equal to, or depends only on, testing is wrong. But what if we weaken the connection between these two attributes a little? Then it becomes difficult to make a concrete statement on whether testing affects quality or not. I know there are many different opinions on this question, and of course there is no final answer. So this article presents my opinion and should give you some ideas and thoughts on this topic.

 

What is a Feature?

First I want to look at another software attribute besides quality: the software feature. A feature realization is always a cyclic execution of implementation and testing. To implement a feature, its requirements, and therefore the expected use cases, are defined. The implementation itself always consists of coding and testing, and these two steps are repeated in a few cycles until the feature is finished. Testing is therefore an integral part of feature implementation. It ensures that the software feature matches the specified behavior and can be used by the customer. Without testing, the feature will never work in all expected use cases.

Of course, software will never be free of bugs, but once a feature is realized you should be confident that it works in the normal cases. In special cases it may still fail, as not all special cases are tested. The difference between "normal" and "special" cases is: "normal" use cases cover the specified functionality of the feature, while "special" use cases use the feature in a way that is outside of the specification. When I talk about tests, I mean the tests that check the expected and specified functionality of the feature. If you skip such tests, the feature will not work in some or many of the expected cases. Therefore, if you skip tests, you reduce the feature set of your release, not its quality.

 

What is Quality?

There are many definitions of quality. I like the general definitions which are independent of the kind of product and not specific to software development. A simple one is: "Quality is the degree to which requirements are met." This means the requirements must be fulfilled, and quality adds some additional benefit. So another good definition is: "Quality is the sum of customer benefit beyond the functionality of the product."

What does this mean in terms of software? In software projects we normally have two main types of customers: the end user and the product owner. Sometimes these may be the same persons. In most cases the product owner is the company you are working for, and the end users are the persons who buy and use the software.

Quality for the end user is focused on using the software, with attributes like reliability, usability and efficiency. For example, the wish "the software must be easy to use" is a typical quality criterion, unless the user guidance is specified in detail and therefore implemented as a feature.

Quality for the product owner is focused on the further development of the software, with attributes like changeability, maintainability, flexibility and documentation. So quality from the product owner's point of view is mainly code quality.

 

Does testing affect software quality?

Now that we have found a definition of the term "quality", we can come back to our initial question: "Does testing affect software quality?" As mentioned before, "standard testing" means checking the feature-specific use cases. This kind of testing does not increase the quality of the product, as it is not focused on quality criteria; it is focused on the feature.

But what if we do additional tests, i.e. tests which are independent of the feature-specific standard tests? Can we increase the product quality with such tests?

As said at the beginning of the article: skipping tests does not decrease the software quality, it removes features or parts of features. On the other hand, we now consider increasing quality with additional tests. At first this may sound like a contradiction, but you have to separate the different kinds of testing and look at their goals. The feature-related tests are needed to implement the features. The additional tests check things outside the main functionality, i.e. the possible customer benefit beyond the functionality of the product. And that exactly matches our definition of quality.

Such additional testing outside the normal use cases will therefore increase the quality of the product. Really? The tests themselves help to evaluate the software quality; the product owner then has to decide whether the software should be changed. The additional, quality-focused tests are therefore an entry point for quality changes. In this respect they are similar to end-user feedback and help to increase the quality.

 

Summary

There are different points of view on whether testing is a quality attribute or not. It depends on the kind of test you speak about. When someone speaks about software tests, in most cases the tests that check the intended behavior of a feature are meant. In my opinion such tests are part of the feature development itself, so they do not influence the software quality.

As a conclusion: if a product owner wants to reduce the testing effort, you have to look carefully at the consequences and explain them in terms of feature loss (most often) and/or quality loss (rarely).

 


Linq vs Loop: Join

Like in the previous article of this series, I want to compare Linq with a classical loop. This time we look at data objects which shall be joined to create a result. Again we use validated clean data as input as well as raw data including null values.

Use case one: clean data

Let’s start with the first use case. We have a simple data class for a person and a data class for an address. The data classes are linked together by the AddressIdentifier property.

Out of a list of persons and addresses we want to find a specific person by name. The result shall contain the person's name and address. To keep it simple, we just look for the first person with that name. If the lists do not contain the data we are looking for, we return a default person and address.

The data classes are defined as follows:

public class Person
{        
    public string Name { get; set; }
    public uint Age { get; set; }
    public uint AddressIdentifier { get; set; }

    public static readonly Person Default = new Person()
    {
        Name = "new person",
        Age = 0,
        AddressIdentifier = 0
    };        
}

public class Address
{
    public uint AddressIdentifier { get; set; }

    public string City { get; set; }

    public static readonly Address Default = new Address()
    {
        AddressIdentifier = 0,
        City = "new city"            
    };
}

 

Our demo console application creates the data lists and calls the data query method, first with an existing person and then with a non-existing one.

List<Person> persons = new List<Person>();

persons.Add(new Person() { Name = "John Doe", Age = 35, AddressIdentifier = 1 });
persons.Add(new Person() { Name = "Jane Doe", Age = 41, AddressIdentifier = 1 });

List<Address> addresses = new List<Address>();
addresses.Add(new Address() { AddressIdentifier = 1, City = "Chicago" });

//---------

string information;

//search existing person
information = GetPersonInformation(persons, addresses, "Jane Doe");
Console.WriteLine(information);

//search not existing person
information = GetPersonInformation(persons, addresses, "???");
Console.WriteLine(information);

Console.ReadKey();

The data query method shall be implemented twice: by using a loop and by using Linq. We start with the classical loop:

static private string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
{
    Person actualPerson = Person.Default;
    Address actualAddress = Address.Default;

    foreach (Person person in persons)
    {
        if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
        {
            actualPerson = person;
            break;
        }
    }

    foreach (Address address in addresses)
    {
        if (actualPerson.AddressIdentifier == address.AddressIdentifier)
        {
            actualAddress = address;
            break;
        }
    }

    return actualPerson.Name + ", " + actualAddress.City;
}

And we implement the same function by using Linq.

static private string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
{
    var result = from person in persons
                    join address in addresses
                    on person.AddressIdentifier equals address.AddressIdentifier
                    where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                    select new
                    {
                        Name = person.Name,
                        City = address.City
                    };

    var element = result
        .DefaultIfEmpty(new { Name = Person.Default.Name, City = Address.Default.City })
        .First();

    return element.Name + ", " + element.City;
}

 

Code Review for use case one

The query using the loops is easy to understand and contains clean code. You may consider extracting both loops into separate methods that find a person and an address. This refactoring yields three very short and simple methods, but at the cost of a slight increase in structural complexity. Therefore, in my opinion, a single method with both loops is fine too.

The Linq query is easy to understand too. You have to know some details about Linq; for example, the need for the DefaultIfEmpty statement may not be clear at first. Therefore it would be helpful to add some comments to the query explaining why such statements are needed.

I don’t favor either of the two implementations. From my point of view they are equivalent.

Use case two: dirty data

The second use case adds an important requirement: the query must be robust. The data may, for example, contain null values. As in the first use case, the method shall return default data if the person we are looking for is not found. Null values or uninitialized lists shall not cause an error; in these cases the default data shall be returned as well.

In our test console application we create some dirty data and add additional tests which call the function with this data or even with null parameters.

List<Person> persons = new List<Person>();

persons.Add(new Person() { Name = "John Doe", Age = 35, AddressIdentifier = 1 });
persons.Add(null);
persons.Add(new Person() { Name = null, Age = 38, AddressIdentifier = 2 });
persons.Add(new Person() { Name = "Jane Doe", Age = 41, AddressIdentifier = 3 });
persons.Add(new Person() { Name = "Jane Foe", Age = 41, AddressIdentifier = 4 });

List<Address> addresses = new List<Address>();
addresses.Add(new Address() { AddressIdentifier = 1, City = "Chicago" });
addresses.Add(new Address() { AddressIdentifier = 2, City = null });
addresses.Add(null);
addresses.Add(new Address() { AddressIdentifier = 3, City = "Chicago" });            

//---------

string information;

//search existing person
information = GetPersonInformation(persons, addresses, "Jane Doe");
Console.WriteLine(information);

information = GetPersonInformation(persons, addresses, "Jane Foe");
Console.WriteLine(information);

//search not existing person
information = GetPersonInformation(persons, addresses, "???");
Console.WriteLine(information);

//search in a list which is not yet initialized
information = GetPersonInformation(null, addresses, "???");
Console.WriteLine(information);

information = GetPersonInformation(persons, null, "???");
Console.WriteLine(information);

information = GetPersonInformation(null, null, "???");
Console.WriteLine(information);

Console.ReadKey();  

The query using the loops must be adapted to handle all these special cases. The following source code shows a corresponding implementation. Both the lists and their contents are checked for null values.

static private string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
{
    Person actualPerson = Person.Default;
    Address actualAddress = Address.Default;

    if (persons != null)
    {
        foreach (Person person in persons)
        {
            if (person == null)
            {
                continue;
            }

            if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
            {
                actualPerson = person;
                break;
            }
        }
    }

    if (addresses != null)
    {
        foreach (Address address in addresses)
        {
            if (address == null)
            {
                continue;
            }

            if (actualPerson.AddressIdentifier == address.AddressIdentifier)
            {
                actualAddress = address;
                break;
            }
        }
    }

    return actualPerson.Name + ", " + actualAddress.City;
}

 

The implementation of the Linq query must be adapted too. Checks of the whole lists as well as of the single elements are added.

static private string GetPersonInformation(List<Person> persons, List<Address> addresses, string name)
{
    if(persons == null)
    {
        persons = new List<Person>();
    }

    if(addresses == null)
    {
        addresses = new List<Address>();
    }

    var result = from person in persons.Where(p => p != null)
                    join address in addresses.Where(a => a != null) 
                    on person.AddressIdentifier equals address.AddressIdentifier                         
                    where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                    select new
                    {
                        Name = person.Name,
                        City = address.City
                    };

    var element = result
        .DefaultIfEmpty(new { Name = Person.Default.Name, City = Address.Default.City })
        .First();

    return element.Name + ", " + element.City;
}      

Code Review for use case two

The method containing the loops gets more complex with all the if-statements. Therefore you should extract the two loops and create separate methods that look for a person and an address. With this little refactoring the loop implementation becomes very easy to understand.

The Linq implementation was not changed much. Before the query is executed, some data checks are done. But there is a little detail: the additional small queries within the from clauses. These queries are needed to filter out null objects. There are several possibilities to refactor this implementation: you may extract the nested queries or the data checks. Or, if you want to leave the complex query as it is, you should add comments to explain it.

Without refactoring I don’t like either of these two implementations, as both are somewhat complex. I would prefer two separate query methods, one for the person and one for the address, plus an additional managing method which calls these two query methods and joins the results. The single query methods as well as the join can be implemented with simple Linq statements.

Posted in .NET, C#, LINQ | Leave a comment

RValue Reference Declarator: &&

The rvalue reference is a nice C++ feature for creating efficient source code. Within this article I want to explain what is meant by an rvalue and how you can use the reference declarator. Furthermore, you will learn how to use this feature to implement a move constructor and functions that move their parameters.

 

LValue and RValue

In the earliest days of C the lvalue was defined as an expression that may appear on the left or the right hand side of an assignment, whereas an rvalue is an expression that can only appear on the right hand side of an assignment. For example, within the assignment “int a = 21;” the expression “a” is an lvalue and “21” is an rvalue. Of course the lvalue “a” may also be placed on the right hand side of an assignment: in the assignment “int b = a;” both expressions are lvalues.

In C++ this definition is still useful as a first intuitive approach. But we can add another point of view: Lvalues are named objects that persist beyond a single expression and rvalues are unnamed temporaries that evaporate at the end of the expression.

The following example shows some lvalues and rvalues.

#include <string>

int main()
{
  int x = 0;
  int y = 0;

  // lvalues
  std::string a;
  std::string* b = &a;
  ++x;

  // rvalues
  123;
  x + y;
  std::string("rvalue");
  x++;

  return 0;
}

 

The first two assignments are very intuitive. The expressions create the named objects “a” and “b”, which are lvalues. The first three rvalue examples are easy to understand too. “123” and the result of “x + y” are unnamed objects which are no longer accessible after the end of the expression. The same is true for the created string “rvalue”: a string object is created, but it is not named and not accessible after the expression. But you may be astonished when looking at the increment operator. Why is “++x” an lvalue and “x++” an rvalue?

The expression “++x” is an lvalue because it modifies the persistent object and then names it. “x++” instead creates a copy of the persistent object, increments the object’s value and returns the copy. Therefore the expression “x++” returns an unnamed, non-persistent object: an rvalue.

This little example shows a very important aspect of the difference between lvalues and rvalues. It is not about what an expression does, it is about what an expression names: something persistent, or something non-persistent which only exists temporarily within the expression. And as persistent objects can be addressed, you can also say: if you can take the address of an expression it is an lvalue, and if you cannot it is an rvalue. For example, “&++x” is valid whereas “&x++” is not.

In summary, you may use the following definition: “An lvalue is an expression that refers to a memory location and allows us to take the address of that memory location via the & operator. An rvalue is an expression that is not an lvalue.”

 

RValue reference

A reference to an rvalue can be created by using the double-ampersand declarator &&. Similar to standard references, or to be more precise lvalue references, you can create rvalue references. These may be used to pass rvalues to functions. Later on we will see how rvalue references allow us to write a move constructor.

As an rvalue is a temporary object which is only valid in a small scope, for example in one expression only, an rvalue reference is a reference to an argument that is about to be destroyed.

 

Constructor performance

Before we start to think about optimizing constructors by using rvalue references, we want to have a look at the main issue with standard copy constructors.

To understand the issue we can think about the following: let’s say you get a folder with a sheet of paper. You shall create a second folder and copy the sheet of paper. What will you do if you get the additional information that the first folder will never be used again and will be thrown away? With this information you don’t have to photocopy the sheet of paper. You just have to move the original one to the new folder.

Exactly the same issue can be found in source code. By looking at the above example we can now say: the most unnecessary copies are those where the source is about to be destroyed.

With this idea in mind, look at the following line of code, where s1 to s3 are strings.

std::string x = s1 + " " + s2 + " " + s3;

By executing this line of code several temporary strings are created and therefore several copy operations are done. Of course this expression is executed in microseconds and may not need optimization, but the example shows the general concept, which is also valid for large and complex objects.

If we come back to the idea from above, we know that we don’t have to create a copy if the source is about to be destroyed. What does this mean for the string concatenation example? At the start of the expression we concatenate s1 with a blank (s1 + " "). In this case it is necessary to create a new temporary string, because s1 is an lvalue naming a persistent object; therefore a copy of its content has to be created. But in the next step we add s2 to this new temporary string created by s1 + " ". A second temporary string can be created, a copy of the first one can be concatenated with s2, and afterwards the first temporary string is thrown away. And that’s the issue: we create the photocopy of the sheet of paper and throw the original away. As the first temporary string, the result of s1 + " ", is an rvalue referring to a temporary object, we can move the original content and don’t have to create a copy. This is the key concept of move semantics.

Before we go forward and look at move constructors, someone may ask: as we can create rvalue references, we have access to the temporary objects. What if we use these references to access the objects later on?

This is a good question. C++ is a language where the developer should have maximum flexibility; therefore the language itself will not forbid such wrong implementations. If we go back to the initial example: your boss told you that the original folder with the sheet of paper is no longer needed and he will throw it away after you have created the new folder. What if he changes his mind and wants to use the original folder later on? This will not work, as he now has an empty folder. The same happens if you access rvalues after their lifetime has ended. They may contain invalid content, or they may hold pointers to memory locations already used by other objects. Therefore using rvalues after their lifetime is an implementation error and may result in critical failures.

 

Move constructors

The standard copy constructor cannot avoid these unnecessary copies, because it cannot know whether the source is about to be destroyed. Move constructors, which use rvalue references, can help you improve the performance of your applications by eliminating unnecessary memory allocations and copy operations. In general, being able to detect modifiable rvalues allows you to optimize resource handling. If the objects referred to by modifiable rvalues own any resources, you can steal those resources instead of copying them, since the objects are going to evaporate anyway.

The following example shows a typical move constructor. The parameter is an rvalue reference to the class. Inside the move constructor you move the resources from the source object to the new object, and you should reset the data pointer of the source object to prevent its destructor from releasing the resources a second time. As the new object has taken over the resources, it is now responsible for releasing them.

#include <vector>

class MyClass
{
public:
  MyClass(MyClass&& source) : mData(nullptr)
  {
    // move data
    mData = source.mData;

    // release source data
    // so the destructor does not free the memory multiple times
    source.mData = nullptr;
  }
private:
  std::vector<int>* mData;
};

 

To understand the behavior of the move constructor we want to look at a second example. We will now implement the example from above with the folder containing a sheet of paper. So we implement a folder class containing a vector of strings. In the move constructor we move the resources from one object to the other. Furthermore, I have implemented a second constructor which gets the vector as an input parameter. This initialization constructor also uses an rvalue reference to the source data.

#include <iostream>
#include <string>
#include <utility>
#include <vector>

class Folder
{
public:
  Folder(){};

  Folder(Folder&& source)
    : mData(std::move(source.mData))
  {    
  }

  Folder(std::vector<std::string>&& data)
    : mData(std::move(data))
  {    
  }

  void ShowSize()
  {
    std::cout << mData.size() << std::endl;
  };

private:
  std::vector<std::string> mData;
};


int main()
{
  std::vector<std::string> data;
  data.push_back("abc");

  Folder original(std::move(data));
  Folder copy(std::move(original));

  std::cout << data.size() << std::endl;
  original.ShowSize();
  copy.ShowSize();

  return 0;
}

 

If we execute the application, the output shows us the sizes of the different vectors. The initial vector and the one inside the first folder are empty, and only the new folder contains the resources. That’s because the move constructor of the vector moves the resources and leaves the original vector empty.

Within the initializer lists of the constructors and on calling the constructors you will find a new function not explained so far: std::move. So we will proceed to look at this function.

 

std::move

The function std::move enables you to create an rvalue reference to an existing object. Alternatively, you can use static_cast to cast an lvalue to an rvalue reference: static_cast<T&&>(mySourceObject); where T is the type of the object.

But why do we have to use this function? Let us start with the constructor call from the example above: Folder copy(std::move(original));

The original object we want to copy is an lvalue. Therefore, if we use this object as a parameter, the standard copy constructor is called. By using std::move we get an rvalue reference to the object and can pass it to the move constructor. Within the constructor we initialize the vector, and here we have to follow the same principle: if we pass the vector directly it is an lvalue and a copy is created, but if we convert it to an rvalue reference we can call the move constructor of the vector.

 

Summary

Understanding the concept of rvalues and rvalue references will allow you to create and use move constructors. These move constructors can help you improve the performance of your applications by eliminating the need for unnecessary memory allocations and copy operations.


Linq vs Loop: Nested Loop

As in the previous article of this series, I want to compare Linq with a classical loop. This time we look at the use case of handling nested data. Again we use both validated clean data and raw data including null references as input.

Use case one: clean data

Let’s start with the first use case. We have a simple data class for a person and a data class for a person group. The person group contains a list of persons, so we can create a data structure with nested data. Out of a list of person groups we want to find a specific person by name. To keep it simple we just look for the first person with that name. If the list does not contain the data we are looking for, we shall return a default person object.

The data classes are defined as follows:

public class Person
{
    public string Name { get; set; }
    public uint Age { get; set; }

    public static readonly Person Default = new Person()
    {
        Name = "new person",
        Age = 0
    };
}

public class PersonGroup
{
    public string GroupName { get; set; }
    public List<Person> Persons { get; set; }
}

 

Our demo console application creates a list of data and calls the data query method, first with an existing person and then with a non-existing one.

List<Person> persons;
List<PersonGroup> groups = new List<PersonGroup>();

persons = new List<Person>();
persons.Add(new Person() { Name = "John Doe", Age = 35 });
persons.Add(new Person() { Name = "John Foe", Age = 47 });
groups.Add(new PersonGroup() { GroupName = "male", Persons = persons });

persons = new List<Person>();
persons.Add(new Person() { Name = "Jane Doe", Age = 41 });
groups.Add(new PersonGroup() { GroupName = "female", Persons = persons });

//---------

Person person;

//search existing person
person = FindPerson(groups, "Jane Doe");
Console.WriteLine("Name: " + person.Name);

//search not existing person
person = FindPerson(groups, "???");
Console.WriteLine("Name: " + person.Name);

Console.ReadKey();

The data query method shall be implemented twice: by using a loop and by using Linq. We start with the classical loop:

static private Person FindPerson(List<PersonGroup> groups, string name)
{
    foreach (PersonGroup group in groups)
    {
        foreach (Person person in group.Persons)
        {
            if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
            {
                return person;
            }
        }
    }

    return Person.Default;
}

And we implement the same function by using Linq.

static private Person FindPerson(List<PersonGroup> groups, string name)
{
    var result = from personGroup in groups
                    from person in personGroup.Persons
                    where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                    select person;

    return result
        .DefaultIfEmpty<Person>(Person.Default)
        .First<Person>();
}

Code Review for use case one

The loop method is implemented with a simple nested loop containing the data comparison. The source code is clean and easy to understand. The same can be said for the Linq implementation. The only difficult part is the DefaultIfEmpty statement. If the developer adds a little comment on why the DefaultIfEmpty call is done, the Linq query is easy to understand too, and I don’t prefer either of the two implementations.

Use case two: dirty data

The second use case adds an important requirement: the query must be robust. The data may, for example, contain null values. As in the first use case, the method shall return a default person if the one we are looking for is not found. Null values or uninitialized lists shall not cause an error; in these cases the default person shall be returned as well.

In our test console application we create some dirty data and add additional tests which call the function with this data or even with null parameters.

List<Person> persons;
List<PersonGroup> groups = new List<PersonGroup>();

persons = new List<Person>();
persons.Add(new Person() { Name = "John Doe", Age = 35 });
persons.Add(new Person() { Name = "John Foe", Age = 47 });
groups.Add(new PersonGroup() { GroupName = "male", Persons = persons });

groups.Add(null);
groups.Add(new PersonGroup() { GroupName = "female", Persons = null });

persons = new List<Person>();
persons.Add(null);
persons.Add(new Person() { Name = null, Age = 41 });
persons.Add(new Person() { Name = "Jane Doe", Age = 41 });
groups.Add(new PersonGroup() { GroupName = "female", Persons = persons });

//---------
Person person;

//search existing person
person = FindPerson(groups, "Jane Doe");
Console.WriteLine("Name: " + person.Name);

//search not existing person
person = FindPerson(groups, "???");
Console.WriteLine("Name: " + person.Name);

//search in a list which is not yet initialized
person = FindPerson(null, "???");
Console.WriteLine("Name: " + person.Name);

Console.ReadKey();

The query using the loops must be adapted to handle all these special cases. The following source code shows a corresponding implementation. Both the list and its contents are checked for null values.

static private Person FindPerson(List<PersonGroup> groups, string name)
{
    if (groups == null)
    {
        return Person.Default;
    }

    foreach (PersonGroup group in groups)
    {
        if ((group == null) ||
            (group.Persons == null))
        {
            continue;
        }

        foreach (Person person in group.Persons)
        {
            if (person == null)
            {
                continue;
            }

            if (string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase))
            {
                return person;
            }
        }
    }

    return Person.Default;
}

The implementation of the Linq query must be adapted too. Checks of the whole list as well as of the single elements are added.

static private Person FindPerson(List<PersonGroup> groups, string name)
{
    if (groups == null)
    {
        return Person.Default;
    }

    var result = from personGroup in groups
                    where personGroup != null
                    where personGroup.Persons != null
                    from person in personGroup.Persons
                    where person != null
                    where string.Equals(person.Name, name, StringComparison.OrdinalIgnoreCase)
                    select person;

    return result
        .DefaultIfEmpty<Person>(Person.Default)
        .First<Person>();
}

Code Review for use case two

The nested loop gets more complex as the different special cases must be handled, which adds the need for additional if-statements. In such a case you may consider extracting the inner loop and moving it to its own method. With such an additional method the code stays very easy to understand even with the additional if-statements.

The Linq implementation is done by using a query containing an inner query. To handle all the special cases with uninitialized data, some additional where-statements were added. This expands the query a little bit, but it stays understandable.

In this case I like the Linq implementation a little bit more than the nested loop. But on the other hand the Linq query has a major disadvantage: finding an error is difficult and time-consuming. You can try this yourself by removing one or several of the where-statements that filter out invalid data.


Design patterns: Composite

The composite design pattern is used to compose objects into a tree structure: each element can hold a list of sub-elements. Optionally, it may also have a link to its parent object within the tree.

Within the following example I want to create a generic base class to implement the behavior of a tree node. This base class can be used for any object to create a tree structure.

At first we define the interface for the tree node.

public interface ITreeNode<T>
{
    T AddChild(ITreeNode<T> child);
    void RemoveChild(ITreeNode<T> child);

    List<ITreeNode<T>> Children { get; }
}

Next we implement the tree node object.

public class TreeNode<T> : ITreeNode<T>
{
    private List<ITreeNode<T>> _children = new List<ITreeNode<T>>();

    public List<ITreeNode<T>> Children
    {
        get
        {
            return _children;
        }
    }

    public T AddChild(ITreeNode<T> child)
    {
        _children.Add(child);

        return (T)child;
    }

    public void RemoveChild(ITreeNode<T> child)
    {
        _children.Remove(child);
    }
}

Now let’s say you want to create a tree of visual elements. To keep it simple our visual element is a shape which has a name property only. By using the generic tree node you can simply create a shape tree node.

public class Shape : TreeNode<Shape>
{
    public Shape(string name)
    {
        Name = name;
    }

    public string Name { get; set; }
}

Finally, we will create a console application to test our shape object. Within this application we create a complex shape containing a tree structure of sub-shapes, and we add a recursively executed method to display the visual tree.

static void Main(string[] args)
{
    Shape root = new Shape("Drawing");

    root.AddChild(new Shape("Circle"));
    Shape rectangle = root.AddChild(new Shape("Rectangle"));
    root.AddChild(new Shape("Line"));

    rectangle.AddChild(new Shape("Dotted Line"));
    rectangle.AddChild(new Shape("Triangle"));

    Display(root);

    Console.ReadKey();
}

private static void Display(Shape rootShape)
{
    Display(rootShape, 1);
}

private static void Display(Shape shape, int level)
{
    if (level > 1)
    {
        Console.Write(new string(' ', (level - 2) * 2));
        Console.Write("> ");
    }
    Console.WriteLine(shape.Name);

    foreach (Shape child in shape.Children)
    {
        Display(child, level + 1);
    }
}