Pattern Matching in C# 7

Patterns are used to test whether a value matches a specific expectation and, if it matches, to extract information from the value. You already write this kind of pattern matching with if and switch statements: you test a value and, if it matches the expectation, you extract and use its information.

With C# 7 we got an extension to the syntax of is-expressions and case-clauses. This syntax extension allows you to combine the two steps: testing a value and extracting its information.

Introduction

Let’s start with a basic example to see what we are talking about. The following source code shows how to test whether a value is of a specific type and then use the value for a console output. The code shows the old and the new syntax so you can compare the two implementations. As you can see, the new syntax combines the value test and the information extraction in one short statement.

static void Main(string[] args)
{
  WriteValueCS7("abc");
  WriteValueCS6(15);
  WriteValueCS7(18.4);
}

static void WriteValueCS7(dynamic x)
{
  //C# 7
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");
}

static void WriteValueCS6(dynamic x)
{
  //C# 6
  if (x is int)
  {
    var i = (int)x;
    Console.WriteLine("integer: " + i);
  }
  else if (x is string)
  {
    var s = x as string;
    Console.WriteLine("string: " + s);
  }
  else
  {
    Console.WriteLine("not supported type");
  }
}

The example shows pattern matching used in an is-expression to do a type check. The new pattern matching syntax is also supported in case-clauses, and it allows three different types of patterns: the type pattern, the const pattern and the var pattern. We will look at each of them in the next paragraphs.

Type Pattern

We have already seen the type pattern in the previous example. It is used to check whether a value is of a specific type. If the type matches, a new variable of this type is created and can be used to access the value. If the value is null, the type check always returns false. The following source code shows a corresponding example.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");
}

Const Pattern

Pattern matching can also be used to check whether a value matches a constant. With this pattern you cannot create a new variable, because the value equals the constant and can be used as it is.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  int d = 17;

  WriteValue(a);  // output: 'const: abc'
  WriteValue(b);  // output: 'const: null'
  WriteValue(c);  // output: 'const: 15'
  WriteValue(d);  // output: 'unknown'
}

static void WriteValue(dynamic x)
{
  if (x is 15) Console.WriteLine("const: 15");
  else if (x is "abc") Console.WriteLine("const: abc");
  else if (x is null) Console.WriteLine("const: null");
  else Console.WriteLine("unknown");
}

Var Pattern

The var pattern is a special case of the type pattern with one major distinction: the pattern matches any value, even null. The following example is the one previously used for the type pattern, extended with the var pattern.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else if (x is var v) Console.WriteLine("not supported type");
}

If we look at this example we may ask two critical questions: Why do we have to specify a temporary variable for the var pattern if we don’t use it? And why do we use the var pattern at all if it is the same as the empty (default) else-statement?

The first question is easy to answer. If we use the var pattern and don’t need the target variable, we can use the discard “_”, which was also introduced with C# 7.

The second question is more difficult. As described, the var pattern always matches. So, it represents a default case, which is the empty else in an if-else statement. Therefore, if we just want to write the default else-case we should not use the var pattern at all. But the var pattern proves to be practical when we want to distinguish between different groups of default cases. The following code shows a corresponding example. It uses more than one var pattern to handle the default case in more detail. As mentioned above, the last var pattern is unnecessary and you can write an empty else instead. I used the var pattern anyway to show you how to use the discard character.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  double d = 17.5;
  Guid e = Guid.NewGuid();

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: ''null' is not supported'
  WriteValue(c);  // output: 'integer: 15'
  WriteValue(d);  // output: 'not supported primitive type'
  WriteValue(e);  // output: 'not supported type'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else if ((x is var v) && (v == null)) Console.WriteLine("'null' is not supported");
  else if ((x is var o) && (o.GetType().IsPrimitive)) Console.WriteLine("not supported primitive type");
  else if (x is var _) Console.WriteLine("not supported type");
}

Switch-case

At the beginning of the article I mentioned that pattern matching can be used in if-statements and switch-statements. Now we know the three types of patterns and have used them in if-statements. Next we will see how to use the patterns in switch-statements.

Strictly speaking, the switch-statement has always been a pattern expression. It supported the const pattern only and was limited to integral types, char, bool, enums and strings. With C# 7 those restrictions have been removed. Now the switch-statement supports pattern matching and therefore all three patterns can be used. Furthermore, a variable of any type may be used in a switch-statement.

The new possibilities have a side effect which made it necessary to change the behavior of the switch-statement. So far, the switch-statement supported the const pattern only and therefore the case-clauses were unique. With the new pattern matching the case-clauses can overlap and may not be unique anymore. Therefore, the order of the case-clauses matters. For example, the compiler emits an error if an earlier clause matches a base type and a later clause matches a derived type, because the later clause could never be reached. As before, each case section must end with a break, a return or another jump statement, so execution cannot “fall through” from one case section to the next.

The following example shows the type pattern used in a switch-statement.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  switch (x)
  {
    case int i: Console.WriteLine("integer: " + i); break;
    case string s: Console.WriteLine("string: " + s); break;
    default: Console.WriteLine("not supported type"); break;
  }
}

Switch-case with predicates

Another feature related to pattern matching is the ability to use predicates within the switch-case-statement. Within a case-clause a when-clause can be used to do more specific checks.

The following source code shows the use case we have already seen in the var pattern example. But this time we use the switch-statement with when-clauses instead of the if-statement.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;
  double d = 17.5;
  Guid e = Guid.NewGuid();

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: ''null' is not supported'
  WriteValue(c);  // output: 'integer: 15'
  WriteValue(d);  // output: 'not supported primitive type'
  WriteValue(e);  // output: 'not supported type'
}

static void WriteValue(dynamic x)
{
  switch (x)
  {
    case int i: Console.WriteLine("integer: " + i); break;
    case string s: Console.WriteLine("string: " + s); break;
    case var v when v == null: Console.WriteLine("'null' is not supported"); break;
    case var o when o.GetType().IsPrimitive: Console.WriteLine("not supported primitive type"); break;
    default: Console.WriteLine("not supported type"); break;
  }
}

Scope of pattern variables

A variable introduced by a type pattern or var pattern in an if-statement is lifted to the outer scope. This leads to strange behavior of the compiler. On the one hand, it is not meaningful to use the variable outside the if-statement because it may not be initialized. On the other hand, the compiler behaves differently for an if-statement and an else-if statement. Maybe this strange behavior will be fixed in a future compiler version. The following source code shows a corresponding example with the compiler errors as comments.

static void Main(string[] args)
{
  string a = "abc";
  string b = null;
  int c = 15;

  WriteValue(a);  // output: 'string: abc'
  WriteValue(b);  // output: 'not supported type'
  WriteValue(c);  // output: 'integer: 15'
}

static void WriteValue(dynamic x)
{
  if (x is int i) Console.WriteLine("integer: " + i);
  else if (x is string s) Console.WriteLine("string: " + s);
  else Console.WriteLine("not supported type");

  Console.WriteLine(i); // error: Use of unassigned local variable 'i'
  i = 15; // ok      

  // s = "abc";  // error: The name 's' does not exist in the current context
  string s = "abc"; // error: 's' cannot be declared in this scope because that name is used in a local or parameter
}

Pattern variables created inside a case-clause are only valid within the case-clause. They are not lifted outside the switch-case scope. In my opinion this leads to a clean separation of concerns and it would be nice to have the same behavior in if-statements.

Summary

Pattern matching is a powerful concept. The pattern matching possibilities introduced with C# 7 offer nice ways to write complex if-statements and switch-statements in a clean way. The patterns introduced so far are just the basic ones, and for C# 8 it is planned to add some more advanced ones like the recursive pattern, the positional pattern and the property pattern. So, this programming concept is not just syntactic sugar. It will become an important concept in C# and brings more and more functional programming techniques to the language.

Posted in .NET, C#

C++17: initializers in if-statement and switch-statement

With C++17 it is possible to initialize a variable inside an if-statement and a switch-statement. We already know and use this concept in the for-statement. To be honest: I don’t like this feature. Within this article I want to introduce this feature and explain my doubts. In the following I will write about the if-statement only, because everything also applies to the switch-statement and it is sufficient to show one of the two.

The new syntax with the initializer inside the if-statement comes with a big improvement: the variable is moved inside the scope of the if-statement. An important software design concept is to use the smallest scope possible, and the new syntax helps to implement according to this design concept.

But you must pay dearly for this advantage. As the initialization moves into the if-statement, initialization and comparison are mixed up. This violates two other software design concepts, named “separation of concerns” and “keep it simple”. Depending on the complexity of the initialization and the comparison you may create a very complex if-statement. This may result in code that is hard to read and error prone. Only if you have a very simple initialization and a very simple comparison may the combination of both stay simple as well. In all other cases I recommend avoiding the new feature and clearly separating the initialization and the comparison in order to increase the code readability.
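The scope effect described above can be illustrated with a small sketch. The helper `LookupOrDefault` and its parameters are hypothetical and not part of the original article; the sketch only assumes a C++17 compiler.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not from the article): returns the value stored
// for 'name', or 'fallback' if the name is unknown.
int LookupOrDefault(const std::map<std::string, int>& values,
                    const std::string& name, int fallback)
{
	// 'it' is declared in the if-initializer, so it exists only inside
	// the if-statement (including the else-branch).
	if (auto it = values.find(name); it != values.end())
	{
		return it->second;
	}
	else
	{
		return fallback;
	}
	// 'it' is out of scope here; using it would be a compile error.
}
```

With the classical syntax the iterator would have to be declared before the if-statement and would stay visible until the end of the function.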

Let’s have a look at a simple example. The following source code shows an if-statement with an included variable initialization and the same if-statement with a separation of the initialization and the comparison. Furthermore, just for fun, I removed the line breaks for the second example to compare it with the new syntax.

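// init inside if
if (int count = CalcCount(); count > 100)
{
	std::cout << "count: " << count << std::endl;
}

// init outside if
int count = CalcCount();
if (count > 100)
{
	std::cout << "count: " << count << std::endl;
}

// init outside if, without line break
int count = CalcCount(); if (count > 100)
{
	std::cout << "count: " << count << std::endl;
}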

If we compare the first and the second implementation, that is the new and the classical syntax, we could say that the differences are small. In my opinion both variants are easy to read. Even the third one, with classical syntax but without line breaks, may be easy to read, even if it is unusual. But you can see that the new syntax isn’t that different from the classical one without line breaks. Just the if-statement moved to the front. Of course, this little change increased the readability a lot.

So, we must look at a more complex example. Let’s see how things look if we increase the complexity of the initialization but leave the comparison as simple as before.

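// init inside if
if (int count = IsInitialized() ? CalcCount() : (CalcExpectedCount() + CalcOldCount()) / 2; count > 100)
{
	std::cout << "count: " << count << std::endl;
}

// init outside if
int count = IsInitialized() ? CalcCount() : (CalcExpectedCount() + CalcOldCount()) / 2;
if (count > 100)
{
	std::cout << "count: " << count << std::endl;
}

// init outside if, standard and fallback initialization separated
int count = CalcCount();
if (!IsInitialized()) count = (CalcExpectedCount() + CalcOldCount()) / 2;
if (count > 100)
{
	std::cout << "count: " << count << std::endl;
}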

The first implementation uses the new syntax. In my opinion this if-statement is very hard to read. You have to stop at this line of code, read it several times and look closely to understand the meaning of the code.

The second implementation separates the initialization and the comparison. I think this makes it a little bit easier to read the code.

The third example clearly separates the different concerns. We have a standard initialization, an initialization for fallback cases and a comparison. This source code is easy to read. You don’t have to stop at any line of code and read it again to understand it. The complex initialization and comparison are split into simple parts.

Summary

Initializers in if-statements and switch-statements allow a clear assignment of the variable to the scope of the statement. But mixing the two concerns of initialization and comparison will often result in complex code. Therefore, in my opinion, the new syntax should be used with caution. If the initialization as well as the comparison is short and simple the resulting combination of both may be simple too and in this case the new syntax should be used.
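As the article shows the if-statement only, here is a minimal sketch of the same idea for the switch-statement. The function `DescribeLength` is a hypothetical helper and not part of the original article; it keeps both the initialization and the comparison deliberately simple.

```cpp
#include <string>

// Hypothetical helper (not from the article): describes the length of a text.
std::string DescribeLength(const std::string& text)
{
	// 'length' is declared in the switch-initializer and is visible
	// only inside the switch-statement.
	switch (auto length = text.size(); length)
	{
	case 0:
		return "empty";
	case 1:
		return "single character";
	default:
		return std::to_string(length) + " characters";
	}
}
```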

Posted in C++

Name hiding in inheritance

The C++ name hiding rules for variables are well known by software developers. In contrast, name hiding in inheritance sometimes leads to issues although it follows the same rules. Such issues are therefore not a result of the rules but an effect due to different expectations in these different programming scenarios.

Name hiding rules for variables

Let’s start with the well-known rules for variables. The following example shows a typical scenario.

#include <iostream>

double x;

int main()
{
	int x;

	x = 5.5;	// conversion from double to int

	std::cout << x << std::endl;	// prints 5

	::x = 8.8;	// changes the global x

	std::cout << x << std::endl;	// prints 5
	
	return 0;
}

Within this example you see a double variable in the global scope and an integer variable in the local scope. As they have the same name, the local variable hides the global one, even though they have different types. If we want to access the double variable we must use the global namespace explicitly.

Name hiding rules in inheritance

Next, let us try the same name hiding within an inheritance scenario.

#include <iostream>

class Base
{
public:
	void Calc(double x) { std::cout << "Base calc was called" << std::endl; }
};

class Derived : public Base
{
public:
	void Calc(int x) { std::cout << "Derived calc was called" << std::endl; }
};

int main()
{
	Derived d;

	d.Calc(3.5);	// derived is called, conversion of double to int
	d.Calc(8);		// derived is called

	return 0;
}

Within this example we have a function with a double parameter in the base class and a function with an integer parameter in the derived class. The same name hiding rules as in the previous example apply. The function in the inner scope of the derived class hides the function in the outer scope of the base class, even if the function parameters are different.

Some developers are surprised by this behavior as they expect that the public functions of the base class become functions of the derived class too, so that the function to call is selected according to the parameter types. That’s a valid and comprehensible expectation because from a software architectural point of view public inheritance represents an “is-a” relationship.

From a technical point of view the different kinds of inheritance just result in different visibilities of interfaces. Therefore, name hiding should be seen with respect to this technical point of view. So, the behavior of the example application is correct.
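That hiding works on the name alone, before the parameter types are considered, can also be checked at compile time. The following is only a sketch: the `CanCalc` trait is a hypothetical helper built with the C++17 detection idiom, and the classes are simplified stand-ins for the ones used in this article.

```cpp
#include <string>
#include <type_traits>
#include <utility>

struct Base
{
	void Calc(std::string) {}
};

struct Derived : Base
{
	void Calc(int) {}  // hides Base::Calc, although the parameter differs
};

// CanCalc<T, Arg>::value is true if 't.Calc(arg)' compiles for an object
// t of type T and an argument of type Arg (hypothetical helper).
template <typename T, typename Arg, typename = void>
struct CanCalc : std::false_type {};

template <typename T, typename Arg>
struct CanCalc<T, Arg,
	std::void_t<decltype(std::declval<T&>().Calc(std::declval<Arg>()))>>
	: std::true_type {};

static_assert(CanCalc<Base, std::string>::value, "Base accepts a string");
static_assert(CanCalc<Derived, int>::value, "Derived accepts an int");
static_assert(!CanCalc<Derived, std::string>::value,
	"Base::Calc(std::string) is hidden in Derived");
```

The last assertion holds because name lookup stops in the scope of Derived as soon as it finds a function named Calc, so the base class overload is never considered.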

Make hidden names visible

Of course, there are many use cases where you want to keep the hidden names visible. For example, in public inheritance scenarios you normally want the base class interface to be visible, and hiding it should be a rare case. As expected, C++ supports this: with a “using” declaration the hidden names become visible again. The following example shows the corresponding modification of the derived class. This time the base class method is called if we pass a double parameter.

#include <iostream>

class Base
{
public:
	void Calc(double x) { std::cout << "Base calc was called" << std::endl; }
};

class Derived : public Base
{
public:
	using Base::Calc;	// make base class function name visible in derived class

	void Calc(int x) { std::cout << "Derived calc was called" << std::endl; }
};

int main()
{
	Derived d;

	d.Calc(3.5);	// base is called
	d.Calc(8);		// derived is called

	return 0;
}

Make specific hidden name visible

Within the previous example we have seen the “using” declaration to make hidden names visible. As we already learned, names are independent of data types. Therefore, if the base class implements several functions with the same name, all of these functions become visible. The following source code shows a corresponding example. Furthermore, I have changed to private inheritance to show that the concept is independent of the kind of inheritance.

#include <iostream>
#include <string>

class Base
{
public:
	void Calc(double x) { std::cout << "Base calc double was called" << std::endl; }

	void Calc(std::string x) { std::cout << "Base calc string was called" << std::endl; }
};

class Derived : private Base
{
public:
	using Base::Calc;	// make base class function name visible in derived class

	void Calc(int x) { std::cout << "Derived calc was called" << std::endl; }
};

int main()
{
	Derived d;

	d.Calc(3.5);	// base is called
	d.Calc(8);		// derived is called

	d.Calc("abc");	// base string version is called

	return 0;
}

From a technical point of view the example looks fine. But from a software architectural point of view you may argue that it is bad design to make the private interface public in the derived class. And you are totally right. Still, such design decisions are sometimes made for various reasons. In this case you want to keep the design fault as small as possible and make only one or a few of the available functions public. You can do this by writing a forwarding function instead of the “using” declaration. The following source code shows the adapted example.

#include <iostream>
#include <string>

class Base
{
public:
	void Calc(double x) { std::cout << "Base calc double was called" << std::endl; }

	void Calc(std::string x) { std::cout << "Base calc string was called" << std::endl; }
};

class Derived : private Base
{
public:
	void Calc(double x) { Base::Calc(x); }	// forwards to the base class function

	void Calc(int x) { std::cout << "Derived calc was called" << std::endl; }
};

int main()
{
	Derived d;

	d.Calc(3.5);	// base is called
	d.Calc(8);		// derived is called

	d.Calc("abc");	// compiler error

	return 0;
}

Summary

Names in derived classes hide names of base classes. This behavior is correct from a technical point of view. But in case of public inheritance it contradicts our expectations from a software architectural point of view. Fortunately, we can easily make the hidden names visible again with the “using” declaration or with forwarding functions.

Posted in C++

Expression Bodied Members in C# 7

The concept of Expression Bodied Members (EBM) was introduced with C# 6 and, as it became popular, many enhancements were added with C# 7. Within this article I want to give you the full picture of this feature, so I explain both the C# 6 and the C# 7 EBM features.

With C# 6 the following EBM were introduced:

  • Expression bodied Methods
  • Expression bodied Properties

With C# 7 the following EBM were added:

  • Expression bodied Property Getter
  • Expression bodied Property Setter
  • Expression bodied Indexer
  • Expression bodied Operators Overloading
  • Expression bodied Constructor
  • Expression bodied Destructor (Finalizer)

Syntax

Member methods as well as property getters and property setters are sometimes implemented with a single instruction. In such cases the syntax overhead, like braces or getter and setter syntax, is larger than the code for the real functionality. EBM allow you to reduce this overhead and therefore bring the focus back to the real functionality. This increases the readability of the source code. EBM use a syntax which is already known from lambda expressions: the “=>” sign. In contrast to lambda expressions, which may also have a statement block as body, an EBM is limited to a single expression.

I think that’s a really good design decision because the EBM syntax only makes sense in such special cases. If your class members like constructors, property getters or methods contain several instructions, the block syntax used so far is more suitable. The instructions which belong together are written into one block and therefore you can easily see that they belong together. But if you have one instruction only, there is nothing which must be grouped. So, in this case it makes sense to leave out the block syntax and use a more lightweight style.

Examples

The following paragraphs show examples for each EBM type. As the examples are quite easy and self-explanatory you will not find further descriptions or explanations. But at the end of the article you will find a summary and my personal thoughts about the EBM feature.

Each example contains the same implementation twice: one time in EBM syntax and one time in standard block syntax. This allows an easy comparison of both implementation styles. But of course, if you want to compile the source code you must comment out one of the two implementations.

Expression bodied Methods

static void Main(string[] args)
{
  MyClass myClass = new MyClass();

  int result = myClass.Sum(3, 5);

  Console.WriteLine(result);
}

class MyClass
{
  // C# 6
  public int Sum(int a, int b) => a + b;

  // C# 5
  public int Sum(int a, int b)
  {
    return a + b;
  }
}

Expression bodied Properties

static void Main(string[] args)
{
  MyClass myClass = new MyClass();

  Console.WriteLine(myClass.Value);
}

class MyClass
{
  // C# 6
  public int Value => mValue;

  // C# 5
  public int Value
  {
    get { return mValue; }
  }

  private int mValue = 12;
}

Expression bodied Property Getter and Setter

static void Main(string[] args)
{
  MyClass myClass = new MyClass();

  myClass.Value = 42;
  Console.WriteLine(myClass.Value);
}

class MyClass
{
  // C# 7
  public int Value
  {
    get => mValue;
    set => mValue = value;
  }

  // C# 6
  public int Value
  {
    get { return mValue; }
    set { mValue = value; }
  }

  private int mValue = 12;
}

Expression bodied Indexer

static void Main(string[] args)
{
  MyClass myClass = new MyClass();

  Console.WriteLine(myClass[2]);
}

class MyClass
{
  // C# 7
  public int this[int index] => mValues[index];

  // C# 6
  public int this[int index]
  {
    get { return mValues[index]; }
  }

  private int[] mValues = new int[] { 11, 12, 13, 14 };
}

Expression bodied Operators Overloading

static void Main(string[] args)
{
  MyClass myClass = new MyClass();

  myClass.Value = 15;
  myClass++;

  Console.WriteLine(myClass.Value);
}

class MyClass
{
  // C# 7
  public static MyClass operator ++(MyClass myClass) => new MyClass() { Value = myClass.Value + 1 };

  // C# 6
  public static MyClass operator ++(MyClass myClass)
  {
    return new MyClass() { Value = myClass.Value + 1 };
  }

  public int Value { get; set; }
}

Expression bodied Constructor and Destructor (Finalizer)

static void Main(string[] args)
{
  MyClass myClass = new MyClass();
}

class MyClass
{
  // C# 7
  public MyClass() => Init();
  ~MyClass() => CleanUp();

  // C# 6
  public MyClass() { Init(); }
  ~MyClass() { CleanUp(); }

  private void Init() { }
  private void CleanUp() { }
}

Summary and Assessment

As you can see in the examples, the source code becomes more readable as unnecessary syntax overhead is removed. From my point of view EBM is quite a nice feature. But of course, you should not overuse it. EBM should only be used if you have a single simple instruction. Furthermore, I don’t like to use EBM in a ctor or finalizer in cases where you want to manage a single resource only. This feels a little bit inappropriate and most often you have more than one resource within a class. If you just want to call a single method in the ctor or finalizer, EBM is still fine.

Disadvantage of EBM

I think there is one minor disadvantage of EBM. The used ‘=>’ sign looks nearly like the ‘=’ sign. If you mix up these two signs you may write source code with another behavior than expected. The following example shows such an issue.

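static void Main(string[] args)
{
  MyClass x = new MyClass();

  for (int i = 0; i < 1000; i++)
  {
    var a = x.Value1;  // same instance on every access
    var b = x.Value2;  // new instance on every access
  }
}

class MyClass
{
  public MyLargeClass Value1 { get; set; } = new MyLargeClass();  // initializer
  public MyLargeClass Value2 => new MyLargeClass();               // getter
}

class MyLargeClass
{
  // ...
}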

Both implementations look nearly the same but behave differently. One is an initializer, the other one is a getter. So, in one case the value has a getter only and in the other case a getter and setter. Furthermore, one getter will always return the same object instance and the other one will always create a new object instance. Within the shown example this may result in huge performance differences depending on the kind of the returned object. Of course, this is a rare issue and it will not result in runtime errors. So, this theoretical disadvantage should not stop you from using EBM.

Posted in .NET, C#

C# Array indexer vs. List indexer

The .NET class library contains a lot of nice collection classes. They are optimized for their individual use cases. But from a general perspective they all show the behavior we expect from a collection.

But there is one exception to this rule: the array. Arrays have been part of the CLR from the very beginning, and you can think of them as a built-in generic. The array differs from the other containers in many ways. One major difference is the indexer. The indexer of an array returns the element by reference and not by value like the indexers of the other collections.

This difference may be very important when you choose the right collection for your use case. Furthermore, if an array indexer is used without respect to the fact that it returns a reference, it can result in unexpected behavior.

The following example shows the difference between an array indexer and a list indexer.

static void Main(string[] args)
{
  var array = new[] { new MyStruct(42) };
  var list = new List<MyStruct> { new MyStruct(42) };

  // array[0].mValue = 15;
  // list[0].mValue = 15;  // error CS1612: Cannot modify the return value because it is not a variable

  array[0].SetValue(15);
  list[0].SetValue(15);

  Console.WriteLine("Array value: " + array[0].mValue);   // output: 'Array value: 15'
  Console.WriteLine("List value: " + list[0].mValue);     // output: 'List value: 42'
}

public struct MyStruct
{
  public MyStruct(int x) { mValue = x; }

  public void SetValue(int x) { mValue = x; }

  public int mValue;
}

The example shows two important differences. If we try to set a member value directly, we get an error message in case of the list. This is because the list indexer returns a copy of the object. As we don’t create a variable to store this copy, we are not able to set the member value. If we instead use a member function to set the value, the function call compiles in both cases. But it behaves differently. The array value will be changed but the list value will not. That’s because the array indexer returns a reference to the array element, and the member function is called on this element. The list indexer returns a copy of the element, and the member function is called on this temporary object. So, the original list element is not changed at all.

The different behavior of the array indexer and the list indexer isn’t an error in .NET. On the contrary, it offers advantages because you can choose the right collection type according to your needs. So, you should keep this special behavior of the array indexer in mind and use it where it fits your use case.

Posted in .NET, C#

Object Instance Creation

Whenever you write an application in C++ you will create a lot of object instances. So, this is a base development task. C++ offers several ways to initialize variables. These are not just syntactical variations for the same task. The different initialization kinds may result in different behaviors. This rich variety of initialization kinds can result in some pitfalls, wrong expectations and programming errors.
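A well-known illustration of such a difference is std::vector. This small sketch is not taken from the article, and the function names are hypothetical. It shows that parentheses and braces select different constructors for the same argument.

```cpp
#include <cstddef>
#include <vector>

// Parentheses select the "count" constructor: three elements, each 0.
std::size_t SizeWithParentheses()
{
	std::vector<int> values(3);
	return values.size();
}

// Braces prefer the initializer_list constructor: one element with value 3.
std::size_t SizeWithBraces()
{
	std::vector<int> values{ 3 };
	return values.size();
}
```

The two declarations differ only in the kind of brackets, yet the first vector holds three elements and the second holds one.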

Within this article I want to show the different object instance creation methods and explain their differences. To fully understand the examples of this article you should know the different types of constructors. You can get or refresh knowledge about constructors within this article.

Example Class

Within the examples of this article, a class “MyClass” is used. As we want to focus on the object instance creation, this class does not have any functionality. But it provides several standard constructors and assignment operators. The constructors and operators just contain console outputs. This helps to see which constructor or operator is called. The following source code shows “MyClass”.

#include "stdafx.h"
#include <iostream>
#include <initializer_list>

class MyClass
{
public:
	MyClass();		// default ctor
	~MyClass();		// dtor

	MyClass(const int size);		// parameterized ctor

	MyClass(const MyClass& obj);		// copy ctor
	MyClass& operator=(const MyClass& obj);		// copy assignment operator

	MyClass(MyClass&& obj);		// move ctor
	MyClass& operator=(MyClass&& obj);		// move assignment operator

	MyClass(const std::initializer_list& list);	//initializer_list ctor
};

MyClass::MyClass() 
{
	std::cout << "default ctor" << std::endl;
}

MyClass::~MyClass()
{
}

MyClass::MyClass(const int size) 
{
	std::cout << "parameterized ctor" << std::endl;	
}

MyClass::MyClass(const MyClass& obj) 
{
	std::cout << "copy ctor" << std::endl;
}

MyClass& MyClass::operator=(const MyClass& obj)
{
	std::cout << "copy assignment operator" << std::endl;
	return *this;
}

MyClass::MyClass(MyClass&& obj)
{
	std::cout << "move ctor" << std::endl;
}

MyClass& MyClass::operator=(MyClass&& obj)
{
	std::cout << "move assignment operator" << std::endl;
	return *this;
}

MyClass::MyClass(const std::initializer_list<int>& list)
{
	std::cout << "initializer_list ctor" << std::endl;
}

Quiz

As a developer you have already implemented a huge number of object instantiations. Therefore, I want to start with a quiz. The following source code shows several ways to initialize MyClass. Please take a few minutes and think about these initializations. Try to answer the following question for each line of code: Which ctor and/or assignment operator is called?

int main()
{
  MyClass test1;
  MyClass test2();
  MyClass test3{};
  MyClass test4(42);
  MyClass test5{ 42 };

  MyClass test6(42.5);
  MyClass test7{ 42.5 };

  MyClass test8 = 42;
  MyClass test9 = 42.5;
  MyClass test10 = { 42 };
  MyClass test11 = { 42.5 };

  MyClass test12 = MyClass();
  MyClass test13 = MyClass(42);
  MyClass test14 = MyClass{ 42 };

  MyClass test15(test1);
  MyClass test16{ test1 };

  MyClass test17 = test1;
  MyClass test18 = { test1 };

  return 0;
}

MyClass test1

This is the simplest way to create an object instance. The default constructor will be called. Within the default constructor you should initialize all class members, otherwise they may contain garbage values.

MyClass test2()

Like before, this looks quite simple and we may expect that the default constructor is called. But not even close. This isn’t an object initialization at all. It is a function declaration: a function “test2” without parameters and with the return type “MyClass” is declared. This C++ pitfall results from the redundant use of the parentheses. For backward compatibility the meaning of this code is still the same as in C++98, so it is still a function declaration. To bypass this pitfall, you should not use this syntax at all. Instead use the version seen above without parentheses or use the braces syntax introduced with C++11 (as you can see in the next paragraph). On the other hand, it is not a big issue because it will not lead to hidden runtime errors. If you try to use the supposed object instance you will get compiler errors, and if you do not use “test2” at all, many compilers will at least show a warning.

MyClass test3{}

This syntax was introduced with C++11. An object initialization with braces “{}” will call the default ctor. So, this syntax is equivalent to “MyClass test1”.

MyClass test4(42)

This will call the parameterized ctor and pass “42” as parameter. If we think of the example class as a container type, this object initialization would provide a container with 42 elements, each initialized with the default value.

MyClass test5{ 42 }

If we use the braces syntax, the values inside the braces are converted to an initializer_list and therefore the initializer_list ctor is called. If we again think of a container object, this time a container with one element is created and the element value is set to 42.

So “MyClass test4(42)” and “MyClass test5{ 42 }” have different results although the syntax is nearly the same. This is a very important aspect and unfortunately a source of errors. Therefore, we should analyze this topic in more detail.

Furthermore, the braces syntax is still allowed even if we don’t have an initializer_list ctor. In this case the parameterized ctor is called according to the values given inside the braces.

If both a parameterized ctor and an initializer_list ctor exist, the initializer_list ctor is preferred and will hide the parameterized ctor. This means that if we get such a collision of two possible ctors, the one with the initializer_list is preferred automatically and we don’t get any compiler warning. This may be a source of errors.

For example, we may use a container class “MyContainer”. This container class offers a parameterized ctor with two parameters: number of elements, initial value. We can create an object instance with “MyContainer x{10, 5}”. This will create a container with 10 elements, all initialized with value 5. After some time, the class MyContainer is extended by the nice feature of an initializer_list ctor. But this new feature changes the behavior of the user code which uses the class. The existing initialization “MyContainer x{10, 5}” will now create a container with two elements of value 10 and 5. To fix this error we have to change the initialization and use parentheses to call the hidden parameterized ctor: “MyContainer x(10,5)”.

This example shows the issues you may get with the initializer_list ctor. If you add this ctor to an existing class which has had parameterized ctors so far, they will get hidden and as a result you may break user code.

You will find a real example of this in the C++ standard library. The vector class offers an initializer_list ctor and a parameterized ctor with two parameters (number of elements, initial value) which is hidden when the braces syntax is used.

MyClass test6(42.5)

The example class contains a parameterized ctor with an integer value as parameter. This parameterized ctor will be called even if the type of the argument does not match exactly. In this case the value is converted. Such a narrowing conversion is allowed for some built-in types, but it may result in a loss of data and therefore a compiler warning will be shown.

MyClass test7{ 42.5 }

With braces the initializer_list ctor is selected again, like in the “test5” example, and the value would have to be converted to an integer. But in contrast to the version above with parentheses syntax, narrowing conversions are not allowed inside braces. Therefore, in our example this object instantiation will result in a compiler error.

MyClass test8 = 42

MyClass test9 = 42.5

MyClass test10 = { 42 }

MyClass test11 = { 42.5 }

These initializations are nearly the same as the ones explained above, with the syntactical difference that we use an equals sign. But what’s the consequence of this different syntax? Will the assignment operator of MyClass be called?

The answer is simple: The equals sign is just a syntactical difference and the assignment operator is not involved. These initializations are therefore equal to the ones explained above (see “test4” to “test7”).

So for test8 and test9 the parameterized ctor is called. Test10 will call the initializer_list ctor. Test11 will result in a compiler error as the narrowing conversion is not allowed.

MyClass test12 = MyClass()

MyClass test13 = MyClass(42)

MyClass test14 = MyClass{ 42 }

Now it becomes a little more difficult. What will happen in these cases? If we look at the different parts of the syntax, for example in the first case “MyClass test12 = MyClass()”, we may think the following: the expression “MyClass()” creates a temporary object instance by calling the default ctor, and “=” calls the assignment operator and assigns the temporary object to “test12”, which was previously created by “MyClass test12”. But this assumption is wrong. Unfortunately, I have heard it a few times, especially when people say you can optimize your code by eliminating the supposed temporary object and the calls of several ctors and assignments.

So, what happens with this kind of syntax? Nothing special! It has the same meaning as the syntax used for “test1”, “test4” and “test5”. Therefore, for test12 the default ctor is called, for test13 the parameterized ctor is called and for test14 the initializer_list ctor is used. The assignment operator is never called, because this is an initialization and not an assignment. No temporary object is created either: since C++17 this is guaranteed by the language, and older compilers were already allowed to skip the temporary (copy elision) and usually did so.

In summary of the examples seen so far, we can say there is no difference between the following three initializations which will all call the parameterized ctor. Same is true for the default ctor and initializer_list ctor examples seen so far.

  • MyClass test4(42)
  • MyClass test8 = 42
  • MyClass test13 = MyClass(42)

These three spellings will create an instance of MyClass by calling the parameterized ctor. If you have read the article mentioned at the beginning, you may object that there is a theoretical difference: if we use explicit ctors, the second syntax is no longer allowed. But that’s a restriction for explicit ctors only. In terms of the common concept, the three spellings have the same result. But which one should be preferred? This depends on the coding guidelines of your company, your project team or your personal preferences. At the end of the article I will mention some coding guidelines.

MyClass test15(test1)

MyClass test16{ test1 }

These cases will call the copy ctor. As the given parameter is of type MyClass, the braces will not create an initializer_list.

MyClass test17 = test1

MyClass test18 = { test1 }

And again, the copy ctor will be called. As explained before, even if the syntax suggests that the assignment operator is involved, it will never be called. So these initializations are nearly equal to the previous ones (test15 and test16), with the small difference that an explicit ctor cannot be called this way.

Summary

As you can see there are many ways to initialize an object. These different initializations may differ a lot in syntax but often have the same behavior. Unfortunately, there are some pitfalls too, like the initializer_list ctor which may hide a parameterized ctor. The braces syntax offers a uniform way to initialize objects. It should be the preferred syntax as it can be used in nearly all cases. Below you will find some guidelines for object initializations, but of course you may have your own coding guidelines or preferences.

Guidelines

Prefer object initialization with braces “{…}”, because it’s more consistent, more correct, can be used in nearly all cases and avoids the old-style pitfalls altogether.

In single-argument cases, especially when initializing built-in types, it is fine to omit the braces, for example “int i = 8;”.

In rare cases use parentheses “(…)” to explicitly call a ctor which is otherwise hidden by an initializer_list ctor.

When you design a class, avoid providing a ctor that ambiguously overloads with an initializer_list ctor. Users of your class should never need to use parentheses syntax to reach such a hidden ctor.

Posted in C++

ctor types in C++

In C++ you will find several ways to initialize an object instance. For example, think about a class “MyClass” which can be constructed with a parameter. The object initialization can be done in several ways:

  • MyClass x{y};
  • MyClass x(y);
  • MyClass x = y;
  • MyClass x = {y};

But which one should be used? Do they all call the same constructor (ctor) or do these initializations lead to different results? Within the next two articles I want to think about these questions. The first article will show the different types of constructors and the second article will show the ways to initialize object instances.

Example object

For this article we want to create a simple class. An important task of a class is the resource management. So, the example class will contain a dynamically created memory resource which should be created by the different ctor types and released by the destructor (dtor).

Default ctor

Let’s start with the default ctor and the dtor. The default ctor does not have any parameters and it is used to initialize the class internal members.

#include "stdafx.h"
#include <iostream>

class MyClass
{
public:
	MyClass();		// default ctor
	~MyClass();		// dtor

private:
	int mSize;
	int* mElements;
};

MyClass::MyClass()
	: mSize(0)
	, mElements(nullptr)
{
	std::cout << "default ctor" << std::endl;
}

MyClass::~MyClass()
{
	if (mElements)
	{
		delete[] mElements;
		mElements = nullptr;
	}
}

int main()
{
	MyClass test1;		// default ctor

	return 0;
}

Parameterized ctor

If we want to initialize the internal members with variable parameters, we can use a parameterized ctor. For example we can add a parameterized ctor to set the initial size of the data container.

#include "stdafx.h"
#include <iostream>

class MyClass
{
public:
	MyClass();		// default ctor
	~MyClass();		// dtor

	MyClass(const int size);		// parameterized ctor

private:
	int mSize;
	int* mElements;
};

MyClass::MyClass(const int size)
	: mSize(size)
	, mElements(mSize ? new int[mSize]() : nullptr)
{
	std::cout << "parameterized ctor" << std::endl;
}

Copy ctor and copy assignment operator

Another often needed functionality is to create a copy of an existing object. This can be done by using a copy constructor. Furthermore, a copy assignment should be provided as a developer may use both ways to copy an object: copy it during creation or copy it by an assignment.

#include "stdafx.h"
#include <iostream>
#include <algorithm>

class MyClass
{
public:
	MyClass();		// default ctor
	~MyClass();		// dtor

	MyClass(const int size);		// parameterized ctor

	MyClass(const MyClass& obj);		// copy ctor
	MyClass& operator=(const MyClass& obj);		// copy assignment operator

private:
	int mSize;
	int* mElements;
};

MyClass::MyClass(const MyClass& obj)
	: mSize(obj.mSize)
	, mElements(new int[obj.mSize])
{
	std::cout << "copy ctor" << std::endl;

	// create deep copy
	std::copy(obj.mElements, obj.mElements + mSize, stdext::make_checked_array_iterator(mElements, mSize));
}

MyClass& MyClass::operator=(const MyClass& obj)
{
	std::cout << "copy assignment operator" << std::endl;

	// Self-assignment detection
	if (this == &obj)
	{
		return *this;
	}

	// release resources
	if (mElements)
	{
		delete[] mElements;
		mElements = nullptr;
	}

	// create deep copy
	mSize = obj.mSize;
	mElements = mSize ? new int[mSize]() : nullptr;

	std::copy(obj.mElements, obj.mElements + mSize, stdext::make_checked_array_iterator(mElements, mSize));

	return *this;
}

As you can see, things become more difficult now. If we copy an object we must pay attention to several things. First, we should decide whether we want to create a deep or a shallow copy. Next, we must think about the resource management. This can be seen in the implementation of the copy assignment operator. It contains a self-assignment detection, and prior to the resource creation it releases the existing resources.

Move ctor and move assignment operator

The move ctor and the move assignment operator should take over the content of another object too. But in contrast to the copy operations we get an r-value as parameter and know that the source object is only a temporary object (or was explicitly handed over with std::move) and will no longer be used. This allows a more efficient resource management. As the source object is no longer used, we can steal its resources instead of creating new ones.

#include "stdafx.h"
#include <iostream>
#include <algorithm>

class MyClass
{
public:
	MyClass();		// default ctor
	~MyClass();		// dtor

	MyClass(const int size);		// parameterized ctor

	MyClass(const MyClass& obj);		// copy ctor
	MyClass& operator=(const MyClass& obj);		// copy assignment operator

	MyClass(MyClass&& obj);		// move ctor
	MyClass& operator=(MyClass&& obj);		// move assignment operator

private:
	int mSize;
	int* mElements;
};

MyClass::MyClass(MyClass&& obj)
{
	std::cout << "move ctor" << std::endl;

	// steal content of other object
	mSize = obj.mSize;
	mElements = obj.mElements;

	// release content of other object
	obj.mSize = 0;
	obj.mElements = nullptr;
}

MyClass& MyClass::operator=(MyClass&& obj)
{
	std::cout << "move assignment operator" << std::endl;

	// Self-assignment detection
	if (this == &obj)
	{
		return *this;
	}

	// release resources
	if (mElements)
	{
		delete[] mElements;
		mElements = nullptr;
	}

	// steal content of other object
	mSize = obj.mSize;
	mElements = obj.mElements;

	// release content of other object
	obj.mSize = 0;
	obj.mElements = nullptr;

	return *this;
}

Again, we add a self-assignment detection and release the old resources. Furthermore, we have to reset the resources of the source object after we stole them. This prevents the dtor of the source object from releasing these resources.

Copy & swap idiom

The copy ctor and the copy assignment operator as well as the move ctor and the move assignment operator contain some duplicate source code. There is a common implementation technique which addresses this issue: the copy & swap idiom. This technique has the advantage that it removes the duplicate code, but it has some disadvantages too. Within this article I don’t want to explain the copy & swap idiom because it is a complex topic of its own, but you should keep in mind that this idiom exists.

Initializer-list

Another important ctor type, especially for container-like objects, is a ctor with an initializer list. It allows passing a list of values which is used to initialize the container class.

#include "stdafx.h"
#include <iostream>
#include <algorithm>
#include <initializer_list>

class MyClass
{
public:
	MyClass();		// default ctor
	~MyClass();		// dtor

	MyClass(const int size);		// parameterized ctor

	MyClass(const MyClass& obj);		// copy ctor
	MyClass& operator=(const MyClass& obj);		// copy assignment operator

	MyClass(MyClass&& obj);		// move ctor
	MyClass& operator=(MyClass&& obj);		// move assignment operator

	MyClass(const std::initializer_list<int>& list);		// initializer_list ctor

private:
	int mSize;
	int* mElements;
};

MyClass::MyClass(const std::initializer_list<int>& list)
	: mSize(static_cast<int>(list.size()))
	, mElements(mSize ? new int[mSize]() : nullptr)
{
	std::cout << "initializer_list ctor" << std::endl;

	if (list.size())
	{
		std::copy(list.begin(), list.end(), stdext::make_checked_array_iterator(mElements, mSize));
	}
}

Use the different ctor types

The following source code contains an example console application which creates object instances. Depending on the given parameters the matching ctor type is called.

int main()
{
	MyClass test1;		// default ctor

	MyClass test2(7);		// parameterized ctor

	MyClass test3(test1);		// copy ctor
	test3 = test1;		// copy assignment operator

	MyClass test4(std::move(test1));		// move ctor
	test4 = std::move(test2);		// move assignment operator

	MyClass test5(7.1);		// parameterized ctor, warning: conversion from double to int

	MyClass test6{ 1,2,3,4,5 };		// initializer_list ctor

	return 0;
}

Implicit conversion ctor vs. explicit ctor

So far, we have implemented a couple of ctors for MyClass. With these ctors in mind, do you think the following line of code will construct an object instance or will it show an error? If an object instance is created, which type of ctor is used?

MyClass test = 7;

This line of code will create a MyClass instance by calling the parameterized ctor. The parameterized ctor is also called a “conversion” ctor because it allows an implicit type conversion. In our case the MyClass instance is created based on an int value, so you can say the int value is implicitly converted to a MyClass by calling the matching parameterized ctor.

But maybe you don’t want to support such an implicit conversion. There may be several reasons for that. In my opinion such an implicit conversion looks a little bit strange, and there are a lot of developers who don’t know the technical background of this line of code. Most will know that an object is created, but some don’t know which ctor is used or whether it is a combination of ctor and assignment. This uncertainty, together with side effects on code changes, may result in errors too. Therefore, you may want to prevent implicit conversions for some kinds of classes. In this case you can declare the parameterized ctor as explicit. If you do so, the ctor cannot be used as implicit conversion ctor. The following source code shows an example.

#include "stdafx.h"
#include <iostream>

class MyClass1
{
public:
	MyClass1() { std::cout << "default ctor" << std::endl; };
	MyClass1(const int size) { std::cout << "parameterized ctor" << std::endl; };
	MyClass1(const MyClass1& obj) { std::cout << "copy ctor" << std::endl; };
};

class MyClass2
{
public:
	explicit MyClass2() { std::cout << "default ctor" << std::endl; };
	explicit MyClass2(const int size) { std::cout << "parameterized ctor" << std::endl; };
	explicit MyClass2(const MyClass1& obj) { std::cout << "copy ctor" << std::endl; };
};

int main()
{
	MyClass1 test1 = 7;		// OK; calls parameterized ctor
	MyClass1 test2 = 7.1;		// OK; calls parameterized ctor, warning: conversion from double to int

	// MyClass2 test3 = 7;		// ERROR; implicit conversion from int to MyClass2 is not allowed
	// MyClass2 test4 = 7.1;		// ERROR; implicit conversion from double to MyClass2 is not allowed
	MyClass2 test6 = MyClass2(7);		// OK; calls parameterized ctor
	MyClass2 test7 = MyClass2(7.1);		// OK; calls parameterized ctor, warning: conversion from double to int

	return 0;
}

MyClass2 offers explicit ctors only. Therefore, the object instantiation cannot be done by using implicit conversion. Instead the parameterized ctor must be used explicitly. This may result in cleaner source code and may prevent errors.

Summary and outlook

Within this article we have seen the different types of constructors and got an introduction on how to use them, for example to manage the resources needed by the object instance. Within the next article we will see the different ways to create an object and which ctor is used in which situation.

Posted in C++